Hey @vaibhavsancheti25 , if you are referring to the PI extractor, it is specifically designed for continuous running and streaming of data. Ref from the docs: The Cognite PI extractor connects to the OSIsoft PI Data Archive and detects and streams time series into Cognite Data Fusion (CDF) in near real-time. In parallel, the extractor ingests historical data (backfill) to make all time series available in CDF. The PI points in the PI Data Archive correspond to the time series in CDF.
Hey @eashwar11 , the file extractor allows you to upload files to CDF, and then you can load each file with pandas and upload each sheet as a RAW table. It's also possible to write your own service that takes the files directly and uploads them to RAW. For example, a snippet to upload data from each sheet to a new RAW table:

import pandas as pd

def upload_xls_file(client, file_path, db_name='Test'):
    """Uploads an XLS file as Pandas dataframes, with each sheet as a new dataframe,
    and writes each sheet into a CDF RAW table.

    Args:
        client (CogniteClient): a Cognite client instance
        file_path (str): the path to the XLS file
        db_name (str): the name of the CDF RAW database

    Returns:
        A dictionary where each key is a sheet name and each value is a Pandas dataframe.
    """
    xls_file = pd.ExcelFile(file_path)
    sheet_names = xls_file.sheet_names
    dataframes = {}
    for sheet_name in sheet_names:
        client.raw.tables.create(db_name, sheet_name)
        df = xls_file.parse(sheet_name)
        # one way to write the sheet into the newly created RAW table
        client.raw.rows.insert_dataframe(db_name, sheet_name, df)
        dataframes[sheet_name] = df
    return dataframes
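For illustration, usage could look something like this (the file name is a placeholder, and downloading from CDF Files first is just one option):

from cognite.client import CogniteClient

client = CogniteClient()

# download the uploaded file locally, then push its sheets to RAW
client.files.download(directory=".", external_id="my-report.xlsx")
dataframes = upload_xls_file(client, "my-report.xlsx", db_name="Test")
print(list(dataframes))  # sheet names that became RAW tables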
Hi @Ann Sullivan Thomas , if you have any trouble with the hands-on task at the end of the course, don't worry! We've got you covered. You can find a pre-made solution file in our GitHub repository to help guide you through it. Just follow the instructions provided in the course and refer to the solution file if needed. I hope this is helpful. If you have any questions, don't hesitate to ask.
Hey @thomafred , according to the documentation: If the uploadUrl contains the string '/v1/files/gcs_proxy/', you can make a Google Cloud Storage (GCS) resumable upload request as documented in https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload. Following that link, there are instructions on uploading files in chunks. I hope this helps. Feel free to let me know if you have any additional questions.
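As a rough illustration of the chunked part (assuming you have already obtained the resumable session URI to PUT to, as described in the GCS docs; every chunk except the last must be a multiple of 256 KiB, and the file path here is a placeholder):

import os
import requests

CHUNK_SIZE = 8 * 256 * 1024  # GCS requires chunk sizes to be multiples of 256 KiB

def upload_in_chunks(session_url, file_path):
    # stream a local file to a GCS resumable-upload session, one chunk at a time
    total = os.path.getsize(file_path)
    with open(file_path, "rb") as f:
        offset = 0
        while offset < total:
            chunk = f.read(CHUNK_SIZE)
            last = offset + len(chunk) - 1
            headers = {"Content-Range": f"bytes {offset}-{last}/{total}"}
            resp = requests.put(session_url, data=chunk, headers=headers)
            # 308 means the upload is incomplete and the next chunk is expected;
            # 200/201 means the whole object has been stored
            if resp.status_code not in (200, 201, 308):
                resp.raise_for_status()
            offset += len(chunk)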
@thomafred could you double-check that? If your IdP is Azure, that doesn't necessarily mean that the CDF resources are also hosted on Azure. But if they are, then I suppose you can leverage the documentation for Azure Blob Storage.
@thomafred then, most probably, you need to create a blob with Put Blob as described in the documentation, then use Put Block to upload the chunks and Put Block List to commit them once they are all uploaded. But this should be verified experimentally; I haven't dealt with that particular case myself.
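If that turns out to be the case, a minimal sketch with the azure-storage-blob package could look like the following; it wraps the same Put Block / Put Block List operations, but whether the CDF uploadUrl can be treated as a blob SAS URL this way is exactly the part that needs verifying (chunk size and file path are placeholders):

import base64
import uuid

from azure.storage.blob import BlobBlock, BlobClient

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per block

def upload_in_blocks(sas_url, file_path):
    # stage a local file as blocks (Put Block) and commit them (Put Block List)
    blob = BlobClient.from_blob_url(sas_url)
    block_ids = []
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            block_id = base64.b64encode(uuid.uuid4().hex.encode()).decode()
            blob.stage_block(block_id=block_id, data=chunk)  # Put Block
            block_ids.append(BlobBlock(block_id=block_id))
    blob.commit_block_list(block_ids)  # Put Block List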
Hey @Sonali Vishal Patil , we have a query endpoint as described in the docs. But be aware that it has a maximum timeout of 240 seconds, the same as the preview in the UI.
Hi @adeel.ali Unfortunately, the DB extractor currently only allows you to run it on a schedule.
Hey @eashwar11 , it seems that you just need to create an asset hierarchy using the SDK or CDF Transformations and assign your time series to the assets. Then you can use the Python SDK method .subtree() to retrieve all the assets in the hierarchy (and from there their time series), or assets.retrieve() to get a particular asset (tag). Please let me know if you have any further questions.
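For example, something along these lines (the external ids are placeholders):

from cognite.client import CogniteClient

client = CogniteClient()

# retrieve the root asset of your hierarchy
root = client.assets.retrieve(external_id="my-root-asset")

# all assets below it, then all time series linked to those assets
subtree = root.subtree()
time_series = subtree.time_series()

# or fetch a single asset (tag) and only its own time series
tag = client.assets.retrieve(external_id="my-tag")
tag_time_series = tag.time_series()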
@eashwar11 So as far as I understand, you have a lot of time series and you want to use only a few of them in a hierarchical structure? That's a perfect setup for using assets, as I see it. You need to create a few additional objects, one asset per tag, but that will allow you to represent the hierarchical structure perfectly. You could also use the Data Modeling capabilities, but that would probably be overkill for this task.
Hello Thomas, my suggestion for resolving this issue would be to implement a retry strategy with back-off for uploading blocks. I have noticed that there is no request ID in the error logs, which makes it difficult to investigate further on our end.
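To illustrate the idea (the upload_block callable and the retry parameters are placeholders, not part of any Cognite API):

import random
import time

def upload_block_with_retry(upload_block, chunk, max_attempts=5, base_delay=1.0):
    # call a block-upload function, retrying with exponential back-off and jitter
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_block(chunk)
        except Exception as exc:  # narrow this to the transport errors you actually see
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            print(f"Upload attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)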
Hello @Nonstad, can you provide more details? What actions did you take, what were your expectations, and what were the results? Please note that the "level" setting in the console section only applies to logs displayed in the console, not those saved to a file. As far as I can see, the debug level provides additional heartbeat logs every 10 minutes. Also, be aware that you need to restart the extractor after modifying the configuration.
Hey @eashwar11 , please update the version of the SDK installed locally.
That’s true, I see it now. I will try to reach out to someone from the storage team on our side. What is the size of the file to be uploaded, by the way?
@eashwar11 it depends on your setup. We usually use poetry for our local projects, so if I need to update something, I update the version in the toml file.
@thomafred could you please create a support ticket about that problem?
Hey @eashwar11 , I'm not sure that the sequence data type fits your needs. Do you have any particular reason for using that type? I would recommend using time series instead. You can easily access the values as tabular data (a pandas DataFrame):

client.time_series.data.retrieve_dataframe(
    external_id=list_of_external_ids,
    start='1m-ago',
    end='now',
    aggregates=['average'],
    granularity='1d',
)

But if you don't need to visualise the data, you can do the same directly from RAW:

client.raw.rows.retrieve_dataframe(db_name, table_name, limit=-1)
Hey @Mohit Shakaly . I'm sorry, but extractor-utils doesn't have that feature. However, you can choose to run it continuously and set the schedule within your code. Alternatively, you can terminate the extractor after each upload and set the schedule externally using a tool like cron.
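As a minimal sketch of the first option (run_extraction is a hypothetical placeholder for your own extraction and upload logic):

import time

INTERVAL_SECONDS = 15 * 60  # run every 15 minutes, for example

def run_extraction():
    # placeholder for your extraction/upload logic
    ...

if __name__ == "__main__":
    while True:
        started = time.time()
        run_extraction()
        # sleep for the remainder of the interval so runs stay roughly evenly spaced
        time.sleep(max(0.0, INTERVAL_SECONDS - (time.time() - started)))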
Hey @eashwar11 . Can you provide more details about attaching values to time series? Do you simply want to add a datapoint? According to the documentation, you can insert datapoints or insert a pandas dataframe as datapoints into multiple time series simultaneously. Let me know if that's helpful.
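For example, something like this (the external ids, timestamps and values are placeholders):

from datetime import datetime, timezone

import pandas as pd
from cognite.client import CogniteClient

client = CogniteClient()

# insert individual datapoints into one time series
client.time_series.data.insert(
    [(datetime(2024, 1, 1, tzinfo=timezone.utc), 42.0)],
    external_id="my-time-series",
)

# or insert a dataframe where each column is a time series external id
df = pd.DataFrame(
    {"my-time-series": [1.0, 2.0], "another-time-series": [3.0, 4.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)
client.time_series.data.insert_dataframe(df)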
@eashwar11 you can only archive data sets; there is no way to remove them. This is because access rights to data can be determined by the data set, and if it were deleted, a situation could arise where it is impossible to delete the data it governs.
Hey @Viswanadha Sai Akhil Pujyam , it seems like an error related to timestamp parsing. I can see that the date format in the Basic Start Date and Basic Finish Date columns in the provided table differs from the format declared in the to_timestamp() function used.
Hey @Patrick Galler , you can use Jupyter Notebook as a workaround, which is built into each CDF project.

from cognite.client import CogniteClient

client = CogniteClient()

asset_xid = "PUT YOUR DATA HERE"
tss = client.time_series.list(asset_external_ids=[asset_xid])
tss.to_pandas().to_csv(f"Timeseries for {asset_xid}.csv")

Put in the external id of your asset (you can also define a list of assets), then run this code snippet and download the resulting CSV file.