Solved

Derived data entities

  • 12 July 2023
  • 4 replies
  • 64 views


I need to do complex calculations and store the results as data frames (a tabular data structure). The only option I can see is to use the ‘sequences’ resource type in CDF.

However, I think CDF sequences don't allow the kind of data wrangling we can do with pandas data frames. So I would like to know the best way to store tabular data structures such as lists, data frames, and arrays, the way we typically work with them in core Python.

Is there any such structure available in CDF?


Best answer by HaydenH 19 July 2023, 10:57


4 replies


Hi, I’m a Product Manager for our High Volume Storage services, including Sequences.  

I’m currently gathering information / requirements for a new service we’re planning to develop, something that will support tabular data and provide the means to do some level of statistical analysis on the data within the API.  Whilst I can’t recommend a service that will fulfil your needs out of the box (though we may have Solutions Architects here that have implemented things on top of Sequences that can help you), your requirements may be very useful in shaping what we’re planning to develop next.

Can you tell me more about the business context that you’re solving for here?  What is the use case?
Also, can you tell me more about the shape of the data? How many ‘columns’ and ‘rows’ does it have? If the data is Time Series or some other series, what is the frequency? How do you envisage sorting and filtering working?

Hi @eashwar11, I think the best way to approach your problem is to use the to_pandas() method from sequences (see here). I have not used sequences much myself, but the core data models (sequences, time series, events and such) all support a to_pandas() method. So the call to your sequence might look like this:

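from cognite.client import CogniteClient

client = CogniteClient()  # assumes the client configuration is already set up
# ext_id is the external ID of the sequence you want to read
df = client.sequences.data.retrieve(
    external_id=ext_id,
    start=0,
    end=None,  # None reads through to the last row
).to_pandas()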

Then you have access to all of the pandas functionality. Once you have finished modifying or processing your DataFrame, you can also use the functionality for uploading DataFrames to a sequence (here). I hope that helps.
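The round trip described above (read a sequence, process it in pandas, write it back) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `process_sequence` and `transform` are hypothetical names, the client is assumed to be an already configured CogniteClient, and the target sequence is assumed to exist with columns that match the DataFrame's column names. The upload uses `insert_dataframe` from the SDK's sequences data API; its exact parameters may differ between SDK versions.

```python
from typing import Callable

import pandas as pd


def process_sequence(
    client,
    ext_id: str,
    transform: Callable[[pd.DataFrame], pd.DataFrame],
) -> pd.DataFrame:
    """Read a sequence into pandas, apply `transform`, and write the result back.

    `client` is assumed to be a configured CogniteClient (hypothetical helper,
    not part of the SDK). `transform` should return a DataFrame whose columns
    match the columns of the target sequence.
    """
    # Fetch all rows of the sequence and convert them to a DataFrame
    df = client.sequences.data.retrieve(
        external_id=ext_id, start=0, end=None
    ).to_pandas()

    # Ordinary pandas processing happens here
    result = transform(df)

    # Upload the processed rows; the DataFrame index holds the row numbers
    client.sequences.data.insert_dataframe(result, external_id=ext_id)
    return result
```

A call might look like `process_sequence(client, ext_id, lambda df: df.fillna(0))`, where the lambda stands in for whatever wrangling is needed.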


Hi @Glen Sykes 

I am working on a use case called “Yield tracking”. Basically, I get a lot of inputs and calculate the mass balance, the assay normalization, and the diet feed. I then take the dot product of the last two and use the output to compute swing % (SW%) values based on the type of crude output (such as liquid or gas products). Those SW% slabs are then used to compute the derived yields for some 500 line items in an assay matrix.

 

These processes have to run daily, based on the data feed, and the outputs have to be stored appropriately in CDF so that Cognite Charts can plot them. I have around 137 charts, and most of them have three items in them (Linear, Non-linear and Actual). I have done all of these steps as a POC in a local Jupyter notebook, using flat files (CSV) and data frames in Python, to master the flow. I also plotted the results using matplotlib.

Now I have to translate all of this into Cognite using CDF Functions, sequences and time series, and then use those to build the charts.
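For readers following along, the normalization and dot-product step mentioned in this post could look roughly like this in pandas. This is only one plausible reading of the description: the crude names, line items and numbers are invented for illustration, and the swing % step is left out because the post does not define it.

```python
import pandas as pd

# Hypothetical assay matrix: rows are product line items, columns are crudes.
assays = pd.DataFrame(
    {"crude_a": [15.0, 25.0, 10.0], "crude_b": [10.0, 60.0, 30.0]},
    index=["gas", "liquid", "residue"],
)

# Normalize each crude's column so that its yields sum to 1.
normalized = assays / assays.sum(axis=0)

# Hypothetical diet feed: the share of each crude in the daily feed.
diet = pd.Series({"crude_a": 0.25, "crude_b": 0.75})

# Dot product of normalized assays and diet feed: blended yield per line item.
blended = normalized.dot(diet)
print(blended)
```

`DataFrame.dot` aligns the assay columns with the index of `diet`, so the crude names must match on both sides.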


Hi @HaydenH 

Thanks for the response. I will try this option. For now, I have decided to store everything in CDF RAW tables instead of sequences, and then fetch the dataset from the RAW tables to do my processing.
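The RAW-table approach mentioned here can be sketched with the DataFrame helpers on the SDK's RAW rows API. This is a minimal sketch under assumptions: `save_to_raw` and `load_from_raw` are hypothetical wrapper names, the client is assumed to be a configured CogniteClient, and the database and table are assumed to exist already. RAW is schemaless, so column types may need to be restored after loading.

```python
import pandas as pd


def save_to_raw(client, db_name: str, table_name: str, df: pd.DataFrame) -> None:
    """Store a DataFrame in a RAW table (hypothetical wrapper)."""
    # The DataFrame index is used as the RAW row keys
    client.raw.rows.insert_dataframe(db_name, table_name, df)


def load_from_raw(client, db_name: str, table_name: str) -> pd.DataFrame:
    """Load a whole RAW table back into a DataFrame (hypothetical wrapper)."""
    # limit=None requests all rows instead of the default page size
    return client.raw.rows.retrieve_dataframe(db_name, table_name, limit=None)
```

Because the index becomes the row key, it helps to set a meaningful, unique index (for example the line item name) before saving.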
