CDF Data model

  • 20 June 2023

We have a unique use case where data points for a refinery plant are collected from a PI server, alongside other refinery data such as crude assays, crude diet and mass balance. All of these data sets (in tabular format) are collected, formulas are applied across them to calculate properties (such as swing cut %, CBLISS and Vol%), and a set of derived tables is created. An input feed built from the actual data is then sent to Petro-Sim (a mathematical simulation/modeling tool), and the output from that tool is collected and stored. Once all this data wrangling is done, charts are made for 50+ crude product variants to track yield (actual data, non-linear model data and linear model data).

In our current design, we plan to load the data into CDF RAW tables, do all the computations there and create derived RAW tables to fulfil the purpose. We would then use the Petro-Sim connector and store the output of the modeling tool in RAW tables as well.

As a result, we don't really see the typical CDF data constructs such as assets, sequences, work orders, maintenance data, files, events, labels, 3D diagrams, etc., so our data model looks very unusual.

Can someone advise whether our approach to implementing this in CDF is right, or whether our interpretation is wrong?

Please share any pointers, as well as guides on how to design and architect the solution model.

Best answer by Everton Colling 20 June 2023, 14:56


Hi Eashwar! The use case you are mentioning is an extremely valuable one and it’s definitely solvable using Cognite Data Fusion.

I see that you are planning to rely on RAW tables, but I would not recommend it for this case. RAW is intended as a staging area: it stores data dumped from extractors until it is transformed into clean data (data stored in high-performance CDF resources). I suggest creating a well-defined data model for the source data and another one for the “calculated data”, to store the multiple attributes and their dependencies. Any time-indexed data should be stored in time series that are referenced from the data model.
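
A minimal sketch of that split, written in the same GraphQL-based notation used for data models elsewhere in this thread. All type and property names are hypothetical, and the `TimeSeries` property type is assumed here as the way to reference a CDF time series from a data model. The two schemas would be published as two separate data models.

```graphql
# --- Source data model (hypothetical) ---
type PITag {
  name: String!
  unit: String
  # Time-indexed PI values stay in a CDF time series, referenced here
  values: TimeSeries
}

type CrudeAssayResult {
  crude: String!
  propertyName: String!
  value: Float
  unit: String
}

# --- Calculated data model (hypothetical) ---
type CalculatedProperty {
  # e.g. "swing cut %", "CBLISS" or "Vol%"
  name: String!
  timestamp: Timestamp!
  value: Float
  # Other calculated properties this one was derived from
  dependsOn: [CalculatedProperty]
}
```

Keeping the calculated types in their own model means the formulas can be versioned and repopulated without touching the source data model.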

To populate the data models with the source data, you can use Transformations or Cognite Functions. To populate the data model for the calculations (the formulas that generate the new “tables”), I suggest using Cognite Functions, since they let you include error handling, logging and data quality checks.

When it comes to the CDF Petro-SIM connector, it is an early adopter capability that is enabled for selected customers. If you are interested, we can schedule a meeting to discuss enrolling in the early adopter program.

Thanks @Everton Colling for your response. I will look into the early adopter program. Meanwhile, I want to get a clean data model for this case so that I can start developing the solution. Are there any sample data models available that I can look at for quick reference?

I understand your point about not relying on RAW tables and using clean CDF resource types instead. For instance, where in CDF do I store the crude assay input data, the diet data (crude composition data for various crudes and their index values), the mass balance data, etc.? I am not sure what the final destination schema for these data values in CDF should be. Should I be using the sequences resource type here?

Please advise. 

Hi Eashwar! Data models are quite flexible: anything that you represent in RAW can be represented in a data model as well. With data models, you can also create custom types and define relationships and dependencies, which can simplify data consumption.
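
For instance, a mass balance table that would otherwise sit in RAW could be modeled as its own type, with a relationship to the unit it belongs to. This is only a hypothetical sketch: the type and property names are illustrative assumptions, not an existing Cognite model.

```graphql
# One row of a former RAW mass balance table becomes one entry
type MassBalanceEntry {
  timestamp: Timestamp!
  stream: String!
  # e.g. "feed" or "product"
  direction: String
  massFlow: Float
  volumeFlow: Float
  # Relationship to the unit the stream belongs to
  processUnit: ProcessUnit
}

type ProcessUnit {
  name: String!
  refinery: String
}
```

Unlike a RAW row, each entry is typed and linked, so consumers can navigate from a unit to its mass balance entries instead of joining tables by hand.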

We are working on a library of sample data models to help our customers get started, but it is not available yet. To help you get started, here’s a very simple example of a crude diet data model that contains time-based entries for the crude feed of distillation units. You can easily extend it by adding more properties and relationships according to the available data and the use case requirements.

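```graphql
# One crude in the diet, with its share of the feed
type CrudeAssay {
  name: String!
  id: Int!
  volume: Float
  volumeFraction: Float
  mass: Float
  massFraction: Float
}

# Time-based crude feed entry for a distillation unit
type CrudeDiet {
  timestamp: Timestamp
  refinery: String
  distillationUnit: String
  # Relationship to the crudes fed to the unit at this timestamp
  diet: [CrudeAssay]
}
```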
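
As one hypothetical way to extend it, the sketch below adds a type for the yield tracking described in the original question, comparing actual yields with the linear and non-linear model results for each product and linking back to the crude diet that produced them. All names are illustrative assumptions, and `CrudeDiet` refers to the type defined above.

```graphql
# Hypothetical yield-tracking entry for one product variant
type ProductYield {
  timestamp: Timestamp!
  product: String!
  # Yield derived from actual plant data
  actualYield: Float
  # Yield predicted by the non-linear model
  nonLinearModelYield: Float
  # Yield predicted by the linear model
  linearModelYield: Float
  # Crude diet fed to the unit for this period
  diet: CrudeDiet
}
```

One entry per product and period keeps the three yield series side by side, which is the shape the 50+ yield charts would query.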

For more information about data models, please check the documentation: https://docs.cognite.com/cdf/data_modeling/guides/create_dm/