Solved

Nesting of Synthetic Time Series - Combining More than 100 Time Series

  • 2 March 2022
  • 4 replies
  • 61 views

Userlevel 3

Can Synthetic Time Series reference other Synthetic Time Series in the formulas they specify?

 

This is the problem I am trying to solve:

  • Synthetic Time Series is limited to referencing 100 time series
  • Sometimes a unit/facility/system/functional location/site may have more than 100 direct child assets
  • I need to calculate a “Synthetic Time Series” that is the daily average of a time series found on all child assets 

My thought is that, for 300 assets, I would have:

  • STS A: Sum for assets 1-100
  • STS B: Sum for assets 101-200
  • STS C: Sum for assets 201-300
  • STS D: Sum of STS A, B, and C divided by 300
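
To make the idea concrete, here is a rough Python sketch of the chunking. The external IDs are made up, and the method used to query synthetic datapoints is an assumption that differs between SDK versions; only the ts{externalId='...'} expression syntax is taken from the synthetic time series documentation:

from datetime import datetime
from cognite.client import CogniteClient

client = CogniteClient()  # assumes an already configured client

# Hypothetical external IDs for the 300 child time series
child_ids = [f"asset-{i}-daily-ts" for i in range(1, 301)]

# Split into chunks of at most 100, the synthetic time series reference limit
chunks = [child_ids[i:i + 100] for i in range(0, len(child_ids), 100)]

# One "sum" expression per chunk (STS A, B and C)
sum_expressions = [
    " + ".join(f"ts{{externalId='{xid}'}}" for xid in chunk) for chunk in chunks
]

# Query the partial sums, then compute (A + B + C) / 300 client-side
# instead of nesting a fourth synthetic time series.
partial_sums = client.time_series.data.synthetic.query(
    expressions=sum_expressions, start=datetime(1985, 1, 1), end="now"
)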

Is there a better way to accomplish something like this in CDF?


Best answer by Torgrim Aas 4 March 2022, 00:44


4 replies

Hi Ben

An alternative to using synthetic time series is a Cognite Function that iterates over all the assets in question. This would also let the average adjust to changes in the asset count, if that is relevant. E.g.
 

total = 0
for asset in assets_under_node:  # all direct children of the given node
    total += asset.timeseriesX.latest_datapoint
return total / len(assets_under_node)

Something to consider, though, is that you often cannot simply feed a given datapoint as-is into an average across different data sources/instruments. Verifying both the data quality and the timestamps is important so that you do not mix valid and invalid values and end up with bad output.
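
A minimal sketch of such a function, leaving out the validation mentioned above. The handle(client, data) entry point is the standard Cognite Functions signature; the keys in data, the assumption of one matching time series per child, and the exact datapoint methods (which vary between SDK versions) are illustrative only:

from cognite.client import CogniteClient

def handle(client: CogniteClient, data: dict) -> dict:
    # Average the latest datapoint of a named time series across all
    # direct children of a parent asset (keys in `data` are illustrative).
    parent_id = data["parent_asset_id"]
    ts_name = data["time_series_name"]

    children = client.assets.list(parent_ids=[parent_id], limit=None)

    values = []
    for asset in children:
        ts_list = client.time_series.list(asset_ids=[asset.id], name=ts_name, limit=1)
        if not ts_list:
            continue  # child has no matching time series
        latest = client.time_series.data.retrieve_latest(id=ts_list[0].id)
        if latest.value:  # skip time series without datapoints
            values.append(latest.value[0])

    if not values:
        return {"average": None, "asset_count": 0}
    return {"average": sum(values) / len(values), "asset_count": len(values)}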

Regards,

Torgrim

Userlevel 3

Hi @Torgrim Aas,

Thank you for the suggestion.  The time series we need to aggregate contain data that we produce ourselves, known to have a consistent start and end date across all children and to be at daily resolution.  The range is usually between 5 and 40 years.  The average produced would be a new time series with the same start and end as the child data points.

Simple example with 3 ts and 4 days:

 

If we have 40 years of daily data for 500 time series, that is ~7,305,000 data points.  Are you suggesting that we should be able to load all the data points into memory in a Cognite Function, compute the average for each day, and return the result quickly (< 1 sec)?

 

With the maximum memory allocation of 2.5 GB for a Cognite Function, what would be the upper limit on how many data points fit into memory?  We can always chunk it into 5-year increments and make multiple calls if needed.
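
For context, my rough sizing of the raw data (an 8-byte timestamp plus an 8-byte value per point, ignoring Python object and DataFrame overhead):

years = 40
series_count = 500
points_per_series = int(years * 365.25)          # ~14,610 daily points
total_points = points_per_series * series_count  # ~7,305,000 points
raw_mb = total_points * (8 + 8) / 1e6            # int64 timestamp + float64 value
print(total_points, raw_mb)                      # 7305000, ~117 MB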

 

Thank you,

Ben

Hi Ben,

How often will this be done? If it is a daily average, then you are only loading one datapoint per time series, right? Or is this considered post-processing of a simulation? If the latter, it might be worthwhile to look at whether the average should be done on the fly while ingesting the data.

Also, you would not necessarily load all the data into memory before doing the calculation. You would have to find a compromise between time (many small requests) and memory (a few large requests). Another option to consider is Spark SQL Transformations, which are more efficient for large data sets.
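
As an illustration of that compromise, a chunked client-side approach could look roughly like this. The external IDs are hypothetical, and retrieve_dataframe is the recent Python SDK name (older versions expose datapoint retrieval elsewhere):

from datetime import datetime
import pandas as pd
from cognite.client import CogniteClient

client = CogniteClient()
external_ids = [f"asset-{i}-daily-ts" for i in range(1, 501)]  # hypothetical IDs

daily_averages = []
for year in range(1985, 2025, 5):  # 5-year windows instead of one huge request
    df = client.time_series.data.retrieve_dataframe(
        external_id=external_ids,
        start=datetime(year, 1, 1),
        end=datetime(year + 5, 1, 1),
    )
    daily_averages.append(df.mean(axis=1))  # one averaged value per day

daily_average = pd.concat(daily_averages).sort_index()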

Regards,
Torgrim

Userlevel 3

Hi @Torgrim Aas,

This query will be run on demand from a facility/unit dashboard.  It will run several times a day for each facility or unit. Yes, the output of the calculation/aggregation is a single time series with one data point per day for a user-configurable number of days/years.  When this was implemented in Azure SQL, we did it as a post-processing job after each child simulation completed.

I think a CDF Transformation might work well for us.  I saw that a transformation can be scheduled or triggered manually via the API. But is there any kind of configurable eventing for this, where something like a data change event in CDF could trigger the transformation?

Reply