Nesting of Synthetic Time Series - Combining More than 100 Time Series
Can a Synthetic Time Series reference other Synthetic Time Series in its specified formula?
This is the problem I am trying to solve:
Synthetic Time Series is limited to referencing 100 time series
Sometimes a unit/facility/system/functional location/site may have more than 100 direct child assets
I need to calculate a “Synthetic Time Series” that is the daily average of a time series found on all child assets
My thought is that, for 300 assets, I would have:
STS A: Sum for assets 1-100
STS B: Sum for assets 101-200
STS C: Sum for assets 201-300
A final STS: the sum of STS A, B, and C, divided by 300 (a rough sketch of this chaining is below)
Is there a better way to accomplish something like this in CDF?
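For context, here is a rough sketch, assuming the Cognite Python SDK, of how one of the partial sums (e.g. STS A) could be built as a synthetic datapoints query. The external IDs and time window are placeholders, and the exact method name and expression syntax may differ between SDK versions:

# Rough sketch of one partial sum ("STS A") as a synthetic datapoints query.
# External IDs and the time window are hypothetical placeholders.
from cognite.client import CogniteClient

client = CogniteClient()

# Hypothetical external IDs for the first 100 child time series.
child_xids = [f"asset-{i:03d}-daily-value" for i in range(1, 101)]

# One synthetic expression may reference at most 100 time series.
expression = " + ".join(f"ts{{externalId='{xid}'}}" for xid in child_xids)

sts_a = client.time_series.data.synthetic.query(
    expressions=expression,
    start="365d-ago",  # placeholder window
    end="now",
)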
Hi @Torgrim Aas,
This query will be run on demand from a facility/unit dashboard. It will run several times a day for each facility or unit. Yes, the output of the calculation/aggregation is a single time series with one data point per day for a user-configurable number of days/years. When this was implemented in Azure SQL, we did it as a post-processing job after each child simulation completed.
I think a CDF transformation might work well for us. I saw that a transformation can be scheduled or manually triggered via the API. But is there any kind of configurable eventing for this, where something like a data change event in CDF could trigger the transformation?
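For reference, a rough sketch of triggering a transformation on demand with the Python SDK (the transformation external ID here is hypothetical, and method names may differ between SDK versions):

from cognite.client import CogniteClient

client = CogniteClient()

# Kick off a previously created transformation by its (hypothetical) external ID.
job = client.transformations.run(transformation_external_id="daily-average-rollup")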
Hi Ben,
How often will this be done? If it is a daily average, then it is only one datapoint per time series you are loading, right? Or is this considered post-processing of a simulation? If the latter, it might be worthwhile to look at whether the average should be done on the fly while ingesting the data.
Also, you would not necessarily need to load all the data into memory before doing the calculation. You would have to find a compromise between time (many small requests) and memory (a few large requests). Another option to consider is SparkSQL transformations, which are more efficient for large data sets.
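As a rough sketch of that compromise (assuming the Python SDK; method names and parameters may vary by version), you could retrieve the children in chunks and accumulate the daily sum as you go. The external IDs and date range below are placeholders:

from datetime import datetime
from cognite.client import CogniteClient

client = CogniteClient()

# Hypothetical external IDs for 300 child series, fetched 100 per request.
child_xids = [f"asset-{i:03d}-daily-value" for i in range(1, 301)]
chunk_size = 100

running_sum = None
for i in range(0, len(child_xids), chunk_size):
    df = client.time_series.data.retrieve_dataframe(
        external_id=child_xids[i : i + chunk_size],
        start=datetime(1985, 1, 1),  # placeholder range
        end=datetime(2025, 1, 1),
    )
    # Per-day sum across this chunk; assumes all children share the same
    # daily timestamps, as described earlier in the thread.
    partial = df.sum(axis=1)
    running_sum = partial if running_sum is None else running_sum + partial

daily_average = running_sum / len(child_xids)  # one value per day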
Regards, Torgrim
Hi @Torgrim Aas,
Thank you for the suggestion. The time series we need to aggregate are data that we produce ourselves, known to have a consistent start and end date across all children, and at daily resolution. Usually the range is between 5 and 40 years. The average produced would be a new time series with the same start and end as the child data points.
Simple example with 3 ts and 4 days:
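(Values below are just illustrative; each day's average is taken over the 3 child series for that day.)

Day 1: (10 + 20 + 30) / 3 = 20
Day 2: (12 + 18 + 33) / 3 = 21
Day 3: (9 + 21 + 24) / 3 = 18
Day 4: (11 + 19 + 36) / 3 = 22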
If we have 40 years of daily data for 500 time series, this would be ~7,305,000 data points. Are you suggesting that we should be able to load all data points into memory in a Cognite Function, perform the average for each day, and then return the result pretty quickly (< 1 sec)?
With the maximum memory allocation of 2.5 GB for a Cognite Function, what would be the upper limit on how many data points fit into memory? We can always chunk it into 5-year increments and make multiple calls if needed.
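As a rough back-of-envelope estimate: at about 16 bytes per data point (8-byte timestamp plus 8-byte double), ~7.3 million points is only on the order of 120 MB of raw data, although dataframe and SDK overhead would add to that.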
Thank you,
Ben
Hi Ben,
An alternative to synthetic time series is a Cognite Function that iterates over all the assets in question. This would also allow the average to adjust for changes in asset count, if that is relevant. E.g.:
for each asset under a given node: total += asset.timeseriesX.latestDP
return total / count
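A rough sketch of that loop with the Python SDK (the root external ID, time series name, and exact method names are assumptions here, so treat it as illustrative rather than drop-in code):

from cognite.client import CogniteClient

client = CogniteClient()

# Hypothetical root asset; retrieve all assets in its subtree.
children = client.assets.retrieve_subtree(external_id="facility-root")

total, count = 0.0, 0
for asset in children:
    # Find the relevant series on this asset (hypothetical name "timeseriesX").
    ts = next(
        (t for t in client.time_series.list(asset_ids=[asset.id]) if t.name == "timeseriesX"),
        None,
    )
    if ts is None:
        continue
    latest = client.time_series.data.retrieve_latest(id=ts.id)
    if latest and latest.value:  # latest datapoint, if any
        total += latest.value[0]
        count += 1

average = total / count if count else None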
Something to consider, though, is that you most often cannot take a given datapoint as-is into an average across different data sources/instruments. Verifying both data quality and timestamps is important to ensure you are not mixing valid and invalid values, which would give bad output.