Is it possible to specify whether time series data is aggregated at "start", "midpoint", or "end"?
I am comparing time series data in CDF and PI, because in our tenants the CDF data does not agree 100% with PI.
However, from my testing I think that PI performs its aggregations with the timestamps “centered” at the aggregated time periods, while CDF puts the timestamps at the start of each aggregated period. Is it possible to specify how this is done with the Python API? From my study of the docs it appears not to be the case. The same applies to the PI Web API as well: I cannot specify how the timestamps are placed. The agreement with PI becomes significantly better if I place the CDF timestamps at the center of the aggregated time periods.
My current workaround is the following:
Fetch RAW data from CDF
Shift the timestamps by 0.5x of the granularity
Resample to the desired granularity
Compute mean
Interpolate any missing values
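The steps above can be sketched in pandas roughly as follows. This is only a minimal sketch on synthetic data: the function name `centered_mean`, the `"60min"` granularity and the sample series are assumptions for illustration, and the raw datapoints are assumed to already be in a `pd.Series` with a `DatetimeIndex`.

```python
import pandas as pd


def centered_mean(raw: pd.Series, granularity: str = "60min") -> pd.Series:
    """Average raw datapoints in windows centered on each output timestamp.

    Hypothetical helper: shifts the timestamps forward by half the
    granularity, so the default left-labelled resample bins end up
    covering [label - g/2, label + g/2) in the original time axis.
    """
    half = pd.Timedelta(granularity) / 2
    shifted = raw.copy()
    shifted.index = shifted.index + half
    # Simple (unweighted) mean per bin, then fill empty bins by interpolation.
    return shifted.resample(granularity).mean().interpolate()


# Synthetic raw data: one point per minute for three hours, value = minute number.
index = pd.date_range("2024-01-01 00:00", periods=180, freq="min")
raw = pd.Series(range(180), index=index, dtype=float)

centered = centered_mean(raw, "60min")
```

With this shift, the value labelled 01:00 averages the raw points from 00:30 up to (but not including) 01:30, instead of 01:00 to 02:00.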
The issue is that fetching raw data is far more time-consuming than fetching aggregates. I have tried fetching aggregates from CDF and performing the shift after the fact, but this does not give as good agreement. I have also tried speeding up the raw data fetch by chunking the time periods and fetching with multiple threads or processes, but the speedup is not significant.
Here is an example of how I do the CDF raw data fetch in order to get good agreement.
Hi @Anders Brakestad, did you get any further on this? Keep in mind that the data in CDF and PI might differ. PI might apply compression when data is stored to the archive, whereas CDF reads directly from the PI snapshot and hence stores more datapoints if compression is enabled in PI. So when you aggregate, there might be a small deviation, since CDF would hold more granular data in the cases where compression is turned on in PI.
Hi Anders, just following up on this. How did your investigations go? Would you mind sharing any insights and learnings you got from the service vs. PI?
Hi @Anders Brakestad,
Have you had the opportunity to proceed with your investigation?
Depending on your sampling interval, it may be possible to shift the aggregation periods relative to PI. For example, a granularity of 1 hour can be replaced by 60 minutes; with start you can then control/shift the sampling period. See image example:
Just a note regarding .resample(self.sampling_interval).mean().interpolate(): this computes the simple average (i.e. “sum of values / number of values”) instead of the time-weighted average returned by CDF (as this is a workaround I guess you already know this, but still worth mentioning I think!).
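To make the difference concrete, here is a small sketch comparing the two averages on unevenly spaced points. It assumes each value is held until the next datapoint (step interpolation), which may not match the exact interpolation rule the service uses; the function name `time_weighted_mean` and the numbers are made up for illustration.

```python
import numpy as np


def time_weighted_mean(t, v, window_end):
    """Weight each value by how long it is held before the next datapoint.

    Assumption: step interpolation, with the last value held until window_end.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    durations = np.diff(np.append(t, window_end))
    return float(np.average(v, weights=durations))


# Seconds since the start of a one-hour window; three points are clustered early.
t = [0, 10, 20, 3000]
v = [1.0, 1.0, 1.0, 5.0]

simple = float(np.mean(v))                      # sum of values / number of values
weighted = time_weighted_mean(t, v, window_end=3600)
```

Here the simple mean gives every point equal weight, while the time-weighted mean counts the value 5.0 only for the last 600 of the 3600 seconds, so the two results differ whenever the datapoints are unevenly spaced.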
..and a final note, “I have also played around with speeding up the raw data fetch by chunking the time periods and fetching with multiple threads or processes, but the speedup is not significant.”: All the retrieve endpoints for datapoints are quite heavily optimised already, so unless you have multiple credentials to circumvent rate-limiting, your best course of action is to pass all time series you want to fetch in a single call (also, retrieve_arrays has the overall best performance).
Thank you for the quick reply!
Firstly, I was not aware that the CDF aggregation was a time-weighted average, thank you for mentioning this!
If I shift the start time as you showed, I still get a shift in the CDF data features compared to PI. Here is a screenshot:
After shifting the CDF data by 0.5 units, the features overlap much better:
However, this means the timestamps are no longer equal for the CDF and PI data. It is not a huge issue if I just compute comparison metrics (RMSE, max errors, percentage errors, etc.), but it is an extra detail that needs to be documented and explained.
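For reference, the comparison metrics mentioned above could be computed along these lines. This is a rough sketch, assuming the two series have already been matched value by value and that the PI values are non-zero; the function name `comparison_metrics` and the sample numbers are hypothetical.

```python
import numpy as np


def comparison_metrics(cdf_values, pi_values):
    """RMSE, max absolute error and mean absolute percentage error.

    Assumes both inputs have the same length and are aligned position by
    position, and that pi_values contains no zeros.
    """
    cdf = np.asarray(cdf_values, dtype=float)
    pi = np.asarray(pi_values, dtype=float)
    err = cdf - pi
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "max_abs_error": float(np.max(np.abs(err))),
        "mean_abs_pct_error": float(np.mean(np.abs(err / pi)) * 100),
    }


metrics = comparison_metrics([10.0, 12.0, 9.0], [10.0, 10.0, 10.0])
```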
Perhaps I misunderstood something in your example. For reference, here is the fetch code:
So I get exactly the same results regardless of whether I shift the start_time parameter in the CDF fetch relative to the one used for PI fetch.
Hi @Anders Brakestad,
I’m the product manager for our Time Series services, and I’m keen to learn any observations, good or otherwise that you find from your evaluation. Would you be available to have a short call with me when you’ve concluded your research?
Kind Regards, Glen
Good morning Glen!
Sure, I’ll be happy to share what I find from my small investigations. I’ll keep in touch!