Solved

Retrieving many raw datapoints from many timeseries

  • 8 November 2023
  • 8 replies

Hi,

We have a web application that uses the Cognite SDK to present time series and data points.
We have two use cases:
1. We have 500 time series and need to retrieve the latest 1000 data points across all 500 time series combined.
2. We have 500 time series and need to retrieve all data points between two dates.

For both cases we also need to know which time series each data point belongs to, and we need raw values; we cannot use aggregates.
What is the most efficient approach to achieve this?


Best answer by Håkon V. Treider 8 November 2023, 13:17



When you say “Cognite SDK”, which are you referring to? E.g. there’s JavaScript, Python and more.

We are currently using the C# SDK, but it would be nice to know whether this is easier to solve in JavaScript or Python.


Hi Espen,

If it’s possible for you to use either the JavaScript or the Python SDK for retrieval of your data, then I would suggest that as your first option.

These SDKs are the ones that receive the most frequent updates from the Cognite product organisation. I’ll defer to @Håkon V. Treider on the specific value of the Python SDK, but I suspect this may be the most powerful method for this task.


I only know the Python SDK, but I don’t see how you would easily do (1) without implementing some custom logic yourself, since the limit parameter only looks forward in time.
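As a rough illustration of what that custom logic for (1) could look like, here is a minimal sketch of the merge step only. It assumes you have already fetched, for each time series, a list of recent raw data points sorted by timestamp (how you fetch those candidates is left open). The function name `latest_k_combined` and the `(timestamp, value)` tuple shape are made up for this example and are not part of any SDK.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed input shape: (timestamp_ms, value) pairs, sorted ascending per series.
Series = List[Tuple[int, float]]


def latest_k_combined(
    per_series: Dict[str, Series], k: int
) -> List[Tuple[int, str, float]]:
    """Return the k newest points across all series, newest first.

    Each result is (timestamp_ms, series_id, value), so every data point
    keeps a reference to the time series it came from.
    """
    # Only the last k points of any single series can end up in the
    # combined top k, so older points are skipped before merging.
    candidates = (
        (ts, series_id, value)
        for series_id, points in per_series.items()
        for ts, value in points[-k:]
    )
    # Select the k candidates with the highest timestamps.
    return heapq.nlargest(k, candidates, key=lambda item: item[0])
```

With 500 series and k = 1000 this looks at no more than 500,000 candidate points, so the merge itself is cheap; the cost is likely dominated by fetching the candidates.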

For the second question, this is basic, well-supported functionality! The relevant part of the documentation is here:
https://cognite-sdk-python.readthedocs-hosted.com/en/latest/core_data_model.html#retrieve-datapoints
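A minimal sketch of (2) with the Python SDK might look like the following. It assumes `client` is an already authenticated `CogniteClient` and uses `client.time_series.data.retrieve` from the documentation linked above; the helper name `iter_raw_datapoints` is hypothetical. Treat it as an illustration rather than a tuned implementation.

```python
from typing import Any, Iterator, List, Tuple


def iter_raw_datapoints(
    client: Any, external_ids: List[str], start: Any, end: Any
) -> Iterator[Tuple[str, int, float]]:
    """Yield (external_id, timestamp_ms, value) for raw data points between start and end."""
    # No aggregates/granularity are passed, so raw values are returned.
    # limit=None asks for every data point in the window, not a capped number.
    result = client.time_series.data.retrieve(
        external_id=external_ids, start=start, end=end, limit=None
    )
    # Given a list of identifiers, the result holds one entry per time
    # series, each carrying its own external_id, timestamps and values.
    for dps in result:
        for ts, value in zip(dps.timestamp, dps.value):
            yield dps.external_id, ts, value
```

Passing all 500 identifiers in one call leaves the batching and parallel fetching to the SDK, and tagging each row with its `external_id` covers the requirement to know which time series a data point belongs to.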

Let me know if you have any follow-up questions 😄


Hi @Espen Jacobsen,

We are following up to see whether you're satisfied with the responses you've received.

Best regards,
Dilini 

Sorry, I forgot to write back :)

We had actually implemented the functionality we needed before I wrote the question, but we are not really satisfied with the speed. I asked in the hope that someone had done something similar and might have chosen a different approach than ours, but I guess there aren’t that many ways to do it. Thanks for everyone’s responses :)


This could be a good use case for data point subscriptions, since a subscription can give you the latest data across all 500 time series in a single query. It will likely perform much better than the regular API.

There are some caveats, however:

  • Subscriptions are in beta, mainly because we don’t know how they work at scale. However, we believe 500 time series is small scale in this regard (we have a limit of 10,000 time series in a single subscription).
  • Subscriptions will give you the latest updates (deletes, upserts). These are not necessarily the data points with the highest timestamps if the updates are ingested out of order.
  • Subscription data is deleted after 7 days, after which you need to use the regular API.
  • The Python SDK support is in alpha.
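To illustrate the out-of-order caveat, here is a small client-side sketch that keeps the k updates with the highest timestamps no matter what order they arrive in. It does not call the subscriptions API at all: the `(timestamp, series_id, value)` update shape and the `LatestKBuffer` class are assumptions made for this example, and it handles neither deletes nor overwritten values.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical update shape: (timestamp_ms, series_id, value)
Update = Tuple[int, str, float]


class LatestKBuffer:
    """Keep the k updates with the highest timestamps, in any arrival order."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        # Min-heap: the oldest retained update always sits at index 0.
        self._heap: List[Update] = []

    def add_batch(self, updates: Iterable[Update]) -> None:
        for update in updates:
            if len(self._heap) < self.k:
                heapq.heappush(self._heap, update)
            elif update > self._heap[0]:
                # Newer than the oldest retained update: swap it in.
                heapq.heapreplace(self._heap, update)

    def newest_first(self) -> List[Update]:
        return sorted(self._heap, reverse=True)
```

Feeding each batch of updates into `add_batch` keeps memory bounded at k entries, and a late-arriving update with an old timestamp is simply dropped once the buffer is full of newer ones.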

Are the 500 time series constant? In that case, you can specify them by externalId. Otherwise you can specify a search filter, and a time series will be added automatically to the subscription soon after it is created or updated to match the filter (assuming you are below the 10k limit). The subscription will also notify you when a time series has been added or removed, in case you want to handle this.

If you decide to try this out, please reach out, maybe to @Glen Sykes who can coordinate (I’m only occasionally checking the forums). That way, we can provide you with the optimal support, and you can provide us with valuable feedback on the feature!

Thank you. We will check out subscriptions :)
