Solved

Timestamp for first datapoint in timeseries? [Community Contributed]

  • 26 June 2023
  • 7 replies
  • 133 views

I have many thousands of time series, and for each one I need to find the date of its first datapoint.

For any given time series, I have no idea whether the first datapoint is from this year or from 20 years ago. Fetching several decades of data just to find it is not efficient.

 

Any ideas how to get the first datapoint? Getting the last datapoint would also be great. 

 

Thanks!

Best answer by Håkon V. Treider 26 June 2023, 21:04

7 replies

Hi. We have a specific endpoint to get the latest datapoint, so that should be no problem.

However, getting the first one might be trickier. As you mentioned, you might need to fetch all datapoints and take the first one from the list, which is far from ideal.

I will check with our engineering team if there is a better solution.

Hi,

We have a built-in method for getting the latest datapoint.

For the first datapoint, you can use the “limit” option to fetch only a single datapoint from the start of a series. That way you don't need to retrieve the entire series just to get the first point.

 

You can use this to get Last:

client.time_series.data.retrieve_latest(id=186538285190435)

 

You can use this to get First:

client.time_series.data.retrieve(id=186538285190435, start=0, end="now", limit=1)
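Since the original question asks for the date of the first datapoint, here is a minimal sketch of turning such a result into a date. It assumes the object returned by retrieve exposes a `timestamp` sequence of integer milliseconds since 1970-01-01 UTC; `first_datapoint_time` is a hypothetical helper, not part of the SDK, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a CDF connection.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def first_datapoint_time(dps) -> Optional[datetime]:
    """Return the UTC time of the first datapoint in `dps`, or None if it is empty.

    Assumption: `dps.timestamp` is a sequence of epoch milliseconds.
    """
    timestamps = list(dps.timestamp)
    if not timestamps:
        return None  # the series has no datapoints in the queried range
    # Adding a timedelta also works for negative (pre-1970) timestamps.
    return EPOCH + timedelta(milliseconds=timestamps[0])


# Stand-ins for query results (hypothetical values, not real data).
with_data = SimpleNamespace(timestamp=[1687813440000])
empty = SimpleNamespace(timestamp=[])

print(first_datapoint_time(with_data))  # 2023-06-26 21:04:00+00:00
print(first_datapoint_time(empty))      # None
```

Handling the empty case explicitly matters when looping over thousands of series, since some of them may contain no datapoints at all.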

 

 

 

Thank you both! Looks like getting the last one is straightforward. I will try limit=1 for getting the first one. I believe this will do the job, so I will test it out soon. Thanks!

Yes, great suggestion @Johannes Hovda, this should work as expected :)
Let us know how it goes @Olav Alstad

I’d like to add a little to @Johannes Hovda's answer:

1) Start/end

Be careful using start=0 (1970-01-01), as the time series API supports timestamps all the way back to the year 1900. Following the examples in the documentation, you can import the very first possible timestamp to be dead sure not to miss anything!

The opposite is true for end, which defaults to "now" if not specified. Thus, if the first datapoint of a time series lies in the future, you need to specify end to retrieve it. The API supports timestamps up to 2099-12-31 23:59:59.999.

>>> from cognite.client.utils import MIN_TIMESTAMP_MS, MAX_TIMESTAMP_MS
>>> dps_backup = client.time_series.data.retrieve(
...     id=123,
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1)  # end is exclusive
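To make the range concrete, here is a small standard-library sketch that converts the two limits quoted above into milliseconds since 1970-01-01 UTC. It assumes the limits are exactly 1900-01-01 00:00:00.000 and 2099-12-31 23:59:59.999 in UTC; `to_epoch_ms` is a hypothetical helper, and the values are computed here rather than read from the SDK.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Assumed lower and upper limits, taken from the text above.
earliest_ms = to_epoch_ms(datetime(1900, 1, 1, tzinfo=timezone.utc))
latest_ms = to_epoch_ms(
    datetime(2099, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)
)

print(earliest_ms)  # -2208988800000
print(latest_ms)    # 4102444799999

# start=0 corresponds to 1970-01-01, so anything between
# earliest_ms and -1 would be skipped by a query starting at 0.
print(earliest_ms < 0)  # True
```

The negative lower bound is the reason start=0 can miss data: every datapoint before 1970 has a timestamp below zero.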

2) Efficient queries

You write that you have “thousands of time series” for which you need to find the initial datapoint. Querying them one at a time can be very inefficient.

The time series API allows for datapoints to be requested for up to 100 different time series in the same request. The Python SDK will combine your datapoint queries automatically if you ask for them all in one go:

>>> initial_dps = client.time_series.data.retrieve(
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1,  # end is exclusive
...     external_id=a_few_thousand_ids,
...     limit=1)
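To illustrate what that batching amounts to, here is a rough sketch that splits a list of identifiers into groups of at most 100, the per-request limit mentioned above. This is not the SDK's actual implementation, and you do not need to do it yourself when using the SDK; `chunked` and the generated ids are hypothetical and only there for illustration.

```python
from typing import Iterator, List, Sequence

MAX_SERIES_PER_REQUEST = 100  # per-request limit quoted above


def chunked(
    ids: Sequence[str], size: int = MAX_SERIES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive groups of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Made-up external ids standing in for a real list.
a_few_thousand_ids = [f"ts-{i}" for i in range(2350)]
batches = list(chunked(a_few_thousand_ids))

print(len(batches))      # 24
print(len(batches[0]))   # 100
print(len(batches[-1]))  # 50
```

With limit=1 per series, 2,350 time series fit into 24 requests instead of 2,350, which is where the efficiency gain comes from.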

 

3) ...for other readers with very few time series

If you have a TimeSeries object fetched from CDF, you can simply call the .first() method on it:

>>> ts = client.time_series.retrieve(id=123)
>>> first_dp = ts.first()

 

Awesome info, thanks for sharing. Very useful. 

Hi @Olav Alstad,

We appreciate your contribution to our community hub! We have chosen to move your article to our hub's How-To section as it will greatly benefit other members of our community. Thank you for your understanding, and we look forward to seeing more great contributions from you in the future! 

Best regards,
Dilini 
