
Timestamp for first datapoint in timeseries? [Community Contributed]


Olav Alstad
Committed

I have many thousands of time series, and I need to find the date of the first datapoint in each series.

For any given time series, I have no idea whether the first datapoint is from this year or from 20 years ago, and fetching several decades of data for every series is not efficient.

Any ideas on how to get the first datapoint? Getting the last datapoint would also be great.

Thanks!

Best answer by Håkon V. Treider

I’d like to add a little to @Johannes Hovda’s answer:

1) Start/end

Be careful using start=0 (1970-01-01), as the time series API supports timestamps all the way back to the year 1900. Following the examples in the documentation, you can import the earliest possible timestamp for convenience, to be dead sure not to miss anything!

The opposite is true for end, which defaults to "now" if not specified. Thus, if a time series' first datapoint lies in the future, you need to specify end to retrieve it. The API supports timestamps up to 2099-12-31 23:59:59.999.

>>> from cognite.client.utils import MIN_TIMESTAMP_MS, MAX_TIMESTAMP_MS
>>> # assumes `client` is an already-configured CogniteClient instance
>>> dps_backup = client.time_series.data.retrieve(
...     id=123,
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1)  # end is exclusive
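To see why start=0 can miss data, it helps to look at the actual numbers. The standard-library sketch below computes the millisecond timestamps for the bounds quoted above, assuming UTC. These should be the values held by MIN_TIMESTAMP_MS and MAX_TIMESTAMP_MS, though it is worth double-checking against your SDK version. The helper `to_ms` is a hypothetical name used only for illustration. Anything before 1970 gets a negative timestamp, which is exactly what start=0 skips.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(dt: datetime) -> int:
    """Milliseconds since 1970-01-01 UTC (negative for earlier dates)."""
    delta = dt - EPOCH
    # Integer arithmetic on the timedelta avoids float rounding errors.
    return (delta.days * 86_400 + delta.seconds) * 1_000 + delta.microseconds // 1_000


# Bounds quoted in the answer: 1900-01-01 and 2099-12-31 23:59:59.999 (UTC assumed)
earliest_ms = to_ms(datetime(1900, 1, 1, tzinfo=timezone.utc))
latest_ms = to_ms(datetime(2099, 12, 31, 23, 59, 59, 999_000, tzinfo=timezone.utc))

# `end` is exclusive, so pass one millisecond past the last valid timestamp
end_exclusive = latest_ms + 1

print(earliest_ms)    # -2208988800000
print(latest_ms)      # 4102444799999
print(end_exclusive)  # 4102444800000
```

The `+ 1` in the SDK example serves the same purpose as `end_exclusive` here: without it, a datapoint stamped exactly at the last supported millisecond would fall outside the half-open range.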

2) Efficient queries

You mention that you have “thousands of time series” whose first datapoint you need to find. This can be very inefficient to query if not done correctly.

The time series API allows datapoints for up to 100 different time series to be requested in a single request. The Python SDK will combine your datapoint queries automatically if you ask for them all in one go:

>>> initial_dps = client.time_series.data.retrieve(
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1,  # end is exclusive
...     external_id=a_few_thousand_ids,
...     limit=1)
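The SDK takes care of this batching when you pass all the IDs in one call, so you do not need to write it yourself. The standard-library sketch below only illustrates the arithmetic: how a few thousand IDs break down into requests of at most 100 items each, the per-request limit mentioned above. `chunked`, `MAX_ITEMS_PER_REQUEST` and the `sensor-…` IDs are hypothetical names, not SDK APIs.

```python
from typing import Iterator, List, Sequence

MAX_ITEMS_PER_REQUEST = 100  # per-request limit quoted in the answer above


def chunked(ids: Sequence[str], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` elements each."""
    for i in range(0, len(ids), size):
        yield list(ids[i:i + size])


# Hypothetical external IDs standing in for `a_few_thousand_ids`
a_few_thousand_ids = [f"sensor-{n}" for n in range(2_550)]
batches = list(chunked(a_few_thousand_ids))

print(len(batches))      # 26
print(len(batches[-1]))  # 50
```

With limit=1, each of those batched requests returns at most one datapoint per time series. This is why a single combined call is far cheaper than thousands of separate ones.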

 

3) ...for other readers with very few time series

If you have a TimeSeries object fetched from CDF, you can simply call the .first() method on it:

>>> ts = client.time_series.retrieve(id=123)
>>> first_dp = ts.first()

 


7 replies


Hi. We have a specific endpoint to get the latest datapoint, so that should be no problem.

However, getting the first one might be trickier. As you mentioned, you might need to fetch all datapoints and take the first one from the list, which is far from ideal.

I will check with our engineering team if there is a better solution.


Johannes Hovda
Practitioner

Hi,

We have a built-in method for getting the latest datapoint.

For the first datapoint, you can use the “limit” option so that only one datapoint is returned. This way, you don't need to retrieve the entire series just to get the first point.

You can use this to get the last datapoint:

client.time_series.data.retrieve_latest(id=186538285190435)

You can use this to get the first datapoint:

client.time_series.data.retrieve(id=186538285190435, start=0, end="now", limit=1)
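Why does limit=1 give you the first datapoint? Assuming datapoints are returned in ascending time order, the first point returned for the half-open window [start, end) is the earliest point in that window. The in-memory sketch below mimics this on a sorted list of hypothetical millisecond timestamps; it does not call CDF. `retrieve_sim`, `ms_to_datetime` and the constants are hypothetical names, with the constants mirroring the bounds quoted elsewhere in this thread. The sketch also shows how start=0 and the default end="now" can hide points before 1970 or in the future.

```python
import bisect
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MIN_MS = -2_208_988_800_000  # 1900-01-01 00:00:00.000 UTC
MAX_MS = 4_102_444_799_999   # 2099-12-31 23:59:59.999 UTC
NOW_MS = 1_687_737_600_000   # fixed stand-in for "now": 2023-06-26 00:00 UTC


def retrieve_sim(timestamps, start, end, limit=1):
    """Return up to `limit` timestamps t with start <= t < end.

    In-memory stand-in for a limit-capped query; `timestamps` must be sorted.
    """
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)  # end is exclusive
    return timestamps[lo:min(hi, lo + limit)]


def ms_to_datetime(ms):
    # timedelta arithmetic also handles negative (pre-1970) timestamps
    return EPOCH + timedelta(milliseconds=ms)


# Hypothetical series with points on 1965-01-01, 2001-01-01 and 2030-01-01
series = [-157_766_400_000, 978_307_200_000, 1_893_456_000_000]

naive = retrieve_sim(series, start=0, end=NOW_MS)
full = retrieve_sim(series, start=MIN_MS, end=MAX_MS + 1)

print(ms_to_datetime(naive[0]).date())  # 2001-01-01 (the 1965 point is missed)
print(ms_to_datetime(full[0]).date())   # 1965-01-01
print(retrieve_sim(series[2:], start=MIN_MS, end=NOW_MS))  # [] (only point is after "now")
```

The last line is the "future datapoint" case: a series whose only point lies after "now" looks empty unless end is set explicitly.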

 

 

 


Olav Alstad
Committed
  • Author
  • June 26, 2023

Thank you both! Looks like getting the last one is straightforward. I will try limit=1 for getting the first one. I believe this will do the job, so I will test it out soon. Thanks!



Yes, great suggestion @Johannes Hovda, this should work as expected :)
Let us know how it goes @Olav Alstad 



Johannes Hovda
Practitioner

Awesome info, thanks for sharing. Very useful. 


Dilini Fernando
Seasoned Practitioner

Hi @Olav Alstad,

We appreciate your contribution to our community hub! We have chosen to move your article to our hub's How-To section as it will greatly benefit other members of our community. Thank you for your understanding, and we look forward to seeing more great contributions from you in the future! 

Best regards,
Dilini 

 

