Solved

Timestamp for first datapoint in timeseries? [Community Contributed]

1 year ago
June 26, 2023
7 replies
144 views

Olav Alstad
Committed
4 replies

I have many thousands of time series, and I need to find the date of the first datapoint in each one.

For each time series, I have no idea whether the first datapoint is from this year or from 20 years ago. Fetching several decades of data just to find it is not efficient.

Any ideas how to get the first datapoint? Getting the last datapoint would also be great.

Thanks!

Best answer by Håkon V. Treider

I’d like to add a little to @Johannes Hovda's answer:

1) Start/end

Be careful using start=0 (1970-01-01), as the time series API supports timestamps all the way back to the year 1900. Following the examples in the documentation, you can import the earliest possible timestamp as a constant, to be sure not to miss anything.

The opposite is true for end, which defaults to "now" if not specified. Thus, if a time series' first datapoint lies in the future, you need to specify end to retrieve it. The API supports timestamps up to 2099-12-31 23:59:59.999.

>>> from cognite.client.utils import MIN_TIMESTAMP_MS, MAX_TIMESTAMP_MS
>>> dps_backup = client.time_series.data.retrieve(
...     id=123,
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1)  # end is exclusive
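
To get a feel for why start=0 can miss data, here is a minimal sketch that converts the two limits mentioned above into milliseconds since the epoch, using only the standard library. It assumes the limits are expressed in UTC and that "the year 1900" means 1900-01-01 00:00:00; the helper name to_epoch_ms is made up for illustration and is not part of the SDK.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Milliseconds since 1970-01-01 UTC, using exact integer arithmetic."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


# Assumed lower limit: start of the year 1900 (UTC).
earliest_ms = to_epoch_ms(datetime(1900, 1, 1, tzinfo=timezone.utc))

# Upper limit quoted above: 2099-12-31 23:59:59.999 (assumed UTC).
latest_ms = to_epoch_ms(
    datetime(2099, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)
)

print(earliest_ms)  # negative, i.e. before start=0
print(latest_ms)
```

The lower limit comes out negative, so any datapoint before 1970 sits below start=0 and would be skipped by a query that starts there.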

2) Efficient queries

You write that you have “thousands of time series” for which you need to find the initial datapoint. This can be very inefficient to query if not done correctly.

The time series API allows for datapoints to be requested for up to 100 different time series in the same request. The Python SDK will combine your datapoint queries automatically if you ask for them all in one go:

>>> initial_dps = client.time_series.data.retrieve(
...     start=MIN_TIMESTAMP_MS,
...     end=MAX_TIMESTAMP_MS + 1,  # end is exclusive
...     external_id=a_few_thousand_ids,
...     limit=1)
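
As a follow-up, here is a minimal sketch of how the result of such a query could be turned into a lookup table of first timestamps. The function name first_timestamps is hypothetical. It assumes that the retrieve call returns one datapoints object per requested time series, each with an external_id and a (possibly empty) timestamp sequence in milliseconds. The client is passed in, so nothing here depends on a particular SDK version.

```python
def first_timestamps(client, external_ids, start, end):
    """Map each external ID to its first datapoint timestamp (ms), or None if empty.

    `client` is assumed to be an authenticated CogniteClient; `end` is exclusive.
    """
    dps_list = client.time_series.data.retrieve(
        external_id=external_ids,
        start=start,
        end=end,
        limit=1,  # only the first datapoint of each series is needed
    )
    result = {}
    for dps in dps_list:
        # A series with no datapoints in [start, end) yields an empty sequence.
        has_data = len(dps.timestamp) > 0
        result[dps.external_id] = dps.timestamp[0] if has_data else None
    return result
```

With a real client this would be called as first_timestamps(client, a_few_thousand_ids, MIN_TIMESTAMP_MS, MAX_TIMESTAMP_MS + 1). Time series without any datapoints end up as None, so they are easy to filter out afterwards.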

3) ...for other readers with very few time series

If you have a TimeSeries object fetched from CDF, you can simply call the .first() method on it:

>>> ts = client.time_series.retrieve(id=123)
>>> first_dp = ts.first()

Gaetan Helness
MVP
80 replies
1 year ago
June 26, 2023

Hi. We have a specific endpoint to get the latest datapoint, so that should be no problem.

However, getting the first one might be trickier. As you mentioned, you might need to fetch all the datapoints and take the first one from the list, which is far from ideal.

I will check with our engineering team if there is a better solution.

Johannes Hovda
Practitioner
13 replies
1 year ago
June 26, 2023

Hi,

We have a built-in method for getting the latest datapoint.

For the first datapoint, you can use the “limit” option to get only the first datapoint from a series. Doing it this way means you don't need to retrieve the entire series just to get the first point.

You can use this to get Last:

client.time_series.data.retrieve_latest(id=186538285190435)

You can use this to get First:

client.time_series.data.retrieve(id=186538285190435, start=0, end="now", limit=1)

Olav Alstad
Author
Committed
4 replies
1 year ago
June 26, 2023

Thank you both! Looks like getting the last one is straightforward. I will try limit=1 for getting the first one. I believe this will do the job, so I will test it out soon. Thanks!

Gaetan Helness
MVP
80 replies
1 year ago
June 26, 2023

Yes, great suggestion @Johannes Hovda , this should work as expected :)
Let us know how it goes @Olav Alstad

Johannes Hovda
Practitioner
13 replies
1 year ago
June 26, 2023

Awesome info, thanks for sharing. Very useful.

Dilini Fernando
Seasoned Practitioner
671 replies
1 year ago
July 7, 2023

Hi @Olav Alstad,

We appreciate your contribution to our community hub! We have chosen to move your article to our hub's How-To section as it will greatly benefit other members of our community. Thank you for your understanding, and we look forward to seeing more great contributions from you in the future!

Best regards,
Dilini
