Gathering Interest

Record of "when was the last datapoint ingested?"

Related products:Data Quality
  • January 16, 2025
  • 3 replies
  • 58 views
Anita Hæhre

Currently, the CDF UI displays the timestamp of the last ingested datapoint under the label “Last Reading”.

For example:

I have the following time series, with “Last reading” showing 16 days ago:

Now I add a datapoint:

timestamp: 2025-01-05 00:00:00 (UTC)

value: 20

The “Last Reading” then updates to 12 days ago, denoting the timestamp of the last ingested datapoint.
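To illustrate the behaviour described above: the label is just the latest datapoint timestamp rendered relative to now. This is a minimal sketch of that rendering, not how the CDF UI actually computes it:

```python
from datetime import datetime, timezone

def last_reading_label(latest_ts: datetime, now: datetime) -> str:
    """Render a 'Last Reading' style label as whole days ago."""
    days = (now - latest_ts).days
    return f"{days} days ago"

now = datetime(2025, 1, 17, tzinfo=timezone.utc)
# Before the new datapoint: latest timestamp is 16 days old
print(last_reading_label(datetime(2025, 1, 1, tzinfo=timezone.utc), now))  # 16 days ago
# After inserting the 2025-01-05 00:00:00 UTC datapoint
print(last_reading_label(datetime(2025, 1, 5, tzinfo=timezone.utc), now))  # 12 days ago
```

Note the label only reflects the datapoint's own timestamp, which is exactly why it can't answer "when was this datapoint *added*?".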


Additionally, a feature showing when a datapoint was added would be useful: knowing when a datapoint entered the time series helps when investigating failures or debugging.

It's similar to “Updated at”, which records when any metadata/properties of the time series change, but this information would be for datapoints.

3 replies

  • Practitioner
  • 11 replies
  • January 16, 2025

Hi ​@Vaibhav Narain and thank you for taking the time to submit this product idea.

I’ve notified the Product Manager responsible for Time Series and OT, and expect them to either reach out directly or connect with you through this thread should they have any questions or further follow-up.



We do actually store the «written to CDF» timestamp for a datapoint (we can’t reliably know how long the source system held the datapoint before we received it in the Time Series API, due to time zone differences, backfill vs. live stream, etc.).

 

However, given the potential time granularity of a specific datapoint, we’re still thinking about how best to present this data in a useful manner.
I’m thinking about semi-existential questions like: does it really make sense to see the millisecond a single datapoint was recorded, or is the person troubleshooting the data flow more interested in the second (or range of seconds) in which a set of datapoints was recorded? If the latter, what is the appropriate value to use for all datapoints within that range?

  • The average timestamp across the range (risky from a precision perspective, since outliers will get «smushed»)?
  • The min and max for the range?
  • The higher-level time window the range represents (datetime values at per-second granularity for the start and end of the range)?
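The min/max-per-range option above can be sketched quickly. The timestamps below are hypothetical millisecond "written to CDF" values, grouped into one-second windows so outliers stay visible instead of being averaged away:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical epoch-ms ingestion timestamps for a batch of datapoints
# (1736035200000 ms = 2025-01-05 00:00:00 UTC, matching the example above).
ingested_ms = [1736035200015, 1736035200020, 1736035200980, 1736035201100, 1736035201105]

# Group by the second-level window, keeping min/max within each range.
buckets = defaultdict(list)
for ts in ingested_ms:
    buckets[ts // 1000].append(ts)

for second, members in sorted(buckets.items()):
    start = datetime.fromtimestamp(second, tz=timezone.utc)
    print(f"{start:%Y-%m-%d %H:%M:%S}Z: {len(members)} datapoints, "
          f"min={min(members)} max={max(members)}")
```

This keeps the full spread of each window (min and max), avoiding the «smushing» risk of an average.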

 

Looking for input and ideas, please!   


Aditya Kotiyal
MVP

@Thomas Sjølshagen, @Vaibhav Narain — my 2 cents:

 

  • I agree with Thomas that capturing this for data streaming at a higher frequency does not make a lot of sense and will not be valuable.
  • At the same time, Vaibhav’s use case of troubleshooting and auditing is exactly the challenge I faced during my deployment days. @Thomas Sjølshagen I believe a common ground here could be to capture this metric for a time series object every 30 minutes to 1 hour. For example: if an object is getting datapoints every minute, there will be 60 datapoints, but we should only check after every 60 minutes and record the timestamp of the last ingested datapoint. This would really help a data engineer in troubleshooting issues pertaining to data source connections, ownership, and even liability.
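The hourly-checkpoint compromise above can be sketched as follows. This is an illustration of the proposal, not an existing CDF feature: keep only the last ingestion timestamp seen in each hour-long window.

```python
from datetime import datetime, timezone, timedelta

def hourly_checkpoints(ingestion_times):
    """Record only the last ingestion timestamp seen in each hour-long window
    (a sketch of the 'check every 60 minutes' compromise)."""
    checkpoints = {}
    for ts in sorted(ingestion_times):
        window = ts.replace(minute=0, second=0, microsecond=0)
        checkpoints[window] = ts  # later timestamps in the window overwrite earlier ones
    return checkpoints

# One datapoint per minute for two hours -> 120 raw timestamps, only 2 checkpoints.
start = datetime(2025, 1, 5, 0, 0, tzinfo=timezone.utc)
times = [start + timedelta(minutes=i) for i in range(120)]
for window, last in hourly_checkpoints(times).items():
    print(f"{window:%H:%M} window -> last ingested at {last:%H:%M}")
```

Storage cost drops from one record per datapoint to one per hour, while a data engineer can still bound when a datapoint arrived to within the window.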


