
Hey!

 

I have a problem with filling a gap in a time series in CDF with data from a source.

 

Problem description

We’re filling a hole in a time series from time A to time B. There are some datapoints at the edges of the interval in CDF.

Data is extracted from the source and prepped in Python for the datapoints API as a list-of-tuples payload. For an arbitrary period I extract 2976 datapoints, which I upload to CDF. Subsequently, I query the time series for the same period and receive 2928 datapoints. There are no NaN values in the input for either the date-time or the value. The data is also hourly, so I’m wary of it just being an edge effect of a poor timestamp specification in the retrieval. What other PEBCAK things have I missed?

Simplified example included below:

payload
>> [{'externalId': 'ts_externalid', 'datapoints': [...]}]
payload[0]["datapoints"][10]
>> (1617271200000, 0.0)

client.datapoints.insert_multiple(payload)
meter_data = client.datapoints.retrieve(
    start=dates[0], end=dates[-1], external_id="ts_externalid"
).to_pandas()
meter_data.shape
>> (2928, 1)
len(payload[0]["datapoints"])
>> 2976

 

Are you sure that the datapoints are unique? There is no validation in the SDK for unique data, so if you have duplicated entries in your tuples, they will overwrite the previous entries.
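
Not from the original post, but here is a minimal sketch of how you could check the prepared payload for duplicated timestamps before inserting, assuming the list-of-tuples shape shown in the question:

from collections import Counter

# payload shape assumed from the example above:
# [{'externalId': 'ts_externalid', 'datapoints': [(timestamp_ms, value), ...]}]
datapoints = payload[0]["datapoints"]
counts = Counter(ts for ts, _ in datapoints)
duplicates = {ts: n for ts, n in counts.items() if n > 1}
print(len(datapoints), "datapoints,", len(duplicates), "duplicated timestamps")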

Another cause of a wrong datapoint count when you add new data on top of previously existing data is the time zone configuration. You need to ensure that you always use the same time zone configuration when converting your timestamps. I remember situations where the conversion was done without an explicit time zone and a DST change affected it.
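
As an illustration (not from the thread), a sketch of converting source datetimes to epoch milliseconds with an explicit time zone, so a DST transition cannot shift the timestamps; the zone name and source_rows are assumptions:

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

source_tz = ZoneInfo("Europe/Oslo")  # assumed source time zone

def to_epoch_ms(naive_dt: datetime) -> int:
    # Attach the source time zone explicitly, then convert to UTC epoch milliseconds.
    return int(naive_dt.replace(tzinfo=source_tz).astimezone(timezone.utc).timestamp() * 1000)

# source_rows: hypothetical list of (naive local datetime, value) pairs from the source
datapoints = [(to_epoch_ms(dt), value) for dt, value in source_rows]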

Those are the two generic situations I remember that could cause a mismatch in the datapoint count. Maybe if you can share more information, I can help you debug the issue.


You say you have hourly data, and are missing 2976 - 2928 = 48 datapoints. Are you sure you are not just 2 days off somewhere? :smile:


@Håkon V. Treider  that’s what I figured too - but I forgot to check for duplicated timestamps and got a miscount there too.

But the combo was the answer: an inclusive end time from the source, and a few duplicate rows.

meter_data = client.datapoints.retrieve(
    start=dates[0], end=full_dates[-1] + timedelta(hours=24), external_id="ts_externalid"
).to_pandas()
meter_data.shape
>>> (2952, 1)
len(retrieved_df["DATE_TIME"].unique())
>>> 2952
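
For completeness, a sketch of dropping duplicate rows on the source side before building the payload; source_df and its column names are assumptions rather than code from the post, and naive timestamps are treated as UTC:

# source_df: hypothetical DataFrame with "DATE_TIME" (datetime64) and "VALUE" columns
deduped = source_df.drop_duplicates(subset="DATE_TIME", keep="last").sort_values("DATE_TIME")
datapoints = [
    (int(ts.timestamp() * 1000), float(v))  # epoch ms; naive pandas timestamps are read as UTC
    for ts, v in zip(deduped["DATE_TIME"], deduped["VALUE"])
]
payload = [{"externalId": "ts_externalid", "datapoints": datapoints}]
client.datapoints.insert_multiple(payload)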

 

Thanks for your help solving today’s PEBCAK @Patrick Mishima and Håkon :))))))))


 

Impressed with the response time and quality of your answers. Kudos!

