Number of datapoints between two timestamps without aggregation

Question

Hi experts,

In order to decide whether to use an aggregation strategy (or no aggregation at all), I need to know the number of data points between two timestamps without aggregation (server-side calculation).

I'm looking for a call like this:

client.time_series.data.count(external_id=<str>, start=<int | datetime>, end=<int | datetime>) -> int

I'm trying to avoid having to perform a granularity calculation.

Thanks!

Regards,

Pierre

Everton Colling · Accepted Answer

Hi @RAMBOURG Pierre!

That makes sense, and it's actually the same problem the charting components in the Cognite Data Fusion user interfaces solve when deciding whether to render raw datapoints or aggregates. Here's the approach I would suggest, which you can use today.

As I mentioned before, instead of a separate count request followed by a data-fetching request, fetch the aggregates eagerly (including count), then decide. The count aggregate comes back in the same response as average/min/max, so if a given time series turns out to be dense you already have what you need to plot, with no second request. You only have to do a second request in the sparse case, where you go back for raw data.

This would look something like this:

    threshold = 10000          # threshold to switch between raw and aggregates
    number_datapoints = 2000   # how many points you want to plot when using aggregates

    1. Calculate granularity
       granularity = (end - start) / number_datapoints

    2. Fetch aggregates for the time series
       aggregates = fetch_agg(start, end, granularity, aggregates=[count, average, min, max])

    3. Sum count across all buckets
       total_count = sum(bucket.count for bucket in aggregates)

    4. Decide what to render
       if total_count > threshold:
           render aggregates        # already available, no extra request
       else:
           raw = fetch_raw(start, end, include_outside_points=True)
           render raw

With this approach, you don't fetch thousands of raw points first and then change strategy. The aggregate-first request is cheap and serves a dual purpose: it tells you the count and gives you a plottable result if you end up needing aggregates. Raw data is only fetched once you know the series is sparse enough to warrant it.

For now you will have to manage this logic yourself, but I agree this is a common enough need for anyone plotting time series data, which makes it worth considering as a first-class SDK feature. I'll discuss it internally to evaluate adding it to the Python and JavaScript SDKs in the future.
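
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `fetch_agg`, `fetch_raw`, the `Bucket` type, and the function name `fetch_for_plot` are hypothetical stand-ins taken from (or invented for) the pseudocode, not SDK functions. In practice they would wrap whatever datapoint retrieval calls you use, and since the real API expects granularity as a string such as "1h", converting the millisecond value would be the wrapper's job.

```python
from dataclasses import dataclass
from typing import Optional

# Aggregates requested in the single eager call (names from the pseudocode).
AGGREGATES = ("count", "average", "min", "max")


@dataclass
class Bucket:
    """One aggregate bucket. Hypothetical shape, not an SDK type."""
    timestamp_ms: int
    count: int
    average: Optional[float] = None
    min: Optional[float] = None
    max: Optional[float] = None


def fetch_for_plot(start_ms, end_ms, fetch_agg, fetch_raw,
                   threshold=10_000, number_datapoints=2_000):
    """Return (kind, data, total_count), where kind is 'aggregates' or 'raw'.

    fetch_agg and fetch_raw are caller-supplied callables (assumptions).
    """
    # 1. Bucket width that yields roughly number_datapoints buckets.
    granularity_ms = max(1, (end_ms - start_ms) // number_datapoints)

    # 2. One request that returns count together with the plottable aggregates.
    buckets = fetch_agg(start_ms, end_ms, granularity_ms, AGGREGATES)

    # 3. Total number of raw datapoints covered by the buckets.
    total_count = sum(b.count for b in buckets)

    # 4. Dense series: reuse the aggregates. Sparse series: go back for raw data.
    if total_count > threshold:
        return "aggregates", buckets, total_count
    raw = fetch_raw(start_ms, end_ms, include_outside_points=True)
    return "raw", raw, total_count
```

Passing the two fetch functions in as arguments keeps the decision logic independent of any particular client and makes it easy to exercise with fake data.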

Everton Colling · Answer

Hi @RAMBOURG Pierre!

Today we don't have a dedicated count() method that returns a single value without specifying granularity, but there are ways to compute it client side. Before recommending anything, could you help me understand:

- What are you ultimately trying to do with the count? For example, are you deciding between plotting raw vs. aggregated data, sizing a request, or something else?
- Do you need it to be exact, or would an approximation be enough to make the decision?

The reason I am asking is that fetching a count aggregate costs basically the same as fetching count plus the other aggregates, so if the goal is choosing an aggregation strategy, a "count first, then fetch" flow is often slower than just eagerly fetching what you need in parallel and discarding the rest.
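
One way the client-side computation could look is to request only the count aggregate at a coarse granularity and sum the buckets. The sketch below assumes the Cognite Python SDK's `client.time_series.data.retrieve` call with its `aggregates` and `granularity` parameters; the helper name `count_datapoints` and the default granularity of "1d" are illustrative choices, not part of the SDK. Because aggregate buckets may not line up exactly with `start` and `end`, the result is best treated as an estimate near the edges of the range.

```python
def count_datapoints(client, external_id, start, end, granularity="1d"):
    """Estimate the number of datapoints in [start, end) by summing count buckets.

    client is assumed to be a configured CogniteClient; a granularity
    still has to be chosen, but it only controls how many buckets are summed.
    """
    dps = client.time_series.data.retrieve(
        external_id=external_id,
        start=start,
        end=end,
        aggregates=["count"],
        granularity=granularity,
    )
    # Defensive: nothing came back, or no count values were returned.
    if dps is None or dps.count is None:
        return 0
    # Skip any missing values defensively before summing.
    return int(sum(c for c in dps.count if c is not None))
```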
