New feature: Calculations are now running on up to 100k data points



As we announced in our last release post, we recently updated our calculations backend to run on individual data points whenever the total number of data points is below a predefined maximum limit.

(You can watch the video walkthrough for more information.)

When this improved functionality was released a few weeks ago, the limit was set at the intentionally low value of 10,000 individual data points. We did this to test our infrastructure, gather feedback, and ensure our backend would not crash under these more expensive requests. Thanks to those of you who have provided us with input!

As of today, we have released an update that increases this limit from 10,000 data points to 100,000 data points.

This 10x increase in the limit will help provide accurate, trustworthy calculation results for larger ranges of time and data. With this new 100k limit, we’ve reached the maximum number of data points we can retrieve from the Cognite Data Fusion Time series API in a single request.

While we do intend to support calculations on significantly higher numbers of data points in future versions of our calculations backend, we are curious to hear how this improvement performs for all of you.

Please test this for yourselves and send us your feedback. If you notice or experience any issues, questionable results, or significantly higher loading times with your calculations, please reach out to us by leaving a comment below.

In addition to this calculations backend update, we’ve also released various bug fixes and minor UI/UX improvements to the frontend. 

 

What if my calculations are still receiving the “downsampled” warning, but I want/need them to be run on individual data points?

Great! This means you have a use case that we need to hear about. Your case will become a guiding example that we’ll use to benchmark our ongoing product development to enable this functionality.

Please leave a comment below describing your use case (what your chart or calculation is for, what time range you’re looking at, why you need individual data points, etc.) and include the link to your chart, if you have it.

If you’d like to discuss your case over a quick call, feel free to reach out to me directly at eric.stein@cognite.com.


3 replies


Eric - Thank you.

We continue to be challenged by this and would request that the data point limit be further increased to at least 20 million.

 

Ibrahim


Hi @ibrahim.alsyed and thanks for the quick feedback on this.

I've already discussed this with several of my colleagues. This will definitely serve as a helpful benchmark as we approach further development. 

Obviously, 20 million is a significant increase over 100k data points. While we will definitely increase the limit that our frontend and backend can handle going forward, the best solution for now will be to work with a data scientist or anyone else who can use the CDF Python SDK.

Then, that person can use our InDSL functions to recreate the calculation(s) from Charts, fetch the data in batches to perform the calculation, and schedule it to write back to CDF using Functions (a rough sketch of this workflow is shown below). Members of the Charts team and I actually discussed this with @Philippe Bettler for one of the Celanese use cases while he was in Oslo last week.
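To make that a bit more concrete, here is a minimal sketch of what such a scheduled Function could look like with a recent version of the Cognite Python SDK. This is not an official recipe: the external IDs, the batching window, and the rolling-mean calculation standing in for the actual InDSL function(s) are all hypothetical placeholders you would replace with your own.

```python
# Minimal sketch: fetch data points in batches with the CDF Python SDK,
# run a calculation, and write the result back to CDF as a new time series.
# All external IDs and the rolling-mean calculation are placeholders.
import pandas as pd
from cognite.client import CogniteClient


def handle(client: CogniteClient, data: dict) -> dict:
    """Entry point compatible with Cognite Functions scheduling."""
    start, end = data["start"], data["end"]           # epoch ms timestamps (assumed)
    source_xid = data["source_external_id"]            # hypothetical input series
    target_xid = data["target_external_id"]            # hypothetical output series

    # Fetch raw data points in week-sized batches to stay well below the
    # per-request limit of the Time series API.
    one_week_ms = 7 * 24 * 3600 * 1000
    batches = []
    for batch_start in range(start, end, one_week_ms):
        batch_end = min(batch_start + one_week_ms, end)
        df = client.time_series.data.retrieve_dataframe(
            external_id=source_xid, start=batch_start, end=batch_end
        )
        batches.append(df)
    full = pd.concat(batches)

    # Placeholder calculation: in practice this is where you would call the
    # InDSL function(s) that reproduce your Charts calculation.
    result = full[source_xid].rolling(window=60, min_periods=1).mean()

    # Write the result back to CDF so it can be plotted in Charts again.
    client.time_series.data.insert_dataframe(result.to_frame(name=target_xid))
    return {"rows_written": len(result)}
```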

Of course, we'll work to bring all of this functionality into the UI so that anyone, not only people who can code in Python, can do this from start to finish.

I'll be in touch to gather more feedback as we make progress! 


@ibrahim.alsyed, I am working with Bredan to support this use case. We had a good discussion today and we are making good progress. We should have a resolution next week.
