
In some AI demos built on top of Cognite, our approach to real-time inference is to retrieve the latest time series values for the equipment. Since our machine learning models are deployed on a cluster outside Cognite, we use the SDK to fetch the most recent data for inference as follows:

real_time_data_bomt_hfx = cdf_client.time_series.data.retrieve_latest(external_id="bomt_hfx_time_series_id")

Is anyone working with a different approach to send new data for inference to their machine learning models? I'd be interested in hearing how others are managing this process.
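For context, the polling pattern we use boils down to something like the sketch below. This is a simplified, hypothetical version: `fetch_latest` stands in for the SDK's `retrieve_latest` call and `run_model` for our externally deployed model, and the change-detection logic is an assumption about how one might avoid re-running inference on an unchanged snapshot.

```python
import time
from typing import Callable, List, Optional

def poll_and_infer(
    fetch_latest: Callable[[], Optional[float]],  # placeholder for retrieve_latest
    run_model: Callable[[float], float],          # placeholder for the deployed model
    n_polls: int,
    interval_s: float = 0.0,
) -> List[float]:
    """Poll for the latest datapoint n_polls times and run inference on each new snapshot."""
    results: List[float] = []
    last_seen: Optional[float] = None
    for _ in range(n_polls):
        value = fetch_latest()
        # Only run inference when the snapshot has changed since the last poll.
        if value is not None and value != last_seen:
            results.append(run_model(value))
            last_seen = value
        time.sleep(interval_s)
    return results
```

For example, if the same value is returned twice in a row, inference runs only once for it, which is one of the limitations of snapshot polling: you trade completeness for simplicity.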

We are also exploring more resilient approaches, such as:

  • Streaming Data for Real-Time Inference (Event-Driven Approach): We plan to test Cognite's Kafka extractor as soon as possible to enable more seamless streaming.

  • On-Demand Inference via API: This approach is synchronous, which has caused challenges for us in the past, so we prefer to avoid it and lean toward more asynchronous solutions.
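The asynchronous shape we are aiming for can be sketched generically as a producer/consumer pipeline: an ingestion thread (standing in for whatever the streaming source ends up being, e.g. data arriving via the Kafka extractor) pushes datapoints onto a queue, and a worker thread runs inference independently. The names here are all hypothetical; `run_model` is again a placeholder for the real model call.

```python
import queue
import threading
from typing import Callable, List

def start_inference_worker(
    work_queue: "queue.Queue",
    run_model: Callable[[float], float],  # placeholder for the deployed model
    results: List[float],
) -> threading.Thread:
    """Start a background worker that consumes datapoints and runs inference."""
    def worker() -> None:
        while True:
            value = work_queue.get()
            if value is None:  # sentinel value signals shutdown
                break
            results.append(run_model(value))
            work_queue.task_done()
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

Usage would look like: put incoming datapoints on the queue as they arrive, and put `None` to shut the worker down cleanly. The point of this design is that the ingestion side never blocks on a slow model call, which is the main pain we hit with the synchronous API approach.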

Have you looked into the Datapoints Subscription service? Perhaps that could be relevant, if you need to ensure all datapoints are captured.

When you use “retrieve_latest”, each query returns only the single most recent datapoint - a snapshot - you will not get the rest of the datapoints in the time range between queries.
For instance, if you call “retrieve_latest” every 5 seconds, you only get the latest datapoint as of 00, 05, 10, 15, and so on. With subscriptions, you get the entire stream of datapoints since the last retrieval.
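To make the distinction concrete, here is a small simulation (deliberately not using the SDK, so the mechanics are visible): polling returns whatever the latest value happens to be at each poll, while a subscription-style cursor hands you every datapoint since the last retrieval.

```python
from typing import List, Tuple

def poll_latest(stream: List[int], at_index: int) -> int:
    """Snapshot: only the most recent datapoint as of this poll."""
    return stream[at_index]

def read_since_cursor(stream: List[int], cursor: int) -> Tuple[List[int], int]:
    """Subscription-style: every datapoint since the last retrieval, plus an updated cursor."""
    batch = stream[cursor:]
    return batch, len(stream)
```

With a stream of six datapoints, two polls (say at the third and sixth datapoints) see only two values and silently miss the rest, whereas two cursor reads between the same moments return all six. That is the trade-off the reply above describes.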


https://api-docs.cognite.com/20230101/tag/Data-point-subscriptions

https://cognite-sdk-python.readthedocs-hosted.com/en/latest/time_series.html#datapoint-subscriptions
