Solved

Data Streaming with CDF

  • 8 October 2021
  • 1 reply
  • 122 views


I am trying to stream data from CDF to Azure Event Hub with the Python SDK, but I cannot find anything related to streaming datasets. The only option I have found so far is:

dps = c.datapoints.retrieve_latest(id=184691546499795)

which would need a trigger of some sort to keep running. The data in CDF are time series from different types of sensors. Is there any documentation on streaming data from CDF that I could look at, or is streaming supported at all?

 

Best answer by Kjetil Halvorsen

Hi Niros.

We don’t offer native data push/stream capabilities in CDF yet. It is a capability that we have on our radar, but it is not part of our short- to medium-term roadmap.

So you have to look at some variation of setting up an agent that polls data from CDF, keeps a watermark, and pushes to your destination. When polling data from CDF, it is worth noting the following:

  • The time series API is eventually consistent. That is, there may be a small delay from when a data point is published to CDF until it is queryable. Also, CDF does not guarantee that data points become available in sorted order. The consequence is that you probably want to let the data settle for a few seconds before you query for it. That is, your client should implement a “polling offset” where you query for a time window ending at “t - <polling offset>”, with t being the current time.
  • If your data source publishes historic data points, CDF does not have a good way of communicating that to you: there is no “last updated time” on the data points themselves. So, if your source is capable of historical updates, you need to balance the requirement of completeness (capturing all data point updates) against cost (the complexity of capturing historical changes).
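
As a rough illustration of the agent described above, here is a minimal Python sketch of one polling step with a watermark and a polling offset. It is not an official recipe. The names `poll_once`, `run_agent`, `fetch`, and `publish` are hypothetical: `fetch(start_ms, end_ms)` stands in for whatever SDK call you use to read data points in a time range (assumed here to treat the start as inclusive and the end as exclusive), and `publish(points)` stands in for your Event Hub producer.

```python
import time
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, float]  # (timestamp in ms since epoch, value)


def poll_once(
    fetch: Callable[[int, int], Iterable[Point]],   # hypothetical reader: fetch(start_ms, end_ms)
    publish: Callable[[List[Point]], None],         # hypothetical sink, e.g. an Event Hub producer
    watermark_ms: int,
    now_ms: int,
    polling_offset_ms: int = 10_000,
) -> int:
    """Forward points in [watermark_ms, now_ms - polling_offset_ms) and return the new watermark."""
    end_ms = now_ms - polling_offset_ms  # leave the most recent data time to settle
    if end_ms <= watermark_ms:
        return watermark_ms  # nothing has settled since the last poll
    # Points are not guaranteed to arrive in order, so sort by timestamp before forwarding.
    points = sorted(fetch(watermark_ms, end_ms))
    if points:
        publish(points)
    # The watermark only advances if publish() did not raise.
    return end_ms


def run_agent(fetch, publish, start_ms: int, interval_s: float = 5.0,
              polling_offset_ms: int = 10_000) -> None:
    """Poll forever; in a real agent the watermark should be persisted between restarts."""
    watermark_ms = start_ms
    while True:
        now_ms = int(time.time() * 1000)
        watermark_ms = poll_once(fetch, publish, watermark_ms, now_ms, polling_offset_ms)
        time.sleep(interval_s)
```

Because the watermark is returned only after `publish` succeeds, a failed publish causes the same window to be retried on the next poll, so the destination may see duplicates but should not miss settled points. The 10-second offset is an arbitrary placeholder; a suitable value depends on how quickly your data settles.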

Unfortunately, there is no simple recipe for this.
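
One possible way to trade completeness against cost for historical updates is to re-read a bounded lookback window on each poll and forward only the points that are new or changed compared with a local cache. The sketch below is an assumption about how that could look, not something CDF provides; `diff_lookback` and its parameters are hypothetical names, and `fresh` is whatever your own read call returned for the lookback window.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[int, float]  # (timestamp in ms since epoch, value)


def diff_lookback(
    cache: Dict[int, float],
    fresh: Iterable[Point],
    window_start_ms: int,
) -> List[Point]:
    """Return points in `fresh` that are new or changed versus `cache`, then update the cache."""
    # A point counts as changed if its timestamp is unseen or its value differs.
    changed = [(ts, value) for ts, value in sorted(fresh) if cache.get(ts) != value]
    cache.update(changed)
    # Drop entries older than the lookback window so the cache stays bounded.
    for ts in [ts for ts in cache if ts < window_start_ms]:
        del cache[ts]
    return changed
```

A longer lookback window catches older backfills at the price of more reads and a larger cache. This sketch does not detect points that were deleted at the source, and updates older than the window are missed.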

We will implement experimental support for streaming (including data points) in the community Java SDK (https://github.com/cognitedata/cdf-sdk-java) within the next month. That could probably give you some pointers on how to implement a similar agent in Python.

 



