Hi Niros.
We don’t offer native data push/stream capabilities in CDF yet. It is a capability that we have on our radar, but it is not part of our short- to medium-term roadmap.
So you will have to look at some variation of setting up an agent that polls data from CDF, keeps a watermark, and pushes to your destination. When polling data from CDF, it is worth noting the following:
- The time series API is eventually consistent. That is, there may be a small delay from when a data point is published to CDF until it is queryable. Also, CDF does not guarantee that the data points become available in sorted order. The consequence is that you probably want to let the data settle for a few seconds before you query for it. That is, your client should implement a “polling offset” where you query for a time window ending at “t - <polling offset>” rather than at the current time t.
- If your data source publishes historic data points, then CDF does not have a good way of communicating that to you: there is no “last updated time” on the data points themselves. So, if your source is capable of historical updates, then you need to balance the requirement of completeness (capturing all data point updates) against cost (the complexity of capturing historical changes).
Unfortunately there is not a very simple recipe for this.
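That said, the basic watermark plus polling offset loop could look roughly like the sketch below. This is only an illustration under my own assumptions, not an official recipe: `poll_once`, `fetch` and `push` are hypothetical names, where `fetch(start_ms, end_ms)` stands in for whatever CDF query you use to read data points in a half-open time window, and `push` is whatever writes to your destination. It does not handle historical updates.

```python
import time
from typing import Callable, List, Optional, Tuple

DataPoint = Tuple[int, float]  # (timestamp in ms since epoch, value)


def poll_once(
    fetch: Callable[[int, int], List[DataPoint]],  # hypothetical: reads [start_ms, end_ms) from CDF
    push: Callable[[List[DataPoint]], None],       # hypothetical: writes to your destination
    watermark_ms: int,
    polling_offset_ms: int,
    now_ms: Optional[int] = None,
) -> int:
    """Forward points in [watermark_ms, now - polling offset) and return the new watermark."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)

    # Stay behind the current time so recently written points have time to settle.
    window_end_ms = now_ms - polling_offset_ms
    if window_end_ms <= watermark_ms:
        return watermark_ms  # nothing settled yet, keep the old watermark

    # Points may arrive unsorted, so sort by timestamp before forwarding.
    points = sorted(fetch(watermark_ms, window_end_ms))
    if points:
        push(points)

    # Assumption: everything older than the polling offset has settled,
    # so the watermark can move to the end of the window.
    return window_end_ms
```

You would call this in a loop with a sleep in between and persist the returned watermark so the agent can resume after a restart. Any point that lands in CDF later than the polling offset allows, including historical updates, will be missed by this sketch, which is the completeness versus cost trade-off mentioned above.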
We will implement experimental support for streaming (including data points) in the community Java SDK (https://github.com/cognitedata/cdf-sdk-java) within the next month. That could probably give you some pointers on how to implement a similar agent in Python.