Solved

Cognite Functions: No-downtime rollout?


Hi!

I have a question about how redeployment of a Cognite Function works. Let's say we have the following scenario:

  • A Python backend fetches data from CDF in a pre-defined time window, cleans and transforms it, and then calculates KPI time series.
  • A deployed Cognite Function runs the backend on a schedule every 10 minutes. The calculated KPI time series is appended to an existing Time Series object in CDF. I.e., the Time Series is updated with KPIs for the last 10 minutes every 10 minutes.
  • A frontend reads data from the regularly updated Time Series in CDF and visualizes it in various ways.

This workflow goes on normally for quite some time, until we have developed new functionality in our backend. We want to redeploy the Cognite Function with the new functionality. What happens now?

 

  • Do we have to delete the old function before creating the new one? This may lead to downtime if the deployment for some reason takes so long that we move into the next time window for the data fetch. Our Time Series in CDF may then have holes of missing data due to the downtime.

 

  • Can we redeploy with the same ID and get some kind of no-downtime rollout of the new function? I assume the CF lives as a pod in a Kubernetes cluster, which should be able to deliver these kinds of rollouts, but I cannot find anything in the documentation about this.

 

  • Would we have to implement some kind of rollout scheme ourselves to reduce the chance of downtime? For example, by deploying the new function alongside the old one (with a different ID), monitoring its health by making sure it actually gets deployed and passes some kind of default health test, and only then killing the old one? We would have to add some kind of random string to the new function's external_id to distinguish between the old and the new.

Perhaps someone can provide helpful thoughts on how the deployments take place under the hood, and provide best practices for solutions that require some degree of robustness and reliability?

 

Thanks in advance and have a great weekend!

Anders

Best answer by Dilini Fernando 19 July 2024, 12:25

4 replies

@Anders Brakestad 

If your function writes to a time series, and re-running the program between the scheduled 10-minute runs won't affect things negatively, you can always deploy the new function with a different external_id and delete the old function once the new one shows up as deployed.

You could also upload the function without a schedule. Once the function is deployed, run it manually (via the web interface or the API). Once you are satisfied with the run, add a schedule and delete the old function.

All you need is a different external_id than that of the old function.
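
The steps above could be scripted roughly like this. It is only a sketch: `functions` is assumed to behave like `CogniteClient().functions` from the Python SDK (with `retrieve`, `call` and `delete`), the status strings "Ready", "Failed" and "Completed" should be checked against your SDK version, and `promote_new_function` and `attach_schedule` are made-up names for illustration.

```python
import time


def promote_new_function(functions, new_external_id, old_external_id,
                         attach_schedule, smoke_data=None,
                         poll_seconds=10.0, timeout_seconds=900.0):
    """Retire the old function only after the new one is deployed and has run once.

    `functions` is assumed to behave like `CogniteClient().functions`.
    `attach_schedule` is a hypothetical callback that creates the schedule
    for the new function. Returns True if the swap happened, else False.
    """
    deadline = time.monotonic() + timeout_seconds

    # 1. Wait until the new function (uploaded without a schedule) is deployed.
    while True:
        fn = functions.retrieve(external_id=new_external_id)
        status = getattr(fn, "status", None)
        if status == "Ready":
            break
        if status == "Failed" or time.monotonic() > deadline:
            # Leave the old function alone so its schedule keeps running.
            return False
        time.sleep(poll_seconds)

    # 2. Run the new function once manually as a smoke test.
    call = functions.call(external_id=new_external_id, data=smoke_data, wait=True)
    if call.status != "Completed":
        return False

    # 3. Put the new function on a schedule, then delete the old function.
    attach_schedule(new_external_id)
    functions.delete(external_id=old_external_id)
    return True
```

If the deployment fails or the manual run does not complete, the old function is never touched, so the existing schedule keeps filling the time series while you investigate.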

Hi @Anders Brakestad,

I hope the above information was helpful. I am closing this topic for now. Please don't hesitate to start a new post if you have any further questions. 

Hi @Anders Brakestad,

Was the above information helpful?

Hey @Anders Brakestad! I believe your assessment is generally right. Cognite Functions are serverless functions, similar to Azure Functions, AWS Lambda, and so on, so a "no-downtime" redeployment would have to be orchestrated outside of CDF.

 

However, I recommend you look into the relatively new time series data point subscriptions to make your function less sensitive to execution timing, if your data comes from Time Series in CDF. This is a good idea in general, as the exact timing of function runs is not guaranteed.

 

Similarly, you can use the /sync endpoint in the data modeling service to see changes since the last query, if the input data to your KPIs comes from Data Modeling.
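
To illustrate what "less sensitive to execution timing" can mean in practice, here is a minimal sketch. It does not use the subscriptions or /sync APIs; it shows a simpler variant of the same idea, where the fetch window starts at the last KPI datapoint that was actually written instead of a fixed "last 10 minutes". The name `fetch_window` and the lookback defaults are assumptions for illustration, and `last_kpi_time` might come from something like `client.time_series.data.retrieve_latest(...)` in the Python SDK.

```python
from datetime import datetime, timedelta, timezone


def fetch_window(last_kpi_time, now,
                 default_lookback=timedelta(minutes=10),
                 max_lookback=timedelta(hours=24)):
    """Return (start, end) covering everything since the last written KPI point.

    last_kpi_time: timestamp of the newest KPI datapoint, or None if there is none.
    now: current time (passed in so the function is easy to test).
    """
    if last_kpi_time is None:
        # Nothing written yet: fall back to the normal 10-minute window.
        start = now - default_lookback
    else:
        # Catch up from the last written point, but cap how far back we go.
        start = max(last_kpi_time, now - max_lookback)
    # Guard against clock skew producing a start later than the end.
    return min(start, now), now


now = datetime(2024, 7, 19, 12, 0, tzinfo=timezone.utc)
# A run was missed, so the last KPI point is 35 minutes old.
start, end = fetch_window(now - timedelta(minutes=35), now)
print(end - start)
```

With this kind of window, a late or skipped run (for example during a redeployment) widens the next fetch instead of leaving a hole in the time series.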
