Question

Cognite Functions: No-downtime rollout?

  • 7 June 2024

Hi!

I have a question about how redeployment of a Cognite Function works. Let's say we have the following scenario:

  • A Python backend fetches data from CDF in a predefined time window, cleans and transforms it, and then calculates KPI time series.
  • A deployed Cognite Function runs the backend on a schedule every 10 minutes. The calculated KPI time series is appended to an existing Time Series object in CDF. That is, every 10 minutes the Time Series is updated with the KPIs for the last 10 minutes.
  • A frontend reads data from the regularly updated Time Series in CDF and visualizes it in various ways.
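
To make the scenario concrete, here is a minimal sketch of how the backend might pick its time window. The name `aligned_window` and the choice to align windows to 10-minute clock boundaries are assumptions for illustration, not something CDF prescribes.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

WINDOW = timedelta(minutes=10)  # assumed to match the schedule interval


def aligned_window(now: datetime, window: timedelta = WINDOW) -> Tuple[datetime, datetime]:
    """Return the most recent fully elapsed window [start, end) at or before `now`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # timedelta // timedelta yields the whole number of windows since the epoch.
    elapsed_windows = (now - epoch) // window
    end = epoch + elapsed_windows * window
    return end - window, end


# Example: a run starting at 12:34:56 UTC would process 12:20 to 12:30.
start, end = aligned_window(datetime(2024, 6, 7, 12, 34, 56, tzinfo=timezone.utc))
```

With this approach, a run that is skipped or delayed past the next boundary leaves its window unprocessed, which is the gap problem described below.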

This workflow runs normally for quite some time, until we develop new functionality in our backend and want to redeploy the Cognite Function with it. What happens now?


  • Do we have to delete the old function before creating the new one? This may lead to downtime if the deployment for some reason takes so long that we move into the next time window for the data fetch. Our Time Series in CDF may then have gaps of missing data due to the downtime.


  • Can we redeploy with the same ID and get some kind of no-downtime rollout of the new function? I assume the Cognite Function lives as a pod in a Kubernetes cluster, which should be able to deliver these kinds of rollouts, but I cannot find anything in the documentation about this.


  • Would we have to implement some kind of rollout scheme ourselves in the handle function to reduce the chance of downtime? For example, we could deploy the new function alongside the old one (with a different ID), verify that it actually gets deployed and passes some kind of basic health test, and only then remove the old one. We would have to add some kind of random string to the new function's external_id to distinguish between the old and the new.
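
The rollout scheme in the last bullet could look roughly like the sketch below. It is only an outline under stated assumptions: `deploy`, `is_healthy`, and `retire` are hypothetical callables that we would have to implement ourselves (for example on top of the Cognite SDK), and `versioned_external_id` is just one way to generate the random suffix.

```python
import uuid
from typing import Callable, Optional


def versioned_external_id(base: str) -> str:
    """Append a short random suffix so old and new functions can coexist."""
    return f"{base}-{uuid.uuid4().hex[:8]}"


def blue_green_rollout(
    base_id: str,
    old_id: Optional[str],
    deploy: Callable[[str], None],      # hypothetical: creates the function and waits for it
    is_healthy: Callable[[str], bool],  # hypothetical: runs a basic health test
    retire: Callable[[str], None],      # hypothetical: deletes a function by external_id
) -> str:
    """Deploy the new function next to the old one and return the active external_id."""
    new_id = versioned_external_id(base_id)
    deploy(new_id)
    if not is_healthy(new_id):
        # Roll back: remove the failed deployment and keep the old function running.
        retire(new_id)
        if old_id is None:
            raise RuntimeError("new function is unhealthy and there is no old one to keep")
        return old_id
    if old_id is not None:
        # Only remove the old function once the new one has passed the health test.
        retire(old_id)
    return new_id
```

Since the function runs on a schedule, the schedule would presumably also have to be moved from the old function to the new one as part of the `deploy` and `retire` steps.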

Perhaps someone can share how deployments take place under the hood, along with best practices for solutions that require some degree of robustness and reliability?


Thanks in advance and have a great weekend!

Anders


1 reply

Hey @Anders Brakestad! I believe your assessment is generally right. Cognite Functions are serverless functions, similar to Azure Functions, AWS Lambda, and the like, so a "no-downtime" redeployment would have to be orchestrated outside of CDF.


However, if your input data comes from Time Series in CDF, I recommend looking into the relatively new time series data point subscriptions to make your function less sensitive to execution timing. This is a good idea in general, as the exact timing of function runs is not guaranteed.


Similarly, if the input data to your KPIs comes from Data Modeling, you can use the /sync endpoint in the data modeling service to see changes since the last query.
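
To illustrate the general idea of processing "everything since the last checkpoint" rather than "the last 10 minutes", here is a minimal sketch that does not use the subscriptions or /sync APIs themselves. It swaps in a plain watermark instead: `pending_windows` is a hypothetical helper, and the watermark is assumed to be stored somewhere durable, for example as the timestamp of the newest KPI data point already written.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def pending_windows(
    watermark: datetime,
    now: datetime,
    window: timedelta = timedelta(minutes=10),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield every complete window [start, end) between the watermark and `now`."""
    start = watermark
    while start + window <= now:
        yield start, start + window
        start += window


# Example: the last successful run covered data up to 12:00, and the next run
# only starts at 12:34 (say, because a redeployment took a while).
watermark = datetime(2024, 6, 7, 12, 0, tzinfo=timezone.utc)
now = datetime(2024, 6, 7, 12, 34, tzinfo=timezone.utc)
for start, end in pending_windows(watermark, now):
    pass  # compute and write KPIs for [start, end), then advance the watermark
```

With this pattern a delayed or skipped run is caught up by the next one, so a slow redeployment causes latency rather than gaps in the KPI Time Series.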
