Question

Are there limitations on concurrent deployment of Cognite Functions via a GitHub workflow matrix?

  • October 25, 2024
  • 1 reply
  • 19 views

Anders Brakestad
Seasoned

Background

We have an ongoing project where we need to deploy Cognite Functions and other resources (TimeSeries and Sequences) regularly, both during development/debugging and when publishing new versions of our product.

The way we have solved this today is this:

  1. We maintain a “deployment template” YAML file that defines all the “stuff” that needs to be deleted/created during a redeployment. This includes Cognite Functions, TimeSeries, and Sequences. Each entry in the YAML contains all necessary data for creating the relevant resource.
  2. We have written some Python classes that perform resource specific deployment (CogniteFunctionDeployer, TimeSeriesDeployer, SequenceDeployer). These behave quite similarly, with methods that backup data, delete, and recreate the resource.
  3. These classes are instantiated in our “deploy.py” script, which just iterates over all the resources defined in the template (step 1) and calls the “.deploy” method on each resource deployer class.
  4. The deploy script is called from within a GitHub workflow, “deployment.yaml”. Here we define a matrix strategy so that we can trigger the deployment of all functions and resources for all assets at the same time. (Note that we have logic guaranteeing that no resource is deployed by two independent calls of the deploy script, so this should not be the cause of the errors we observe.)
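
For readers unfamiliar with this kind of setup, steps 1 to 3 can be sketched roughly as below. This is a minimal illustration and not our actual code: the template layout, the `RecordingDeployer` stand-in, and the `deploy_all` helper are all hypothetical, and nothing here talks to CDF. The stand-in only records the backup/delete/create order that the real deployer classes follow.

```python
from dataclasses import dataclass


@dataclass
class RecordingDeployer:
    """Hypothetical stand-in for CogniteFunctionDeployer, TimeSeriesDeployer, etc."""

    kind: str
    config: dict
    log: list

    def deploy(self) -> None:
        # The real classes back up data, delete the resource, then recreate it.
        for step in ("backup", "delete", "create"):
            self.log.append(f"{step}:{self.kind}:{self.config['external_id']}")


def deploy_all(template: dict) -> list:
    """Iterate over every resource in the template and call .deploy() on it."""
    log: list = []
    for kind, entries in template.items():
        for entry in entries:
            RecordingDeployer(kind, entry, log).deploy()
    return log


# Assumed shape of the parsed deployment template (one entry per resource).
TEMPLATE = {
    "functions": [{"external_id": "asset1-func-1"}],
    "timeseries": [{"external_id": "asset1-ts-1"}],
    "sequences": [{"external_id": "asset1-seq-1"}],
}

if __name__ == "__main__":
    for line in deploy_all(TEMPLATE):
        print(line)
```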

This is basically our GitHub Actions workflow file:

name: Deployment

on:
  workflow_dispatch:
    inputs:
      log_level:
        type: choice
        description: 'Log level for deployment.'
        default: 'DEBUG'
        options:
          - 'DEBUG'
          - 'INFO'
          - 'WARNING'
jobs:
  sandbox_deployment:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        asset:
          - "asset1"
          - "asset2"
        function:
          - "func-1"
          - "func-2"
          - "func-3"
          - "func-4"
    environment: sandbox
    env:
      CDF_CLIENT_ID: ${{ secrets.CDF_CLIENT_ID }}
      CDF_CLIENT_SECRET: ${{ secrets.CDF_CLIENT_SECRET }}
      CDF_TENANT_ID: ${{ secrets.CDF_TENANT_ID }}
      TRIGGER_BRANCH_NAME: ${{ github.ref }}
      GITHUB_ACTOR: ${{ github.actor }}
      GITHUB_SHA: ${{ github.sha }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install Poetry
        run: pip3 install poetry

      - name: Install dependencies
        run: |
          poetry config virtualenvs.create false
          poetry install

      - name: Generate requirements.txt
        run: |
          poetry export -f requirements.txt --output requirements.txt --without-hashes

      - name: Deploy ${{ matrix.asset }} to sandbox
        run: |
          python -u scripts/deploy.py sandbox ${{ matrix.asset }} ${{ matrix.function }} \
            --log_level=${{ inputs.log_level }}

Doing it this way also allows us to easily “deactivate” certain assets/functions during development and debugging.

 

The Problem

However, we have lately noticed that some of the Cognite Function deployments fail, with this error in CDF:

Function deployment failed unexpectedly. Please try again and contact Cognite Support if the problem persists.

The problem has persisted, and so here I am :) 

 

Testing / Debugging

I want to avoid doing a “code dump” and expecting someone to find my problem, but the issue is a bit difficult to reproduce. However, I have gone through the following test sequence by commenting out various parts of the matrix strategy:

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"
      - "func-2"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"
      - "func-2"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"


# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"


# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-4"


# THIS FAILS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"

 

It is not always the same function that fails. I have tried a lot of combinations, but the errors only arise when I deploy both assets and all four functions.
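
To put numbers on the test sequence: a GitHub Actions matrix creates one job per combination of its values, so the number of jobs is the product of the list lengths. The small sketch below counts the jobs for the largest configurations that worked and for the one that failed (the `matrix_jobs` helper and the config labels are just for illustration).

```python
from itertools import product

ASSETS = ["asset1", "asset2"]
FUNCTIONS = ["func-1", "func-2", "func-3", "func-4"]


def matrix_jobs(assets, functions):
    """Return the (asset, function) pairs a two-axis matrix expands into."""
    return list(product(assets, functions))


# The largest configurations that worked, and the one that failed.
configs = {
    "one asset, four functions": (ASSETS[:1], FUNCTIONS),
    "two assets, three functions": (ASSETS, FUNCTIONS[:3]),
    "two assets, four functions": (ASSETS, FUNCTIONS),
}

job_counts = {name: len(matrix_jobs(a, f)) for name, (a, f) in configs.items()}

for name, count in job_counts.items():
    print(f"{name}: {count} jobs")
# one asset, four functions: 4 jobs
# two assets, three functions: 6 jobs
# two assets, four functions: 8 jobs
```

So every passing run started at most 6 deployments at once, and the failing run started 8. That is consistent with some limit on simultaneous deployments, but it does not prove one.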

I should perhaps also say that all calls to the deploy script are completely independent. There is no overlap in the resources they create / delete during the course of deployment. All deployed resources end up in the same dataset.

 

Is there anything in the way the deployments take place under the hood that could explain my observed behavior? Is this kind of matrix strategy deployment discouraged?

 

1 reply


Suggestion: The matrix strategy isn’t super well suited for function deployments, imo. If used, I’d recommend always using a single worker for all deployments, and then holding a semaphore to limit the maximum number of parallel deployments.

Reason: Deploying functions is essentially a worker sitting in a wait-for-success-or-failure loop, and there’s no point in paying GitHub for “basically idling” (if you know what I’m saying).
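
A minimal sketch of the single-worker idea, assuming the deployments can be driven from one Python process with threads. Everything here is hypothetical: `deploy_function` is a placeholder for the real create-then-poll logic, `MAX_PARALLEL = 3` is an arbitrary value and not a documented CDF limit, and `STATS` exists only to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product

MAX_PARALLEL = 3  # assumed cap on simultaneous deployments; tune as needed

_slots = threading.BoundedSemaphore(MAX_PARALLEL)
_stats_lock = threading.Lock()
STATS = {"active": 0, "peak": 0}


def deploy_function(asset, function):
    """Placeholder: the real version would create the function and poll its status."""
    time.sleep(0.05)  # simulates the wait-for-success-or-failure loop
    return f"{asset}/{function}: ok"


def deploy_with_limit(asset, function):
    # Block here until one of the MAX_PARALLEL slots is free.
    with _slots:
        with _stats_lock:
            STATS["active"] += 1
            STATS["peak"] = max(STATS["peak"], STATS["active"])
        try:
            return deploy_function(asset, function)
        finally:
            with _stats_lock:
                STATS["active"] -= 1


def deploy_matrix(assets, functions):
    """Run every (asset, function) deployment from a single worker process."""
    jobs = list(product(assets, functions))
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(deploy_with_limit, a, f) for a, f in jobs]
        return [future.result() for future in futures]


if __name__ == "__main__":
    results = deploy_matrix(
        ["asset1", "asset2"], ["func-1", "func-2", "func-3", "func-4"]
    )
    print("\n".join(results))
    print("peak parallel deployments:", STATS["peak"])
```

Setting `max_workers=MAX_PARALLEL` on the pool would cap concurrency just as well; the explicit semaphore is kept here only because it mirrors the suggestion above.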

 

