Are there limitations on concurrent deployment of Cognite Functions via a GitHub Workflow matrix?

Question

Background

We have an ongoing project where we need to deploy Cognite Functions and other resources (TimeSeries and Sequences) regularly, both during development/debugging and when publishing new versions of our product. This is how we have solved it today:

1. We maintain a “deployment template” YAML file that defines all the “stuff” that needs to be deleted/created during a redeployment. This includes Cognite Functions, TimeSeries, and Sequences. Each entry in the YAML contains all the data needed to create the relevant resource.
2. We have written some Python classes that perform resource-specific deployment (CogniteFunctionDeployer, TimeSeriesDeployer, SequenceDeployer). These behave quite similarly, with methods that back up data, delete, and recreate the resource.
3. These classes are instantiated in our “deploy.py” script, which simply iterates over all the resources defined in the template (step 1) and calls the “.deploy” method on each resource deployer class.
4. The deploy script is called from within a GitHub Workflow, “deployment.yaml”. There we define a matrix strategy that lets us trigger the deployment of all functions and resources for all assets at the same time. (Note that we have logic that guarantees that no resource is deployed by two independent calls of the deploy script, so this should not be the cause of the errors we observe.)

This is basically our GitHub Actions file:

```yaml
name: Deployment

on:
  workflow_dispatch:
    inputs:
      log_level:
        type: choice
        description: 'Log level for deployment.'
        default: 'DEBUG'
        options:
          - 'DEBUG'
          - 'INFO'
          - 'WARNING'

jobs:
  sandbox_deployment:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        asset:
          - "asset1"
          - "asset2"
        function:
          - "func-1"
          - "func-2"
          - "func-3"
          - "func-4"
    environment: sandbox
    env:
      CDF_CLIENT_ID: ${{ secrets.CDF_CLIENT_ID }}
      CDF_CLIENT_SECRET: ${{ secrets.CDF_CLIENT_SECRET }}
      CDF_TENANT_ID: ${{ secrets.CDF_TENANT_ID }}
      TRIGGER_BRANCH_NAME: ${{ github.ref }}
      GITHUB_ACTOR: ${{ github.actor }}
      GITHUB_SHA: ${{ github.sha }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Poetry
        run: pip3 install poetry
      - name: Install dependencies
        run: |
          poetry config virtualenvs.create false
          poetry install
      - name: Generate requirements.txt
        run: |
          poetry export -f requirements.txt --output requirements.txt --without-hashes
      - name: Deploy ${{ matrix.asset }} to sandbox
        run: |
          python -u scripts/deploy.py sandbox ${{ matrix.asset }} ${{ matrix.function }} \
            --log_level=${{ inputs.log_level }}
```

Doing it this way also allows us to easily “deactivate” certain assets/functions during development and debugging.

The Problem

However, we have noticed lately that some of the Cognite Function deployments fail, with the following error in CDF: Function deployment failed unexpectedly. Please try again and contact Cognite Support if the problem persists.

The problem has persisted, and so here I am :)

Testing / Debugging

I want to avoid doing a “code dump” and expecting someone to find my problem for me, but the issue is a bit difficult to reproduce.
However, I have gone through the following test sequence by commenting out various parts of the matrix strategy:

```yaml
# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"
      - "func-2"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"
      - "func-2"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"

# THIS WORKS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-4"

# THIS FAILS
strategy:
  matrix:
    asset:
      - "asset1"
      - "asset2"
    function:
      - "func-1"
      - "func-2"
      - "func-3"
      - "func-4"
```

It is not always the same function that fails. I have tried a lot of combinations, but the errors only arise when I deploy both assets and all four functions.

I should perhaps also mention that all calls to the deploy script are completely independent. There is no overlap in the resources they create/delete during deployment. All deployed resources end up in the same dataset.

Is there anything in the way the deployments take place under the hood that could explain the behavior I observe? Is this kind of matrix-strategy deployment discouraged?
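One way to read the test sequence is to count how many jobs each matrix expands to: GitHub Actions creates one job per combination of matrix values and, by default, runs as many of them in parallel as runner availability allows. The sketch below is only an illustration (the `TEST_RUNS` list simply transcribes the configurations above; `expand_matrix` and `summarize` are hypothetical helpers). It shows that every working run had at most 6 simultaneous jobs, while the single failing run had 8. That does not prove a concurrency limit on the CDF side, but it is consistent with one.

```python
from itertools import product

# Matrix configurations from the test sequence above: (assets, functions, outcome).
TEST_RUNS = [
    (["asset1"], ["func-1"], "works"),
    (["asset1"], ["func-1", "func-2"], "works"),
    (["asset2"], ["func-1"], "works"),
    (["asset2"], ["func-1", "func-2"], "works"),
    (["asset1"], ["func-1", "func-2", "func-3", "func-4"], "works"),
    (["asset2"], ["func-1", "func-2", "func-3", "func-4"], "works"),
    (["asset1", "asset2"], ["func-1", "func-2"], "works"),
    (["asset1", "asset2"], ["func-1", "func-2", "func-3"], "works"),
    (["asset1", "asset2"], ["func-1", "func-2", "func-4"], "works"),
    (["asset1", "asset2"], ["func-1", "func-2", "func-3", "func-4"], "fails"),
]


def expand_matrix(assets, functions):
    """Return the jobs a two-key matrix expands to: one job per (asset, function) pair."""
    return list(product(assets, functions))


def summarize(runs):
    """Map each outcome ("works"/"fails") to the job counts observed for it."""
    counts = {}
    for assets, functions, outcome in runs:
        counts.setdefault(outcome, []).append(len(expand_matrix(assets, functions)))
    return counts


if __name__ == "__main__":
    summary = summarize(TEST_RUNS)
    print("max jobs in a working run:", max(summary["works"]))
    print("jobs in failing run(s):", summary["fails"])
```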

Håkon V. Treider · Answer

Suggestion: The matrix strategy isn’t very well suited for function deployments, imo. Instead, I’d recommend always using a single worker for all deployments and holding a semaphore to limit the maximum number of parallel deployments.
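A minimal sketch of this suggestion, assuming a single job runs all deployments from one Python process: a thread pool starts one task per (asset, function) pair, and a `threading.BoundedSemaphore` caps how many deployments are in flight at once. `deploy_one` is a hypothetical stand-in for the per-function work the existing deploy.py does, and `max_parallel=3` is an arbitrary illustrative value, not a documented CDF limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def deploy_one(asset: str, function: str) -> str:
    """Hypothetical stand-in for deploying one function for one asset."""
    time.sleep(0.05)  # simulates waiting for CDF to report success or failure
    return f"{asset}/{function}: deployed"


def deploy_all(assets, functions, max_parallel=3):
    """Deploy every (asset, function) pair from one worker, at most max_parallel at a time.

    Returns the results (in input order) and the peak number of concurrent deployments.
    """
    slots = threading.BoundedSemaphore(max_parallel)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def guarded(job):
        nonlocal in_flight, peak
        with slots:  # blocks until one of the max_parallel slots is free
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            try:
                return deploy_one(*job)
            finally:
                with lock:
                    in_flight -= 1

    jobs = list(product(assets, functions))
    # One thread per job; the semaphore, not the pool size, is what caps concurrency.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        results = list(pool.map(guarded, jobs))
    return results, peak


if __name__ == "__main__":
    results, peak = deploy_all(["asset1", "asset2"], ["func-1", "func-2", "func-3", "func-4"])
    print(len(results), "deployments, peak concurrency", peak)
```

Threads are a reasonable fit here because each deployment spends most of its time waiting on the network rather than computing. If `deploy_one` raises, `list(pool.map(...))` re-raises that exception, so a failed deployment still fails the workflow step.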

Reason: Deploying functions is essentially a worker sitting in a wait-for-success-or-failure loop, and there’s no point in paying GitHub for “basically idling” (if you know what I’m saying).
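To illustrate what that wait-for-success-or-failure loop looks like, here is a minimal polling sketch with a timeout. `fetch_status` is a hypothetical callable (in practice it would ask CDF for the function's current state), and the terminal status names "Ready" and "Failed" are assumptions for this sketch, not a confirmed list of CDF states. The `sleep` parameter is injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_deployment(
    fetch_status: Callable[[], str],
    timeout_s: float = 900.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it reports a terminal state or the timeout expires.

    "Ready" and "Failed" are assumed terminal state names for this sketch.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("Ready", "Failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout_s} s")
        sleep(poll_interval_s)  # the worker does nothing useful while it waits here
```

For example, with a fake status source `iter(["Queued", "Deploying", "Ready"])` and a no-op `sleep`, the call returns "Ready" after three polls. Nearly all of the real run time is spent in `sleep`, which is the idle time the answer refers to.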
