
I have created a workflow in which I create dynamic tasks depending on the input: it splits the input into batches of IDs and creates a task for each batch. Below is the workflow definition:

from cognite.client.data_classes import (
    WorkflowVersionUpsert,
    WorkflowDefinitionUpsert,
    WorkflowTask,
    FunctionTaskParameters,
    DynamicTaskParameters,
)

WorkflowVersionUpsert(
    workflow_external_id="test_dynamic-0729",
    version="1",
    workflow_definition=WorkflowDefinitionUpsert(
        description="This workflow has two steps",
        tasks=[
            WorkflowTask(
                external_id="test_sub_tasks",
                parameters=FunctionTaskParameters(
                    external_id="test_sub_tasks",
                    data="${workflow.input}",
                ),
                retries=1,
                timeout=3600,
                depends_on=[],
                on_failure="abortWorkflow",
            ),
            WorkflowTask(
                external_id="test_create_sub",
                parameters=DynamicTaskParameters(
                    tasks="${test_sub_tasks.output.response.tasks}"
                ),
                name="Dynamic Task",
                description="Executes a list of workflow tasks for subscription creation",
                retries=0,
                timeout=3600,
                depends_on=["test_sub_tasks"],
                on_failure="abortWorkflow",
            ),
        ],
    ),
)
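
For reference, the ${test_sub_tasks.output.response.tasks} reference resolves to the list that the test_sub_tasks function returns under the "tasks" key, and that list must contain task definitions in the Data Workflows API format. Roughly like this (a sketch: the batch size and the worker function process_batch are illustrative, and the exact task-definition schema should be checked against the Data Workflows API docs):

def handle(client, data):
    # Split the incoming IDs into batches (batch size is illustrative).
    ids = data["ids"]
    batches = [ids[i : i + 100] for i in range(0, len(ids), 100)]

    # Emit one function task definition per batch. Tasks with an empty
    # "dependsOn" list are independent of each other.
    tasks = [
        {
            "externalId": f"sub_task_{n}",
            "type": "function",
            "parameters": {
                "function": {"externalId": "process_batch", "data": {"ids": batch}}
            },
            "retries": 0,
            "timeout": 3600,
            "dependsOn": [],
            "onFailure": "abortWorkflow",
        }
        for n, batch in enumerate(batches)
    ]
    return {"tasks": tasks}

Since none of the generated tasks depend on each other, they should all be scheduled concurrently, which is why the sequential completion described next was unexpected.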

As part of this workflow, some tasks need to be executed in parallel, and when run in parallel they should finish at roughly the same time. However, the dynamic tasks are not being executed in parallel: I have 6 dynamic tasks, but they finish sequentially. Sharing a snapshot; the time difference between the first and last task is ~6 minutes. I am not able to take advantage of parallel execution with dynamic tasks.

Impact: dynamic execution processes tasks sequentially, further delaying completion of the workflow.

Hi @Rimmi Anand

When you say you have “6 dynamic tasks”, are you then referring to 6 Function Tasks within the Dynamic Task of the workflow definition you shared? And the problem being that these 6 Function Tasks are executed sequentially rather than in parallel?

Can you please share the workflow execution details (you can use this endpoint to retrieve it) of a run where you observe sequential execution when expecting parallel? Note that the execution details might contain sensitive information - if that’s the case, remove it before sharing (here or by DM).
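
If the Python SDK is more convenient than calling the endpoint directly, something like the following should retrieve the same data (a sketch, assuming a configured CogniteClient; the method names are from the SDK's workflows API and may differ slightly between versions):

from cognite.client import CogniteClient

client = CogniteClient()

# List recent executions of the workflow version, then fetch the full
# details (including per-task start/end times) for the latest one.
executions = client.workflows.executions.list(
    workflow_version_ids=[("test_dynamic-0729", "1")]
)
details = client.workflows.executions.retrieve_detailed(executions[0].id)
print(details.dump())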


I am referring to the Function Tasks within the dynamic task. They start in parallel, but completion time increases with each task; in parallel execution, all tasks should finish at approximately the same time. Let me know if that is not the case. Sharing the execution details as a .txt file (it might contain 3 dynamic tasks); feel free to convert it to JSON, as attaching JSON wasn't allowed. Let me know if any other information is required.


Thanks. We’ll take a look. The behavior you see could be explained by the task concurrency limitations explained in this thread: 

 


Hi @Rimmi Anand. We’ve investigated your issue and the root cause is a problem we’ve been experiencing with scalability/concurrent executions of Cognite Functions on Azure (Data Workflows functions as expected in this case). We’ll have to follow up on this next week. How critical is this specific issue for your use case?


  1. We have multiple workflows, and some are dependent on each other. The long execution time impacts the performance of our workflows, and execution time is the deciding factor for workflow schedules. So it's critical for our workflows.

Hi @Jørgen Lund, is there any update on this? We are still facing the issue.

I didn't have any other function running while testing this.


Related issue with additional comments here. We'll take another look at this.

@Rimmi Anand could you share the following for this particular case:

  • Which CDF project is this and what cluster is it running on?
  • What is the function ID for the function in the pictures above?

Hi @Jørgen Lund 

Sharing details. I don't have a function ID for this screenshot, as the function was redeployed and a new function ID was created. I can share the external ID for this function if that helps.

  • cluster: westeurope-1, project: slb-uds-dev
  • externalID: wk_ingest_osdu_3

Thanks @Rimmi Anand. We have looked into this, and unfortunately we believe this is the result of an issue on the side of the cloud provider (Azure). We’ll have to continue working towards a resolution, but due to the nature of the issue we expect this to take some time. 

This might sound like a strange suggestion considering the issue, but would it be possible for you to parallelize the work even more? E.g. instead of making 5 calls in parallel, make 10 or more? Or is this not possible in your case due to the limit on the number of tasks in a data workflow (we might consider increasing this)? This might help work around the issue in Azure and prompt the Function to scale out as expected.


We have tried making more calls.

function ID - 7220560071032348

cluster: westeurope-1

project: slb-uds-qa

 


A bit hard to tell from the screenshot, as it's not clear exactly when each call was made, but do you see an improvement to parallel execution?

We’ll also take a look at our logs.


We are working on this issue and have discovered that there are problems when "cold" functions are called: they do not scale in the expected way. This looks to be a problem with Azure Functions, which are used to run Cognite Functions on Azure, and we are in dialog with Microsoft to understand why the problem occurs and how it can be fixed.

A mitigation that can be used until we find a more permanent solution is to warm up the function with a single call (e.g. a cheap computation, or a separate parameter telling the function not to do much work) as a separate step earlier in the workflow. The warm-up needs to run around 30-40 seconds before the real computations, so that it does not trigger the bad scaling behavior for the real computation you want to do later in the workflow. When the function is warm, our internal testing has shown that it will scale out as expected.
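
As an illustration of that mitigation (not an official example): a warm-up task placed ahead of test_sub_tasks in the definition above, assuming the worker function's handler short-circuits on a hypothetical "warmup" flag:

# Extra task at the start of the tasks list; it calls the worker function
# with a flag that makes it return immediately, just to warm it up.
WorkflowTask(
    external_id="warmup",
    parameters=FunctionTaskParameters(
        external_id="process_batch",  # hypothetical worker function
        data={"warmup": True},
    ),
    retries=0,
    timeout=60,
    depends_on=[],
    on_failure="skipTask",  # a failed warm-up should not abort the workflow
)

# Corresponding short-circuit inside the function handler:
def handle(client, data):
    # Return immediately when called just to warm up the runtime.
    if data.get("warmup"):
        return {"status": "warm"}
    ...  # real work goes here

Per the timing note above, the real computation should then start roughly 30-40 seconds after the warm-up completes.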


Hi @Dag Brattli, can you please provide an example?


We have received confirmation from Microsoft that this is a known issue affecting Azure Functions running on the Consumption Plan. Microsoft has scheduled a fix for mid-January.

In the interim, we recommend implementing one of these mitigation strategies:

  1. Maintain the function warm by implementing a lightweight health check call every 1-5 minutes. The function will become cold after approximately 30 minutes.
  2. Pre-warm the function by triggering it with a lightweight call a few minutes before your scheduled processing. The function needs to be idle for about a minute between the warm-up and the actual call in order to scale correctly.

These workarounds will help ensure consistent performance until Microsoft's planned update resolves the underlying issue. We will continue to monitor the situation and update you once the fix has been deployed.
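
A sketch of these mitigations using the Python SDK, assuming the function handler recognizes the same hypothetical "warmup" flag as above (the cron interval is illustrative, and the exact schedules.create() signature depends on your SDK version and auth setup):

from cognite.client import CogniteClient

client = CogniteClient()

# Mitigation 2: one-off pre-warm, a lightweight call a few minutes before
# the scheduled processing.
client.functions.call(external_id="wk_ingest_osdu_3", data={"warmup": True})

# Mitigation 1: keep the function warm with a lightweight call every
# 5 minutes, since it goes cold after roughly 30 minutes of inactivity.
client.functions.schedules.create(
    name="keep-warm",
    cron_expression="*/5 * * * *",
    function_external_id="wk_ingest_osdu_3",
    data={"warmup": True},
)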

