How to use extraction pipeline configuration to configure Cognite functions [Cognite Official]

  • 17 February 2023


Introduction

Co-author: @Jan Inge Bergseth 

Cognite Functions provide a run-time environment for hosting and running Python code, similar to Azure Functions, Google Cloud Functions, or AWS Lambda. One of the benefits of using the built-in Functions capability of CDF is that it is tightly coupled with CDF and gives you, as a developer, an implicit Cognite client, allowing you to seamlessly interact with your CDF data and data pipeline.
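For reference, a Cognite Function's entry point is a function named handle in handler.py. The client argument is the implicit, pre-authenticated Cognite client mentioned above; data holds the JSON payload passed when the function is called. A minimal sketch:

# handler.py - minimal Cognite Function entry point.
# `client` is an authenticated CogniteClient injected by CDF;
# `data` is the JSON payload passed to the call.
def handle(client, data):
    print(f"Connected to CDF project: {client.config.project}")
    return {"status": "ok"}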

 

CDF Extraction Pipelines let you monitor data flow, gain visibility into run history, and receive notifications about data integration events that need attention, all of which contributes to the resulting data quality. Extraction pipelines also offer a great feature for storing extractor configuration settings, allowing you to configure an extractor remotely. This can be extremely helpful, as it avoids having to manage extractor configuration locally.

 

This article explores combining Extraction Pipelines, for flexible configuration and visibility, with Cognite Functions. We will use the extraction pipeline configuration to dynamically configure a Cognite Function without having to redeploy the function itself.

Define a data set

Before we create our extraction pipeline, we need to create a data set that the pipeline can be associated with. Within the extraction pipeline we will also store the configuration that our Cognite Function will use.

 

Select Manage Data Catalog from the Manage menu:


 

Create a new data set

 


 

Name the data set: “Function pipeline configuration”.
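If you prefer to script this step, the data set can also be created with the Cognite Python SDK. A sketch, assuming your client credentials are already configured; the external ID is our own choice:

from cognite.client import CogniteClient
from cognite.client.data_classes import DataSet

client = CogniteClient()  # assumes authentication is configured for your project
data_set = client.data_sets.create(
    DataSet(
        external_id="function_pipeline_configuration",  # hypothetical external ID
        name="Function pipeline configuration",
    )
)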

Setting up the extraction pipeline

Access rights required

[Screenshot: required access rights]

 

Create the extraction pipeline

From the Integrate menu, select Create and Monitor Extraction Pipelines.



In the upper right corner, select Create extraction pipeline and fill in the required details. In our example we associate the pipeline with the Function pipeline configuration data set we created earlier.
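The pipeline can also be created with the SDK. A sketch, assuming the data set from the previous step and the external ID sine-function, which matches the input data we use later when calling the function:

from cognite.client.data_classes import ExtractionPipeline

pipeline = client.extraction_pipelines.create(
    ExtractionPipeline(
        external_id="sine-function",
        name="Sine Function",
        data_set_id=data_set.id,  # the data set created above
        description="Configuration host for the sine-curve Cognite Function",
    )
)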

 



 


Document the function in the extraction pipeline

Add documentation to the extraction pipeline and connect the notification service to an e-mail address. This keeps documentation easily accessible, close to the code, and ensures you are notified when the process fails.

 


 

Define the configuration for your function

Edit the configuration of the extraction pipeline, add the text below, and publish it.

Please note that the data block must be copied as is, with indentation intact.

 

Example #1

data:
    ts_external_id: cdf_hub_sine_pipeline_config_1
    ts_name: sine_pipeline_config_1
    amplitude: 0.5
    period: 5
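At run time, the function can fetch and parse this configuration through the extraction pipeline config endpoint. A sketch (exact method names can vary between SDK versions):

import yaml

def load_pipeline_config(client, pipeline_ext_id):
    # Fetch the latest published configuration revision for the pipeline
    response = client.extraction_pipelines.config.retrieve(external_id=pipeline_ext_id)
    # response.config holds the raw YAML string published in the UI
    return yaml.safe_load(response.config)["data"]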

 

Setting up the Cognite Function

In this example we’ll create a Cognite Function that emits a simple sine curve into a time series in CDF. Between runs, we’d like to be able to adjust the amplitude and the number of periods of the curve. Our function should also take the time series external ID and name as input from the configuration.

 

The function will write time series data points from the time you run the function and forward in time. The code for our test function is available in the GitHub repository: https://github.com/cognitedata/how-to-cognite-function-extr-pipeline-config
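For orientation, here is a simplified sketch of what such a handler can look like. It is not the repository’s exact code; the structure and helper choices are our own, and it assumes PyYAML is listed in requirements.txt:

import math
import time

import yaml
from cognite.client.data_classes import TimeSeries

def handle(client, data):
    # Look up the configuration stored on the extraction pipeline
    pipeline_ext_id = data["ExtractionPipelineExtId"]
    raw = client.extraction_pipelines.config.retrieve(external_id=pipeline_ext_id)
    cfg = yaml.safe_load(raw.config)["data"]

    # Create the target time series if it does not already exist
    if client.time_series.retrieve(external_id=cfg["ts_external_id"]) is None:
        client.time_series.create(
            TimeSeries(external_id=cfg["ts_external_id"], name=cfg["ts_name"])
        )

    # Generate `period` full sine cycles, one data point per minute, starting now
    points_per_period = 60
    now_ms = int(time.time() * 1000)
    datapoints = [
        (
            now_ms + i * 60_000,
            float(cfg["amplitude"]) * math.sin(2 * math.pi * i / points_per_period),
        )
        for i in range(int(cfg["period"]) * points_per_period)
    ]
    client.time_series.data.insert(datapoints, external_id=cfg["ts_external_id"])
    return {"ts_external_id": cfg["ts_external_id"], "datapoints": len(datapoints)}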

 

Deploy your Function to Cognite Data Fusion

 

As described in the documentation at https://docs.cognite.com/cdf/functions/, there are multiple ways of deploying your function to CDF. In this example we will deploy by uploading a zip file to CDF.

 

In the repository you will find the files handler.py and requirements.txt. handler.py contains the code for the function, and requirements.txt lists the Python packages used by the function that are not included by default in Cognite Functions.

 

Select handler.py and requirements.txt and create a zip file. The zip file can be named anything (in this example we named it sine.zip).
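As an alternative to the UI upload shown below, the zip can be deployed with the SDK. A sketch, assuming sine.zip is in the working directory; the name and external ID are our own choices:

# Upload the zip as a CDF file, then create the function from it
uploaded = client.files.upload("sine.zip", name="sine.zip")
function = client.functions.create(
    name="sine_function",         # hypothetical display name
    external_id="sine_function",  # hypothetical external ID
    file_id=uploaded.id,
)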

 


 

Then from the Cognite Data Fusion UI, select Explore -> Cognite Functions:

 


 

In Functions, click Upload function.

 

In the Upload function form you only have to:

  1. Drag and drop your zip file (with handler.py and requirements.txt) into the Function file box.

  2. Give your function a name.

  3. Give your function an external ID (this is optional, but good practice).

  4. For the rest of the configuration, leave the fields blank or use the provided default values.

  5. Click Upload function.


 


 

Once uploaded, the function will show in the UI as Queued, then move to Deploying and finally to Ready.

 



 

Test running your function

Once the function is Ready, we can run it by clicking the run arrow next to it.


 

As input data use:

 

{
    "ExtractionPipelineExtId": "sine-function"
}

 

The value here is the same external ID you used when you configured your extraction pipeline in the Cognite Fusion UI.
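The same call can be made from the SDK, which is convenient for testing or scripting. A sketch, assuming the function external ID sine_function from the upload step:

function = client.functions.retrieve(external_id="sine_function")
call = function.call(data={"ExtractionPipelineExtId": "sine-function"})
print(call.get_response())  # the dictionary returned by handle()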

 

Then click the run button to run the function. Once completed, you should see:

 

[Screenshot: completed function call]

 

You can now explore the logs created by the function: expand the sine_function details and click View logs.

 


The log should look something like this:

 

[Screenshot: function run logs]

 

The next step is to see the effect of updating the configuration for the function, which is stored in the extraction pipeline.

 

Update your configuration in the Fusion UI (Extraction Pipeline - Sine Function):

Example #2

data:
    ts_external_id: cdf_hub_sine_pipeline_config_2
    ts_name: sine_pipeline_config_2
    amplitude: 2
    period: 7


 

Re-run the function with the new configuration, but the same function input data as before (pointing to the extraction pipeline external ID). This will create a new time series. When looking at the new time series data points, note how the changes made in the configuration are reflected in the output.
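If you prefer to verify outside the UI, the new data points can be fetched with the SDK. A sketch using the external ID from Example #2; note that the function writes forward in time, so the query window extends into the future:

import time

now_ms = int(time.time() * 1000)
dps = client.time_series.data.retrieve(
    external_id="cdf_hub_sine_pipeline_config_2",
    start=now_ms - 3_600_000,     # one hour back, to be safe
    end=now_ms + 24 * 3_600_000,  # one day ahead
)
print(dps.value[:5])  # values should reflect the new amplitude of 2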

 

Creating a failed run

Next, we will create a failed run, just to see how this looks in the extraction pipeline run history. Update the configuration in the extraction pipeline as follows (the empty time series fields will make the function fail):

Example #3

data:
    ts_external_id:
    ts_name:
    amplitude: 2
    period: 7

Check the extraction pipeline run history

Go back to the extraction pipeline you created at the beginning and select Run History in the upper right corner. 


 


 

This will give you an overview of the runs the function has logged in the extraction pipeline. 
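These run entries are reported by the function itself against the extraction pipeline. A sketch of how a function can do that (field names can vary slightly between SDK versions):

from cognite.client.data_classes import ExtractionPipelineRun

def report_run(client, pipeline_ext_id, success, message=""):
    # Log a success or failure entry in the extraction pipeline's run history
    client.extraction_pipelines.runs.create(
        ExtractionPipelineRun(
            extpipe_external_id=pipeline_ext_id,
            status="success" if success else "failure",
            message=message,
        )
    )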

 


 

As you will note, there are some successful runs and one failed run.

 

The failed run should also have triggered an email notification to the email address defined as the owner of the extraction pipeline.

 

For each logged run there is also a record of which configuration revision was used for that particular run. By clicking the revision number, you can see the actual configuration settings the function used.
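If you want to inspect a revision programmatically, the config endpoint also accepts a revision number. A sketch (support for the revision parameter depends on your SDK version):

# Fetch a specific configuration revision and print the raw YAML
config_rev = client.extraction_pipelines.config.retrieve(
    external_id="sine-function", revision=2
)
print(config_rev.config)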


 

View your result in Cognite Charts

 

Cognite provides a tool called Charts that we will use to view the time series generated by our function. Charts is a powerful and fast tool for engineers and domain experts to explore, trend, and analyze industrial data. It gives industrial experts instant access to data and no-code tools to find, understand, and troubleshoot for actionable insights.

 

The description of Charts usage below is deliberately brief; for more information, see: https://docs.cognite.com/cdf/charts/


 

Open Charts in the Fusion UI.

 

 

Click the button to create a new chart.

 

Give the chart a name if you want (for example, Sine Function), then find the sine time series generated by the function by clicking Time series. Search for the names you used for your time series and select them in the list of results.





 

You will now see the plotted sine curves in the application. You might have to adjust the Y axis; after the adjustment, you should see two different curves corresponding to the names, amplitudes, and number of periods you configured for the function in the extraction pipeline.

 


 

Conclusion

In this article we have covered how to define a Cognite Function that uses the extraction pipeline configuration feature to dynamically update configuration settings without having to redeploy the function. We have also looked at defining a data set, deploying a Cognite Function, and inspecting logs.

References:

Demo code repo: https://github.com/cognitedata/how-to-cognite-function-extr-pipeline-config 





 


4 replies


Interesting article @Frank Danielsen and @Jan Inge Bergseth 


Hi @Frank Danielsen and @Jan Inge Bergseth !
 

I thoroughly enjoyed the article; we've implemented some of those steps at the Cognite boot camp. However, I'd like to clarify whether deploying an extractor as a Cognite function in a high-volume scenario is considered a best practice, given the limited resources of the cluster. Could you confirm if my understanding is correct? If not, I'm eager to learn more about the recommended best practices for running ETL. Would utilizing solutions like Azure Data Factory be preferable, or is it more advisable to run the Cognite extractors on a dedicated cluster within our client's cloud provider or their local infrastructure?
 

Looking forward to your insights.


Hi @Andre Alves

We appreciate the feedback. Great to hear you made use of it.

While we see patterns of running extractors as Cognite Functions, it is generally not considered best practice due to the limitations you mention; in particular, I would caution against this approach in a high-volume scenario. Having said that, the ease of Cognite Function deployment makes it very attractive in cases where setting up other infrastructure takes time or has other limitations. In those cases you should consider the volume of data you need to extract and how that fits within the limitations; for example, long-running tasks are generally a no-go. Examples I have seen are REST API extractors running in Cognite Functions, but only for low volume and low call frequency.

Regarding ADF or extractors on a dedicated cluster, it really depends on the customer's infrastructure. If you need to extract from source systems not naturally exposed to the internet, running the extractor in the customer's DMZ is often the preferred approach, as it does not require opening inbound ports in the firewall. Hope this helps.

 


Thank you, @Frank Danielsen , for the prompt response on this matter. Your explanation makes complete sense to me, and I will certainly consider these trade-offs when building and deploying extractors for our clients' use cases.

This highlights the significance of Cognite Hub; we continue to learn more about Cognite every day with your team.
