APIs and Python SDK for SQL Transformations

  • 11 November 2021


Hey there! I am Sunil, Product Manager for Data Integrations. Since this is our first post in the Hub, let me introduce my team and what we do. My team is responsible for the Ctrl+C Ctrl+V of your data from source systems to Cognite Data Fusion 😉. But on a serious note, we are responsible for ELT: extracting, loading, and transforming data. We are thrilled that you are interested in learning about the Transformations Python SDK. Yayy!😀

In this post, I list a couple of use cases where the Python SDK for transformations makes managing and orchestrating transformation jobs simple and seamless. Our goal is to make the transformations SDK more developer-friendly, and we would like to hear your feedback.

Do check out the API and SDK docs for more details, but now on to a few use cases!

 

Use Case 1: Triggering Transformations

Cognite Data Fusion helps liberate industrial data from silos and provides a unified, contextualized view of that data. This makes it easier to derive insights and build applications.

To extract data from source systems, extractors are developed using the Cognite Python SDK. These extractors upload data to the staging area, i.e., RAW. This data is then processed using SQL transformations to make it fit for consumption by machine learning models, dashboards, and applications.

Let’s look at how SQL transformations were triggered before and what’s possible now.

Before:

After an extractor uploaded data to RAW, there was no way to trigger a transformation based on that event. Transformations were instead scheduled to run at frequent intervals, which delayed the data becoming available for consumption.

 

Now:

Once an extractor has uploaded the data to RAW, it can trigger a transformation for further processing right away. This handy feature saves time: the data becomes available for consumption as soon as the transformation finishes, rather than after the next scheduled run.
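
As a rough illustration of this flow, here is a minimal sketch. It assumes `client` is an authenticated `CogniteClient` from the Cognite Python SDK and relies on `client.raw.rows.insert(...)` and `client.transformations.run(...)`. The helper name `upload_and_trigger` and all of its arguments are hypothetical, made up for this example.

```python
from typing import Any, Dict


def upload_and_trigger(
    client: Any,
    db_name: str,
    table_name: str,
    rows: Dict[str, Dict[str, Any]],
    transformation_external_id: str,
) -> Any:
    """Hypothetical helper: stage rows in RAW, then start a transformation.

    `client` is assumed to be an authenticated CogniteClient. `rows` maps
    each RAW row key to its column values.
    """
    # Step 1: the extractor uploads its rows to the RAW staging area.
    # ensure_parent=True asks the SDK to create the database/table if missing.
    client.raw.rows.insert(db_name, table_name, rows, ensure_parent=True)

    # Step 2: trigger the transformation immediately instead of waiting for
    # a schedule. wait=False returns once the job is submitted, so the
    # extractor is not blocked while the transformation runs.
    return client.transformations.run(
        transformation_external_id=transformation_external_id,
        wait=False,
    )
```

An extractor could call this at the end of each upload batch, passing the external ID of whichever transformation reads from that RAW table.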


Sequence Diagram:

 

Use Case 2: Sequential Orchestration of Transformations

Transformation jobs can be complex and time-consuming. It is advisable to keep the jobs modular so that they are easy to maintain and debug. Previously, orchestrating transformation jobs in Cognite Data Fusion meant leaving significant time intervals between scheduled jobs so that each one could finish before the next started.

With the Cognite Python SDK for transformations, the client can wait until a transformation job is complete before triggering the next one, which lets users perform simple orchestration.
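
A minimal sketch of such a sequential run is shown below. It assumes `client` is an authenticated `CogniteClient` and that `client.transformations.run(..., wait=True)` blocks until the job has finished. The helper name `run_in_sequence` is hypothetical, and the failure check (a `status` attribute whose text contains "Failed") is an assumption about the returned job object, so treat it as a starting point rather than a definitive implementation.

```python
from typing import Any, List, Sequence


def run_in_sequence(client: Any, external_ids: Sequence[str]) -> List[Any]:
    """Hypothetical helper: run transformations one after another, in order.

    `client` is assumed to be an authenticated CogniteClient.
    """
    jobs: List[Any] = []
    for external_id in external_ids:
        # wait=True makes the call block until this job has finished, so the
        # next transformation only starts once the previous one is done.
        job = client.transformations.run(
            transformation_external_id=external_id,
            wait=True,
        )
        jobs.append(job)

        # Assumption: the job exposes a `status` whose text contains "Failed"
        # on failure. Stop the chain so later steps do not run on bad data.
        status = getattr(job, "status", None)
        if status is not None and "failed" in str(status).lower():
            raise RuntimeError(f"Transformation {external_id!r} failed")
    return jobs
```

Calling `run_in_sequence(client, ["clean-raw-assets", "build-asset-hierarchy"])` (both IDs are made up) would run the second transformation only after the first has completed.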

 

Sequence Diagram:

In the next article, we will share a workbook for you to try things out.

What other use cases do you think the Transformations SDK could help with? Do share your ideas and comments below. The team is eager to help solve more use cases together!

