
How to connect Cognite Data Fusion to Microsoft SQL [Cognite Official]


Arild Eide
Seasoned Practitioner

This guide describes how to run the Cognite DB Extractor in a Docker container to fetch data from Microsoft SQL Server into Cognite Data Fusion.

 

Prerequisites

  • A running instance of MS SQL Server and valid credentials to SELECT rows from the table in question

  • A Docker host that can access the MS SQL Server

  • A running CDF project with an Azure AD service principal with capabilities to write to CDF

 

1 - Prepare Docker Image

Cognite provides the database extractor as a Docker image published to Docker Hub, requiring just the addition of an ODBC driver. Since we are connecting to MS SQL, we will install the driver provided by Microsoft for Debian 11, using this Dockerfile:

 

Note! Go to https://hub.docker.com/r/cognite/db-extractor-base/tags to see the latest version of the Docker image

FROM cognite/db-extractor-base:2.5.0-beta5

RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list &&\
    apt-get update &&\
    ACCEPT_EULA=Y apt-get install -y msodbcsql18

Build the image: 

docker build -t cognite-db-mssql .
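To check that the ODBC driver made it into the image, you can list the drivers registered with unixODBC. This assumes the `odbcinst` utility is present in the image (it is normally pulled in as a unixODBC dependency of msodbcsql18):

```shell
# List installed ODBC drivers inside the freshly built image.
# --entrypoint overrides the extractor entrypoint for this one-off check.
docker run --rm --entrypoint odbcinst cognite-db-mssql -q -d
```

The output should include [ODBC Driver 18 for SQL Server], which is the driver name referenced in the connection string later in this guide.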

 

2 - Set up extractor config

The extractor reads a local configuration file that, at a minimum, defines:

  1. Details for authenticating towards CDF

  2. Database configurations

  3. SQL query

Authentication towards CDF

This section of the config file declares the details the extractor needs to authenticate so it can write to CDF. To create a new app registration in Azure AD, please refer to the relevant section at docs.cognite.com.

cognite:
  host: https://api.cognitedata.com
  project: cdf-project
  idp-authentication:
    client-id:
    secret:
    token-url: https://login.microsoftonline.com/[tenant]/oauth2/v2.0/token
    scopes:
      - "https://api.cognitedata.com/.default"

 

cognite.project

Unique name of the CDF project, displayed in the upper left corner of the Cognite Data Fusion UI.


cognite.host

The base URL where the CDF API resides, for example

https://api.cognitedata.com or https://westeurope-1.cognitedata.com

cognite.idp-authentication.client-id / cognite.idp-authentication.secret

The Azure AD application (client) ID for a valid app registration and its associated client secret.

Details are available at docs.cognite.com

cognite.idp-authentication.token-url

The token URL for the Azure AD tenant used, usually https://login.microsoftonline.com/[tenant]/oauth2/v2.0/token

cognite.idp-authentication.scopes

The OIDC scopes required to access the CDF API:

<cognite.host>/.default, for example https://api.cognitedata.com/.default
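Before running the extractor, it can be useful to verify the service principal's credentials by requesting a token directly with the OAuth2 client credentials flow. A minimal sketch using curl; the tenant, client ID, and secret placeholders are yours to fill in:

```shell
# Request an access token from Azure AD using the client credentials flow.
# Replace <tenant>, <client-id> and <client-secret> with your own values.
curl -s -X POST "https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client-id>" \
  -d "client_secret=<client-secret>" \
  -d "scope=https://api.cognitedata.com/.default"
```

A JSON response containing an access_token field indicates the credentials and scope are valid.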

 

 

Database connections

This section declares one or more databases the extractor will connect to. Since the configuration supports more than one, it is a YAML list.

databases:
  - name: mssql
    connection-string: "driver={ODBC Driver 18 for SQL Server};SERVER=192.168.1.14;DATABASE=testdb;PORT=1433;UID=sa;PWD=**"

name

Local label used to reference database connections in the query (next section)

connection-string

Connection string declaring which driver to use and the details required to connect. Key-value pairs are separated by semicolons (;)

 

SQL query 

The queries section declares the SQL queries the extractor will run, which of the databases declared in the previous section each query runs against, and the CDF destination the result set will be written to.

queries:
  - name: test-query
    database: mssql
    query: >
      SELECT
        externalId,
        timestamp,
        value
      FROM
        testdata
      ORDER BY
        timestamp ASC
    destination-type: TIME_SERIES

 

queries.name

Freely selected name of the query, used by the extractor when logging 

queries.database

The database connection to use: the name of one of the connections declared in the previous section

queries.query

SQL query to be executed. When writing to CDF time series, the result set column names must be exactly externalId, timestamp and value. Use SQL aliases, as source columns typically do not have these names.

queries.destination-type

The destination within CDF the extractor will write the result set to. 
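As an illustration of the aliasing note above, a query against a source table whose columns have different names could look like this (the table and column names here are hypothetical):

```yaml
queries:
  - name: aliased-query
    database: mssql
    query: >
      SELECT
        tag_name AS externalId,
        reading_time AS timestamp,
        reading AS value
      FROM
        sensor_readings
      ORDER BY
        reading_time ASC
    destination-type: TIME_SERIES
```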

 

3 - Running the extractor

docker run -v /config.yaml:/config.yaml cognite-db-mssql /config.yaml

The container log will look something like this:

2022-09-28 08:56:39.232 UTC [INFO] MainThread - Loaded local config file
2022-09-28 08:56:39.406 UTC [INFO] MainThread - No extraction pipeline configured
2022-09-28 08:56:39.406 UTC [INFO] MainThread - Cognite DB Extractor version 2.5.0
2022-09-28 08:56:39.407 UTC [INFO] MainThread - Starting extraction of 1 queries
2022-09-28 08:56:39.424 UTC [INFO] QueryExecutor_0 - Connected to mssql
2022-09-28 08:56:39.424 UTC [INFO] QueryExecutor_0 - Starting extraction of test-query
2022-09-28 08:56:39.456 UTC [INFO] QueryExecutor_0 - No more rows for test-query. 3 rows extracted in 0.0315 seconds
2022-09-28 08:56:39.930 UTC [INFO] QueryExecutor_0 - Uploaded 3 datapoints
2022-09-28 08:56:39.930 UTC [INFO] QueryExecutor_0 - Closing ODBC connection to mssql 

The extractor has now created one or more time series and uploaded data points to CDF. 

 

4 - Complete example config

version: 2
cognite:
  host: https://greenfield.cognitedata.com
  project: **
  idp-authentication:
    client-id: "***"
    secret: "***"
    token-url: "https://login.microsoftonline.com/***/oauth2/v2.0/token"
    scopes:
      - "https://greenfield.cognitedata.com/.default"


logger:
  console:
    level: INFO

databases:
  - name: mssql
    connection-string: "driver={ODBC Driver 18 for SQL Server};SERVER=**;DATABASE=**;PORT=1433;UID=;PWD=**"


queries:
  - name: test-query
    database: mssql
    query: >
      SELECT
        externalId,
        timestamp,
        value
      FROM
        testdata
      ORDER BY
        timestamp ASC
    destination-type: TIME_SERIES

 

5 - Next steps

For recurring loads of data, the Docker container can be started from cron. It is recommended to adapt the query to do incremental loads and to set up monitoring of the extractor using Extraction Pipelines in CDF.
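As a sketch, a crontab entry that runs the extractor at the top of every hour could look like this (the host path to config.yaml is an assumption; adjust it to your setup):

```
# m h dom mon dow  command
0 * * * * docker run --rm -v /opt/cognite/config.yaml:/config.yaml cognite-db-mssql /config.yaml
```

The --rm flag removes the container after each run so finished runs do not accumulate on the Docker host.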
