This guide describes how to run the Cognite DB Extractor in a separate Docker container to fetch data from a PostgreSQL server into Cognite Data Fusion (CDF).

Prerequisites
- A running PostgreSQL server and valid credentials to SELECT rows from the table in question
- A Docker host that can access the PostgreSQL server
- A running CDF project with an Azure AD service principal that has capabilities to write to CDF RAW
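Before going further, it can be worth confirming that the credentials can actually read the table. A quick check with psql, using hypothetical connection details that match the connection string used later in this guide:

psql "host=192.168.1.14 port=5432 dbname=testdb user=sa" -c "SELECT 1 FROM phonebook LIMIT 1;"

If this returns without a permission error, the SELECT grant is in place.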
1 - Prepare Docker Image
Cognite provides the database extractor as a Docker image published to Docker Hub, requiring just the addition of an ODBC driver. Since we are connecting to PostgreSQL, we will install the PostgreSQL ODBC driver packaged for Debian 11, using this Dockerfile:
Note! Go to https://hub.docker.com/r/cognite/db-extractor-base/tags to see the latest version of the Docker image.
FROM cognite/db-extractor-base:2.5.0-beta5
RUN apt-get update && apt-get install -y odbc-postgresql
Build the image:
docker build -t cognite-db-postgres .
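To check that the driver was registered inside the image, the installed ODBC drivers can be listed. This assumes the base image ships unixODBC's odbcinst utility:

docker run --rm --entrypoint odbcinst cognite-db-postgres -q -d

Look for a PostgreSQL Unicode entry, which corresponds to the driver name referenced in the connection strings below.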
2 - Set up extractor config
The extractor reads a local configuration file, named config.yml, that at a minimum defines:
- Details for authenticating towards CDF
- Database configurations
- SQL query
Authentication towards CDF
This section of the config file declares the details the extractor needs to authenticate so that it can write to CDF. To create a new app registration in Azure AD, please refer to the relevant section at docs.cognite.com.
cognite:
  host: https://api.cognitedata.com
  project: lervik-industries
  idp-authentication:
    client-id:
    client-secret:
    token-url: https://login.microsoftonline.com/[tenant]/oauth2/v2.0/token
    scopes:
      - "https://api.cognitedata.com/.default"
cognite.project | The unique name of the CDF project, displayed in the upper left corner of Cognite Data Fusion |
cognite.host | The base URL where the CDF API resides, for example https://api.cognitedata.com or https://westeurope-1.cognitedata.com |
cognite.idp-authentication.client-id, cognite.idp-authentication.client-secret | The Azure AD application ID for a valid app registration and its associated secret. Details available at docs.cognite.com. |
cognite.idp-authentication.token-url | The token URL for the Azure AD tenant used, usually https://login.microsoftonline.com/[tenant]/oauth2/v2.0/token |
cognite.idp-authentication.scopes | The OIDC scope required to access the CDF API: <cognite.host>/.default, for example https://api.cognitedata.com/.default |
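Rather than hard-coding the client secret in the file, values like these can be injected from the environment. A sketch assuming the extractor's ${ENV_VAR} substitution syntax (the comment in the complete example in section 4 hints at this usage) and hypothetical variable names:

cognite:
  idp-authentication:
    client-id: ${COGNITE_CLIENT_ID}
    client-secret: ${COGNITE_CLIENT_SECRET}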
Database connections
This section declares one or more databases the extractor will connect to. Since the configuration supports more than one, it is a YAML list.
databases:
  - name: postgres
    connection-string: "driver={PostgreSQL UNICODE};SERVER=192.168.1.14;DATABASE=testdb;PORT=5432;UID=sa;PWD=**"
name | A local label used to reference the database connection from a query (next section) |
connection-string | An ODBC connection string declaring which driver to use and the details required to connect. Each key-value pair is separated by ; |
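Since databases is a list, several connections can be declared side by side and referenced by name from different queries. A sketch adding a second, hypothetical SQL Server connection (its driver would also need to be installed in the Dockerfile, and all names and credentials here are assumptions):

databases:
  - name: postgres
    connection-string: "driver={PostgreSQL UNICODE};SERVER=192.168.1.14;DATABASE=testdb;PORT=5432;UID=sa;PWD=**"
  - name: mssql
    connection-string: "driver={ODBC Driver 18 for SQL Server};SERVER=192.168.1.15;DATABASE=testdb;UID=sa;PWD=**"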
SQL query
The queries section declares the SQL queries the extractor will run, which of the databases declared in the previous section each query runs against, and the CDF destination the result set will be written to.
queries:
  - name: phonebooks
    database: postgres
    query: >
      SELECT * FROM phonebook;
    destination:
      database: "CDF_RAW_DB"
      table: "CDF_RAW_TABLE"
      primary-key: "phone"
queries.name | A freely selected name of the query, used by the extractor when logging |
queries.database | The database connection to use: the name of one of the connections declared in the previous section |
queries.query | The SQL query to execute. The column names of the result set become the column names when writing to CDF RAW; use SQL aliases when the source columns do not have the desired names (see the sketch after this table). |
queries.destination | The destination within CDF the extractor will write the result set to, here a CDF RAW database, table and primary key column |
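To illustrate the alias note above, a sketch where the source columns are renamed to the column names wanted in CDF RAW (the source column names full_name and phone_number are assumptions):

queries:
  - name: phonebooks
    database: postgres
    query: >
      SELECT full_name AS name, phone_number AS phone FROM phonebook;
    destination:
      database: "CDF_RAW_DB"
      table: "CDF_RAW_TABLE"
      primary-key: "phone"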
3 - Running the extractor
First we create a Docker Compose file named docker-compose.yml that will terminate the process once the extraction is completed. (There are other possible approaches.)
version: '3'
services:
  db-extractor:
    image: cognite-db-postgres
    volumes:
      - ./config.yml:/config.yml
    network_mode: host
    command: /config.yml
Then, to run this file we simply do:
docker-compose up
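The same one-shot run can also be done without Compose, assuming the image entrypoint takes the config path as its argument, as the command line in the compose file suggests:

docker run --rm --network host -v "$(pwd)/config.yml:/config.yml" cognite-db-postgres /config.yml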
The container log will look something like this:
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:14.946 UTC [INFO ] MainThread - Starting extraction of 1 queries
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.101 UTC [WARNING ] QueryExecutor_0 - Could not set timeout on the ODBC driver - timeouts might not work properly.
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.101 UTC [INFO ] QueryExecutor_0 - Connected to postgres successfully
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.102 UTC [INFO ] QueryExecutor_0 - Starting extraction of phonebooks
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.181 UTC [INFO ] QueryExecutor_0 - No more rows for phonebooks. 3 rows extracted in 0.0789 seconds
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.303 UTC [INFO ] QueryExecutor_0 - Uploaded 3 raw rows
db-extractor-postgres-db-extractor-1 | 2022-12-04 09:57:15.303 UTC [INFO ] QueryExecutor_0 - Closing ODBC connection to postgres for phonebooks
db-extractor-postgres-db-extractor-1 exited with code 0
The extractor has now uploaded raw data to CDF.
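To double-check the result, the rows can be read back from the CDF RAW API. A sketch using curl and jq, with tenant, client ID, secret and project left as placeholders:

TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/<tenant>/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client-id>" \
  -d "client_secret=<client-secret>" \
  -d "scope=https://api.cognitedata.com/.default" | jq -r .access_token)

curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.cognitedata.com/api/v1/projects/<project>/raw/dbs/CDF_RAW_DB/tables/CDF_RAW_TABLE/rows?limit=10"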
4 - Complete example config.yml
# Read all of these from environment variables
cognite:
  host: https://bluefield.cognitedata.com
  project: pedro
  idp-authentication:
    client-id: "**"
    client-secret: "**"
    token-url: https://login.microsoftonline.com/**/oauth2/v2.0/token
    scopes:
      - https://bluefield.cognitedata.com/.default
# Type of configuration
type: local

# Logger settings
logger:
  console:
    level: INFO

# State storage
extractor:
  state-store:
    raw:
      database: "goals"
      table: "phonebook"
      upload-interval: 30
# List of databases
databases:
  - name: postgres
    connection-string: "Driver={PostgreSQL UNICODE};Server=127.0.0.1;Port=5432;Database=postgres;Uid=postgres;Pwd=mysecretpassword;"

# List of queries
queries:
  - name: phonebooks
    database: postgres
    query: >
      SELECT * FROM phonebook;
    destination:
      database: "goals"
      table: "phonebook"
      primary-key: "phone"
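To follow the "read all of these from environment variables" comment above, the secrets can be passed into the container through Compose and referenced with the ${...} substitution sketched in section 2. A sketch with hypothetical variable names:

# docker-compose.yml excerpt
services:
  db-extractor:
    image: cognite-db-postgres
    environment:
      - COGNITE_CLIENT_ID
      - COGNITE_CLIENT_SECRET
    volumes:
      - ./config.yml:/config.yml
    command: /config.yml

# config.yml excerpt
cognite:
  idp-authentication:
    client-id: ${COGNITE_CLIENT_ID}
    client-secret: ${COGNITE_CLIENT_SECRET}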