Skip to main content

How to run and test Cognite DB-extractor v3

  • 22 February 2024
  • 0 replies
  • 27 views

Purpose:

Test v3 of the db-extractor as easily as possible with as little local set-up as needed.

 

Outline:

This example will use the Docker version of the db-extractor, and also use Docker to make a temporary PostgreSQL database. If you use another version of the extractor (like a Windows service) and/or want to use a database already installed on your local computer, it should still be fairly simple to use this example and make your own changes where applicable.

 

Pre-requisites:

  • Docker (can be avoided if you run a non-containerized extractor and have access to a database)
  • A CDF-project with
    • IdP-authentication
      • Tenant
      • Client
      • Secret
    • Capabilities connected to the client used in the extractor:
      • Raw: List, Read, Write
      • Extraction Pipelines Runs: Write
    • One Extraction pipeline

 

Steps:

  1. Create a containerized PostgreSQL database
    1. Open a new terminal
    2. Get docker image for the postgreSQL database (only needed once). Run:
      1. docker pull postgres
    3. Create and run Postgres Container (only needed once). Run:
      1. docker run -d --name psqlCont -p 5432:5432 -e POSTGRES_PASSWORD=pass123 postgres
      2. If container already exists and it is stopped, start it by running 
        1. docker start psqlCont
    4. Enter the interactive container and run psql as the user "postgres"
      1. docker exec -it psqlCont psql -h localhost -U postgres
    5. Create a table on he default schema "public" (only needed first time container is created)
      1. CREATE TABLE testdata(ID INT NOT NULL, NAME TEXT NOT NULL);
    6. Insert a value into the table. F.ex:
      1. INSERT INTO testdata VALUES (7, ‘Son’);
  2. Run Extractor with config
    1. Open another terminal where you want to put your extraction configuration
    2. Get docker image for the Cognite db-extractor (only needed once)
      1. docker pull cognite/db-extractor-base:3.2.2
    3. Create a config file "psql_config.yaml". A Template is shown at the end of this tutorial.
    4. Run the extractor
      1. docker run --rm --network host -v ./psql_config.yaml:/config.yaml cognite/db-extractor-base:3.2.2 /config.yaml
  3. See data in your CDF-project
    1. The extractor is scheduled to run every minute
    2. Check that the row you created in the database is available in the Raw Table
  4. See the live extractor in extraction pipelines
    1. Check that the extraction pipeline runs gets updated every minute
  5. Test new rows of data
    1. Add new row of data (see step 1g), see it gets picked up by extractor
  6. Clean up
    1. To exit the psql database, use the command "exit"
    2. To exit the whole container, use the command "exit"
    3. The container is still running (check "docker ps") and can thus be re-entered if you want
    4. The db-extractor can be exited by CTRL + C
    5. You can verify that the extractor has turned off by checking the extraction pipelines
    6. The container can be stopped by "docker stop postgresCont". If you start it up again, you still have the data there

Template for “psql_config.yaml”:

logger:

  console:

    level: DEBUG

cognite:

  project: <INSERT NAME OF CDF PROJECT>

  host: <INSERT COGNITE HOST URL, f.ex https://greenfield.cognitedata.com>

  idp-authentication:

    client-id: <INSERT APPLICATION (CLIENT) ID>

    secret: <INSERT APPLICATION SECRET>

    scopes:

    - <INSERT SCOPE, f.ex https://greenfield.cognitedata.com/.default>

    tenant: <INSERT DIRECTORY (TENANT) ID>

  extraction-pipeline:

    external-id: <INSERT EXTERNAL ID OF EXTRACTION PIPELINE>

extractor:

  mode: continuous

databases:

- name: psql-db

  type: postgres

  host: host.docker.internal

  user: postgres

  password: pass123

  database: postgres

  port: 5432

 

queries:

- name: my-psql-test

  database: psql-db

  primary-key: "id_{id}"

  query: "SELECT * FROM public.testdata"

  destination:

    type: raw

    database: Test

    table: psql-test

  schedule:

    type: interval

    expression: 1m

Reply


Cookie Policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie Settings