Purpose:
Test v3 of the Cognite db-extractor as easily as possible, with as little local set-up as needed.
Outline:
This example uses the Docker version of the db-extractor and also uses Docker to create a temporary PostgreSQL database. If you use another version of the extractor (such as a Windows service) and/or want to use a database already installed on your local machine, it should still be fairly simple to follow this example and make your own changes where applicable.
Pre-requisites:
- Docker (can be avoided if you run a non-containerized extractor and have access to a database)
- A CDF project with:
- IdP authentication:
- Tenant (directory) ID
- Client (application) ID
- Client secret
- Capabilities granted to the client used by the extractor:
- Raw: List, Read, Write
- Extraction pipeline runs: Write
- One extraction pipeline
Steps:
- Create a containerized PostgreSQL database
- Open a new terminal
- Get the Docker image for the PostgreSQL database (only needed once). Run:
docker pull postgres
- Create and run the Postgres container (only needed once). Run:
docker run -d --name psqlCont -p 5432:5432 -e POSTGRES_PASSWORD=pass123 postgres
- If the container already exists but is stopped, start it by running:
docker start psqlCont
- Enter the container interactively and run psql as the user "postgres":
docker exec -it psqlCont psql -h localhost -U postgres
- Create a table on the default schema "public" (only needed the first time the container is created):
CREATE TABLE testdata(ID INT NOT NULL, NAME TEXT NOT NULL);
- Insert a row into the table. For example:
INSERT INTO testdata VALUES (7, 'Son');
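Before starting the extractor, it can help to confirm what the query in the config template will return. A quick check from the same psql session (assuming the table and row created in the steps above):

```sql
-- Verify the test data; this is the same query the extractor template uses
SELECT * FROM public.testdata;
-- Should list one row: id = 7, name = 'Son'
```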
- Run Extractor with config
- Open another terminal in the directory where you want to keep your extraction configuration
- Get the Docker image for the Cognite db-extractor (only needed once)
docker pull cognite/db-extractor-base:3.2.2
- Create a config file "psql_config.yaml"; a template is shown at the end of this tutorial.
- Run the extractor
docker run --rm --network host -v "$(pwd)/psql_config.yaml:/config.yaml" cognite/db-extractor-base:3.2.2 /config.yaml
- See data in your CDF project
- The extractor is scheduled to run every minute
- Check that the row you created in the database is available in the Raw table
- See the live extractor under Extraction pipelines
- Check that the extraction pipeline runs get updated every minute
- Test new rows of data
- Add a new row of data (see the INSERT step above) and check that it gets picked up by the extractor
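From the same psql session, a small sketch of what to try (the id value 8, the name, and the expected Raw row keys are assumptions based on the primary-key template "id_{id}" and the sample data above):

```sql
-- Add a second row; on the next scheduled run it should appear in the Raw table
INSERT INTO testdata VALUES (8, 'Daughter');
-- With primary-key "id_{id}", the new Raw row key should be "id_8"

-- An UPDATE reuses the same key, so the existing Raw row "id_7" should be
-- overwritten on the next run rather than duplicated
UPDATE testdata SET name = 'Moon' WHERE id = 7;
```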
- Clean up
- To exit the psql session, use the command "exit"
- To exit the container shell (if you are in one), use the command "exit" again
- The container itself keeps running (check with "docker ps"), so you can re-enter it if you want
- The db-extractor can be stopped with CTRL + C
- You can verify that the extractor has stopped by checking the extraction pipeline runs
- The container can be stopped with "docker stop psqlCont". If you start it again, the data is still there
Template for "psql_config.yaml":

logger:
  console:
    level: DEBUG

cognite:
  project: <INSERT NAME OF CDF PROJECT>
  host: <INSERT COGNITE HOST URL, f.ex https://greenfield.cognitedata.com>
  idp-authentication:
    client-id: <INSERT APPLICATION (CLIENT) ID>
    secret: <INSERT APPLICATION SECRET>
    scopes:
      - <INSERT SCOPE, f.ex https://greenfield.cognitedata.com/.default>
    tenant: <INSERT DIRECTORY (TENANT) ID>
  extraction-pipeline:
    external-id: <INSERT EXTERNAL ID OF EXTRACTION PIPELINE>

extractor:
  mode: continuous

databases:
  - name: psql-db
    type: postgres
    host: host.docker.internal
    user: postgres
    password: pass123
    database: postgres
    port: 5432

queries:
  - name: my-psql-test
    database: psql-db
    primary-key: "id_{id}"
    query: "SELECT * FROM public.testdata"
    destination:
      type: raw
      database: Test
      table: psql-test
    schedule:
      type: interval
      expression: 1m