Hi @Pablo Palacios,
Thank you for bringing this to our attention. The engineering team has been informed, and I will update you as soon as I have an update on this.
Hi @Pablo Palacios,
The issue has been resolved in the latest version, 3.4.7. Please check and let me know if you encounter any problems.
Thank you very much, the problem has been solved
The extraction from an Excel sheet works fine for me, now I have the problem that I want to extract data from different Excel sheets, the data is sent to CDF in RAW in different tables, but only both tables are overwritten by the second query, what am I doing wrong?
databases:
-
type: spreadsheet
name: "DBexcel"
path: "C:\\Users\\ppalacios\\Documents\\Proyectos DPA\\RA Software\\FT DataMosaix\\SQM SX5\\SAT\\Protocolo SAT.xlsx"
queries:
- name: query1
database: "DBexcel"
sheet: "VDF"
query: >
SELECT * FROM VDF
destination:
type: raw
database: "SX5"
table: "SAT_VDF"
primary-key: "{row_number}"
- name: query2
database: "DBexcel"
sheet: "MTR"
query: >
SELECT * FROM MTR
destination:
type: raw
database: "SX5"
table: "SAT_MTR"
primary-key: "{row_number}"
As additional information, the extractor always sends me the first tab of the excel to the RAW and writes it in all the tables of the RAW
hi @Dilini Fernando:
Could you see with the equipment what I mentioned above? Although the Excel extractor works now, it is not possible to extract information from other Excel sheets, only from the first one. In the extractor I defined it according to the documentation, but it does not work.
Hi and thanks for the report @Pablo Palacios!
Looks like we had a bug in the spreadsheet provider of the DB Extractor which I've attempted to fix. There will be a v3.4.8 release coming that should _actually_ respect the configured sheet name to extract from and not default to the first sheet in the list.
Thanks @Thomas Sjølshagen
version 3.4.8 has already been released. I tried it and the other Excel sheets still cannot be read. Will another version come out with this improvement?
Hi Pablo,
I tested myself and I can confirm that the extractor is working fine when doing multiple queries ponting to different sheets of the same spreadsheet. (DB Extractor v3.5.0)
Example config:
logger:
console:
level: INFO
# file:
# level: INFO
# path: "file.log"
# Read all of these from environment variables
cognite:
host: ${COGNITE_BASE_URL}
project: ${COGNITE_PROJECT}
idp-authentication:
client-id: ${COGNITE_CLIENT_ID}
secret: ${COGNITE_CLIENT_SECRET}
token-url: ${COGNITE_TOKEN_URL}
scopes:
- ${COGNITE_BASE_URL}/.default
extractor:
# (Optional) Run mode of the extractor. It can be "single" or "continuous". Default value is "single".
# `continuous` : runs the configured queries using the configured schedules per endpoint
# `single` : will run the queries once per endpoint
# mode: continuous
# (Optional) Number of rows to fetch from sources before uploading
upload-queue-size: 100_000
# List of databases
databases:
- type: spreadsheet
name: local_excel
path: /path/to/file/test.xlsx
# List of queries
queries:
- name: test_excel_1
database: local_excel
sheet: test_1
query: >
SELECT
*
FROM
test_1
destination:
type: raw
database: "db_excel"
table: "local_test_1"
primary-key: "{row_number}"
- name: test_excel_2
database: local_excel
sheet: test_2
query: >
SELECT
*
FROM
test_2
destination:
type: raw
database: "db_excel"
table: "local_test_2"
primary-key: "{row_number}"
Logs from the extractor run:
2024-09-18 11:58:37.488 UTC INFO ] MainThread - Successfully ingested environment variables from /Users/rogerio.junior@cognitedata.com/workspace/db-extractor/.env
2024-09-18 11:58:37.497 UTC WARNING ] MainThread - No config type specified, default to local
2024-09-18 11:58:37.497 UTC INFO ] MainThread - Loaded local config file
2024-09-18 11:58:37.498 UTC INFO ] MainThread - No extraction pipeline configured
2024-09-18 11:58:37.498 UTC INFO ] MainThread - Cognite DB Extractor version 3.5.0
2024-09-18 11:58:37.498 UTC INFO ] MainThread - Starting extraction of 2 queries
2024-09-18 11:58:37.498 UTC INFO ] MainThread - Extraction will run once. To run the extractor continuously use continuous mode
2024-09-18 11:58:37.571 UTC INFO ] QueryExecutor_0 - Starting spreadsheet extraction
2024-09-18 11:58:37.571 UTC INFO ] QueryExecutor_1 - Starting spreadsheet extraction
2024-09-18 11:58:37.572 UTC INFO ] QueryExecutor_0 - Starting extraction of test_excel_1
2024-09-18 11:58:37.572 UTC INFO ] QueryExecutor_1 - Starting extraction of test_excel_2
2024-09-18 11:58:37.600 UTC INFO ] QueryExecutor_1 - Processing Excel file: 27 rows, 7 columns
2024-09-18 11:58:37.603 UTC INFO ] QueryExecutor_0 - Processing Excel file: 59 rows, 4 columns
2024-09-18 11:58:37.607 UTC INFO ] QueryExecutor_0 - No more rows for test_excel_1. 59 rows extracted in 0.0352 seconds
2024-09-18 11:58:37.608 UTC INFO ] QueryExecutor_1 - No more rows for test_excel_2. 27 rows extracted in 0.0362 seconds
2024-09-18 11:58:38.077 UTC INFO ] QueryExecutor_0 - Uploaded 86 raw rows
2024-09-18 11:58:38.078 UTC INFO ] QueryExecutor_1 - Uploaded 0 raw rows
Please find attached the spreadsheet file I used for this test. Remeber to change the path in the database config, if you would like to test this yourself.
Regards,
Rogerio Junior
Cognite Engineering
Now with this version of the extractor it works as it should, thank you all very much @Rogerio Júnior