Gathering Interest

Adding infer_schema_length parameter to DB Extractor

Related products: Extractors

May 1, 2025

almir.mavliutov
Practitioner

I believe we need to expose infer_schema_length as a DB Extractor parameter to make the extractor more flexible.
If I understand correctly, the DB Extractor uses the polars library.
According to the polars docs:

infer_schema_length: int | None = 100,

This means that, by default, column data types are inferred from the first 100 rows. That does not fit our client's data.

I am currently extracting data from CSV and have run into two issues, which I think are related. As I understand it, the extractor infers each column's data type from the first n rows (probably 100):

could not parse `"1,616,178"` as dtype `f64` at column '*column_name*' (column number 10). This value is on line 3161.
could not parse `"2.15"` as dtype `i64` at column '*column_name*' (column number 15). This value is on line 118; all previous values are "0".
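Both errors fit the "infer from the first N rows" behaviour. The sketch below is a simplified, hypothetical model of that inference (it is not polars' implementation, and the i64 < f64 < str widening order is my own simplification). It uses None to mean "scan every row", as polars does for infer_schema_length. The column 10 sample values ("1250.5") are made up; only "1,616,178", "2.15" and the row positions come from the errors above.

```python
# Hypothetical, simplified model of first-N-rows dtype inference.
# Not polars' real algorithm; it only illustrates why late values fail.
from typing import Optional, Sequence

# Widening order used in this sketch: i64 < f64 < str.
_RANK = {"i64": 0, "f64": 1, "str": 2}


def dtype_of(value: str) -> str:
    """Return the narrowest of 'i64', 'f64' or 'str' that can represent value."""
    for dtype, parse in (("i64", int), ("f64", float)):
        try:
            parse(value)
            return dtype
        except ValueError:
            pass
    return "str"


def infer_dtype(values: Sequence[str], infer_schema_length: Optional[int] = 100) -> str:
    """Infer a column dtype from the first infer_schema_length values.

    infer_schema_length=None scans every value. An empty sample gives 'str'.
    """
    sample = values if infer_schema_length is None else values[:infer_schema_length]
    if not sample:
        return "str"
    return max((dtype_of(v) for v in sample), key=_RANK.__getitem__)


# Column 15: zeros, then "2.15" after the first 100 rows.
col_15 = ["0"] * 117 + ["2.15"]
print(infer_dtype(col_15))                            # i64, so "2.15" later fails
print(infer_dtype(col_15, infer_schema_length=None))  # f64

# Column 10: made-up float values, then "1,616,178" far down the file.
col_10 = ["1250.5"] * 3160 + ["1,616,178"]
print(infer_dtype(col_10))                            # f64, so "1,616,178" later fails
print(infer_dtype(col_10, infer_schema_length=None))  # str
```

In this model, scanning more rows fixes column 15, but column 10 would become a text column rather than a number, because "1,616,178" uses a thousands separator. That suggests a configurable infer_schema_length may need to be paired with a way to override specific column types.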

After the error, the terminal shows this suggestion:

You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying correct dtype with the `schema_overrides` argument
- setting `ignore_errors` to `True`,
- adding `"1,616,178"` to the `null_values` list.
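Of these options, the second one (schema_overrides) is closest to what this data needs: read the problem columns as text and convert them explicitly. The standard-library sketch below illustrates that idea outside the extractor. It assumes "," is only ever a thousands separator in this data, and the column names "amount" and "rate" and the helper names are hypothetical.

```python
# Hypothetical workaround sketch: read columns as text, then convert explicitly.
# Mirrors the idea behind polars' schema_overrides; not the DB Extractor's code.
import csv
import io
from typing import Dict, List


def parse_number(text: str) -> float:
    """Convert values such as "1,616,178" or "2.15" to float.

    Assumes ',' is only used as a thousands separator, never as a decimal mark.
    """
    return float(text.replace(",", ""))


def read_numeric_columns(csv_text: str, columns: List[str]) -> Dict[str, List[float]]:
    """Read the named columns as strings and convert every value with parse_number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result: Dict[str, List[float]] = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            result[name].append(parse_number(row[name]))
    return result


sample = 'amount,rate\n"1,616,178",0\n"950.5",2.15\n'
print(read_numeric_columns(sample, ["amount", "rate"]))
# {'amount': [1616178.0, 950.5], 'rate': [0.0, 2.15]}
```

Because every value is parsed only after it has been read as text, no value is rejected for disagreeing with a type guessed from earlier rows.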

Sofie Svartdal Berge
Seasoned Practitioner
May 12, 2025

Hi @almir.mavliutov,
Thank you for sharing your idea with us! I've forwarded your suggestion to the relevant product manager, who will contact you for further details.

Best regards,
Sofie
