Gathering Interest

Adding infer_schema_length parameter to DB Extractor

Related products: Extractors
  • May 1, 2025
  • 1 reply
  • 31 views

I believe we need to expose infer_schema_length as a DB extractor parameter to improve its flexibility.
If I understand correctly, the DB extractor uses the polars library.
According to the docs:

infer_schema_length: int | None = 100,

This means that, by default, it infers column data types from the first 100 rows. That does not fit our client's data.

I am currently extracting data from a CSV file and have run into two issues, which I think are related. As I understand it, the extractor infers each column's data type from the first n (probably 100) rows:

  1. could not parse `"1,616,178"` as dtype `f64` at column '*column_name*' (column number 10). This value is on line 3161.
  2. could not parse `"2.15"` as dtype `i64` at column '*column_name*' (column number 15). This value is on line 118; all previous values are "0".
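
To illustrate why both errors point to the same cause, here is a minimal toy model of sample-based type inference in plain Python. It is not polars' actual algorithm; `infer_dtype` and the synthetic column are hypothetical and only mimic the situation described in issue 2.

```python
def infer_dtype(values, sample_size):
    """Return the narrowest type that parses every value in the sampled rows.

    sample_size=None means "look at every row".
    """
    sample = values if sample_size is None else values[:sample_size]
    for name, caster in (("int", int), ("float", float)):
        try:
            for v in sample:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


# Synthetic column: "0" on rows 1-117, first decimal value on row 118.
column = ["0"] * 117 + ["2.15"] + ["0"] * 10

print(infer_dtype(column, 100))   # the sample contains only "0" -> int
print(infer_dtype(column, None))  # the full scan sees "2.15"    -> float
```

With a 100-row sample the column is typed as an integer, so the value on row 118 can no longer be parsed. A value such as "1,616,178" fails in the same way once a column has been typed as a float from its earlier rows.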

After the error, the terminal shows this suggestion:

You might want to try:

- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),

- specifying correct dtype with the `schema_overrides` argument

- setting `ignore_errors` to `True`,

- adding `"1,616,178"` to the `null_values` list.

 

1 reply

Sofie Svartdal Berge
Seasoned Practitioner

Hi @almir.mavliutov,
Thank you for sharing your idea with us! I've forwarded your suggestion to the relevant product manager, who will contact you for further details.

Best regards,
Sofie