Solved

Read excel files (spreadsheet extractor?)

  • 19 May 2023
  • 5 replies
  • 135 views

Userlevel 1
Badge

 

Appears to be an obvious question, may have been asked before. Just wondering if there is an extractor available to read the spreadsheet files and load it into raw/staged data. The Excel can contain multiple spread sheet and perhaps with configuration each of that spreadsheet and its columns can be mapped to the staged tables/columns?

 

icon

Best answer by dinesh.makked 19 May 2023, 20:18

View original

5 replies

Userlevel 1

There is no spreadsheet extractor. 

One of the possible options is to upload the Excel files into CDF using the File Extractor. Then, you could use a Cognite Function to write data to the appropriate destination.

 

Since the Cognite Function is based on Python, you could pretty much do anything for the data wrangling.

 

 

If you want to chat more about it, please let me know.

 

 

 

Userlevel 1
Badge

Thank you, this helps and confirms our understanding as well. It may be a straight forward way to upload the files into CDF file type and then parse spreadsheets into a table in RAW. It would have made for a good extractor out of the box. Product idea?

 

This does include any advanced configuration (such as pick asset name from cell A3, pick company name from cell B2 etc), that is left to each use case/py program or can even be instrumented as a transformation.

Userlevel 4
Badge

@Alex Narayanan,
As mentioned, nothing out of the box is currently supported.  I can imagine a solution using a configuration (CDF File or FDM), for a given set of well-known xls file templates.  You can use cell references or named ranges.  Your extractor would use the config to pull the data.  For this to run on python, something like openpyxl, which would enable you to run as a Cognite function and enable the road to a reasonable reusable extractor

 

Jason

Thank you, this helps and confirms our understanding as well. It may be a straight forward way to upload the files into CDF file type and then parse spreadsheets into a table in RAW. It would have made for a good extractor out of the box. Product idea?

 

This does include any advanced configuration (such as pick asset name from cell A3, pick company name from cell B2 etc), that is left to each use case/py program or can even be instrumented as a transformation.

I vote this up as a product idea.  Same for .doc file types.

Userlevel 4
Badge

@Alex Narayanan,

DB Extractor now support local spreadsheet files: https://docs.cognite.com/cdf/integration/guides/extraction/db/db_configuration#databases

Reply