Skip to main content
Solved

Read excel files (spreadsheet extractor?)


Forum|alt.badge.img

 

Appears to be an obvious question, may have been asked before. Just wondering if there is an extractor available to read the spreadsheet files and load it into raw/staged data. The Excel can contain multiple spread sheet and perhaps with configuration each of that spreadsheet and its columns can be mapped to the staged tables/columns?

 

Best answer by dinesh.makked

There is no spreadsheet extractor. 

One of the possible options is to upload the Excel files into CDF using the File Extractor. Then, you could use a Cognite Function to write data to the appropriate destination.

 

Since the Cognite Function is based on Python, you could pretty much do anything for the data wrangling.

 

 

If you want to chat more about it, please let me know.

 

 

 

View original
Did this topic help you find an answer to your question?

dinesh.makked
Seasoned

There is no spreadsheet extractor. 

One of the possible options is to upload the Excel files into CDF using the File Extractor. Then, you could use a Cognite Function to write data to the appropriate destination.

 

Since the Cognite Function is based on Python, you could pretty much do anything for the data wrangling.

 

 

If you want to chat more about it, please let me know.

 

 

 


Forum|alt.badge.img

Thank you, this helps and confirms our understanding as well. It may be a straight forward way to upload the files into CDF file type and then parse spreadsheets into a table in RAW. It would have made for a good extractor out of the box. Product idea?

 

This does include any advanced configuration (such as pick asset name from cell A3, pick company name from cell B2 etc), that is left to each use case/py program or can even be instrumented as a transformation.


Forum|alt.badge.img

@Alex Narayanan,
As mentioned, nothing out of the box is currently supported.  I can imagine a solution using a configuration (CDF File or FDM), for a given set of well-known xls file templates.  You can use cell references or named ranges.  Your extractor would use the config to pull the data.  For this to run on python, something like openpyxl, which would enable you to run as a Cognite function and enable the road to a reasonable reusable extractor

 

Jason


Forum|alt.badge.img
Alex Narayanan wrote:

Thank you, this helps and confirms our understanding as well. It may be a straight forward way to upload the files into CDF file type and then parse spreadsheets into a table in RAW. It would have made for a good extractor out of the box. Product idea?

 

This does include any advanced configuration (such as pick asset name from cell A3, pick company name from cell B2 etc), that is left to each use case/py program or can even be instrumented as a transformation.

I vote this up as a product idea.  Same for .doc file types.


Forum|alt.badge.img
  • Seasoned Practitioner
  • February 12, 2024

@Alex Narayanan,

DB Extractor now support local spreadsheet files: https://docs.cognite.com/cdf/integration/guides/extraction/db/db_configuration#databases


Reply


Cookie Policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie Settings