Skip to main content

hello Community,

Can someone please share the approach (along with code) for extracting files from sharepoint online (xls , with multiple worksheets) and extract the content and load them as RAW tables in CDF. 

 

Is there a direct feature available in Sharepoint file extractor that does this job? Should we use SDK to extend the file extraction and then read the content and insert into tables?. 

 

If we have a large set of files and each containing multiple sheets, it can be hard to process all of them dynamically. 

 

Please advise.

Hey @eashwar11 ,  the file extractor allows you to upload files to CDF, and then you can load each file with pandas and upload each page as a RAW table. It’s also possible to write your own service, which will take the files directly and upload them as a RAW. For example, a snippet to upload data from each sheet to a new RAW table:

def upload_xls_file(client, file_path, db_name='Test'):
"""
Uploads an XLS file as a Pandas dataframe, with each sheet as a new dataframe.

Uploads it into RAW table.

Args:
client (CogniteCLient): a Cognite client instance
file_path (str): The path to the XLS file,
db_name (str): the name of the CDF RAW database

Returns:
A dictionary where each key is the sheet name and each value is a Pandas dataframe.
"""
xls_file = pd.ExcelFile(file_path)
sheet_names = xls_file.sheet_names
dataframes = {}

for sheet_name in sheet_names:
client.raw.tables.create(db_name, sheet_name)
df = xls_file.parse(sheet_name)
dataframesasheet_name] = df
client.raw.rows.insert_dataframe(db_name, sheet_name, df)

return dataframes

 


Hi @eashwar11,

We appreciate your contribution to our community hub! We have chosen to move your article to our hub's How-To section as it will greatly benefit other members of our community. Thank you for your understanding, and we look forward to seeing more great contributions from you in the future! 

Best regards,
Dilini 

 


Reply