Skip to main content
Solved

SharePoint Online - File content extraction [Cognite Official]


Forum|alt.badge.img+2

hello Community,

Can someone please share the approach (along with code) for extracting files from sharepoint online (xls , with multiple worksheets) and extract the content and load them as RAW tables in CDF. 

 

Is there a direct feature available in Sharepoint file extractor that does this job? Should we use SDK to extend the file extraction and then read the content and insert into tables?. 

 

If we have a large set of files and each containing multiple sheets, it can be hard to process all of them dynamically. 

 

Please advise.

Best answer by roman.chesnokov

Hey @eashwar11 ,  the file extractor allows you to upload files to CDF, and then you can load each file with pandas and upload each page as a RAW table. It’s also possible to write your own service, which will take the files directly and upload them as a RAW. For example, a snippet to upload data from each sheet to a new RAW table:

def upload_xls_file(client, file_path, db_name='Test'):
    """
    Uploads an XLS file as a Pandas dataframe, with each sheet as a new dataframe.

    Uploads it into RAW table.

    Args:
        client (CogniteCLient): a Cognite client instance
        file_path (str): The path to the XLS file,
        db_name (str): the name of the CDF RAW database

    Returns:
        A dictionary where each key is the sheet name and each value is a Pandas dataframe.
    """
    xls_file = pd.ExcelFile(file_path)
    sheet_names = xls_file.sheet_names
    dataframes = {}

    for sheet_name in sheet_names:
        client.raw.tables.create(db_name, sheet_name)
        df = xls_file.parse(sheet_name)
        dataframes[sheet_name] = df
        client.raw.rows.insert_dataframe(db_name, sheet_name, df)

    return dataframes

 

View original
Did this topic help you find an answer to your question?

2 replies

roman.chesnokov
Seasoned Practitioner
  • Seasoned Practitioner
  • 56 replies
  • Answer
  • May 31, 2023

Hey @eashwar11 ,  the file extractor allows you to upload files to CDF, and then you can load each file with pandas and upload each page as a RAW table. It’s also possible to write your own service, which will take the files directly and upload them as a RAW. For example, a snippet to upload data from each sheet to a new RAW table:

def upload_xls_file(client, file_path, db_name='Test'):
    """
    Uploads an XLS file as a Pandas dataframe, with each sheet as a new dataframe.

    Uploads it into RAW table.

    Args:
        client (CogniteCLient): a Cognite client instance
        file_path (str): The path to the XLS file,
        db_name (str): the name of the CDF RAW database

    Returns:
        A dictionary where each key is the sheet name and each value is a Pandas dataframe.
    """
    xls_file = pd.ExcelFile(file_path)
    sheet_names = xls_file.sheet_names
    dataframes = {}

    for sheet_name in sheet_names:
        client.raw.tables.create(db_name, sheet_name)
        df = xls_file.parse(sheet_name)
        dataframes[sheet_name] = df
        client.raw.rows.insert_dataframe(db_name, sheet_name, df)

    return dataframes

 


Dilini Fernando
Seasoned Practitioner
Forum|alt.badge.img+2

Hi @eashwar11,

We appreciate your contribution to our community hub! We have chosen to move your article to our hub's How-To section as it will greatly benefit other members of our community. Thank you for your understanding, and we look forward to seeing more great contributions from you in the future! 

Best regards,
Dilini 

 


Reply


Cookie Policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

 
Cookie Settings