Solved

Read a CSV file from Files in CDF as a DataFrame



I am looking for a solution for reading a CSV file that has 100 columns and 200 rows. It is stored in CDF as ‘Files’. How do I read the file and then use pandas to convert it into a DataFrame?

diet_f = client.files.list(name='diet_daily.csv')[0]
diet_daily_transformed = pd.read_csv(diet_f.name)
diet_daily_transformed

This isn't working and throws an error. Please advise the best way to perform this task.


5 replies

HaydenH
  • Practitioner
  • 20 replies
  • August 3, 2023

Hi @eashwar11, when you list a file from CDF you are not actually retrieving its contents; you are retrieving the file object (including the metadata set on it). If you want to download it, you can do as described in the docs here. Then you can read it with pandas using pd.read_csv("my_file.csv"). I hope that helps. 😀
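
For example, a minimal sketch of that approach, assuming a configured CogniteClient named client and reusing the diet_daily.csv file name from the question:

import pandas as pd

# Look up the file's metadata by name (assumes the first match is the right file)
file_meta = client.files.list(name="diet_daily.csv")[0]

# Download the actual content to local disk, then read it with pandas
client.files.download_to_path(path="diet_daily.csv", id=file_meta.id)
df = pd.read_csv("diet_daily.csv")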


eashwar11
  • Author
  • 42 replies
  • August 4, 2023

Hi @HaydenH 

I am implementing this in a Cognite Function, so I want the function to fetch the file, download it, and then read it with pd.read_csv(), all in one go.

I don't want to download the file to my local disk and then read it from there. Since it must all happen within Cognite itself, what is the best way to do that?

Ideally, there would be a function like client.files.retrieve_dataframe(filename="myfile.csv").

But nothing like that exists today. Please advise.


Håkon V. Treider
  • Best answer

In Cognite Functions you still have access to some disk space under /tmp; use for example tempfile.TemporaryDirectory, so you don't have to worry about where that location is.
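
A minimal sketch of that temporary-directory approach, assuming a configured client and a file with the hypothetical external ID diet_daily_csv (named diet_daily.csv in CDF):

import os
import tempfile

import pandas as pd

# The temporary directory is created under /tmp in Cognite Functions and
# cleaned up automatically when the with-block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    # Download the file from CDF into the temp directory; the local file name
    # matches the file's name in CDF (assumed to be diet_daily.csv here)
    client.files.download(directory=tmp_dir, external_id="diet_daily_csv")
    df = pd.read_csv(os.path.join(tmp_dir, "diet_daily.csv"))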

You may also, as you say, load it directly into memory, but that can become a problem for really large files (in that case, do as @HaydenH suggests and store it to disk first):

import pandas as pd
from io import BytesIO

# Download the file content as bytes and parse it directly in memory
df = pd.read_csv(BytesIO(client.files.download_bytes(external_id="foo")))

 


Carin Meems
  • Seasoned Practitioner
  • 223 replies
  • August 9, 2023

@eashwar11, did the above help you?


eashwar11
  • Author
  • 42 replies
  • August 9, 2023

Yes, @Carin Meems. I used this and added the following code to make the file available as a pandas DataFrame.

If CDF ever adds a library function to fetch a CSV file as a DataFrame, this could be used as a starting point.

import pandas as pd
from io import StringIO

# Look up the file by name, download its content, and parse the CSV text
fileobj = client.files.list(name='file.csv')[0]
file_content = client.files.download_bytes(id=fileobj.id)
file_df = pd.read_csv(StringIO(file_content.decode('utf-8')))
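
For reference, a minimal sketch of how this could be wrapped into a reusable helper; the function name retrieve_dataframe is hypothetical and simply echoes the retrieve_dataframe-style method wished for above:

from io import BytesIO

import pandas as pd
from cognite.client import CogniteClient


def retrieve_dataframe(client: CogniteClient, name: str) -> pd.DataFrame:
    # Look up the file metadata by name, download its content as bytes,
    # and parse the bytes as CSV into a DataFrame
    file_meta = client.files.list(name=name)[0]
    content = client.files.download_bytes(id=file_meta.id)
    return pd.read_csv(BytesIO(content))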

 



