Solved

Read a CSV file from Files in CDF as a DataFrame

  • 3 August 2023

I am looking for a way to read a CSV file with 100 columns and 200 rows that is stored in CDF under 'Files'. How do I read the file and load it into a pandas DataFrame?

diet_f= (client.files.list(name='diet_daily.csv'))[0]
diet_daily_transformed = pd.read_csv(diet_f.name)
diet_daily_transformed

This isn't working and throws an error. Please advise on the best way to do this.

Best answer by Håkon V. Treider 4 August 2023, 06:52

5 replies

Hi @eashwar11, when you list a file from CDF you are not retrieving its contents, only the file object (including the metadata set on it). Passing diet_f.name to pd.read_csv therefore makes pandas look for a local file with that name rather than the file stored in CDF. If you want the contents, download the file as described in the docs here. Then you can read it with pandas using pd.read_csv("my_file.csv"). I hope that helps. 😀

Hi @HaydenH 

I am implementing this in a Cognite Function, so I want the function to download the file and then read it with pd.read_csv(), all within the same run.

I don't want to download the file to my local disk and read it from there. Since everything must happen online within Cognite itself, what is the best way to do that?

Ideally, there would be a function like client.files.retrieve_dataframe(filename="myfile.csv").

I don't see anything like this at the moment. Please advise.

In Cognite Functions you still have access to some disk space under /tmp. Use, for example, tempfile.TemporaryDirectory, and you don't have to worry about where this location is.
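A minimal sketch of the temporary-directory approach, assuming `client` is an authenticated CogniteClient and that the installed SDK version provides `client.files.download_to_path`. The helper name `read_csv_via_tmp` and the local file name `data.csv` are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_csv_via_tmp(client, external_id):
    """Download a CDF file into a temporary directory and load it with pandas.

    `client` is assumed to be a CogniteClient; the helper name is hypothetical.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / "data.csv"
        # Assumed SDK call: writes the file contents to the given local path.
        client.files.download_to_path(local_path, external_id=external_id)
        # Read inside the with-block, because the directory is deleted on exit.
        return pd.read_csv(local_path)
```

Inside a Cognite Function this would be called as `df = read_csv_via_tmp(client, "foo")`. The DataFrame is fully loaded before the temporary directory is cleaned up, so nothing is left on disk afterwards.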

You may also, as you say, load it directly into memory, but this can cause memory issues for really large files (in that case, do as @HaydenH suggests and store it to disk first):

import pandas as pd
from io import BytesIO

df = pd.read_csv(
    BytesIO(
        client.files.download_bytes(external_id="foo")
    )
)

 

@eashwar11 did the above help you? 

Yes, @Carin Meems. I used this and added the following code to make the file available as a pandas DataFrame.

If CDF wants to add a library function that fetches a CSV file as a DataFrame, this could be used.

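from io import StringIO
import pandas as pd

# Look up the file metadata by name, then download the contents as bytes
fileobj = client.files.list(name='file.csv')[0]
file_content = client.files.download_bytes(id=fileobj.id)
csv = file_content.decode('utf-8')
file_df = pd.read_csv(StringIO(csv))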
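The same steps could be wrapped in a small helper, sketched here under the assumption that `client` is an authenticated CogniteClient. The name `read_cdf_csv` is hypothetical and not part of the SDK. It passes the bytes straight to pandas through BytesIO, so no manual decoding is needed, and it raises a clear error when no file matches instead of failing on the `[0]` lookup:

```python
from io import BytesIO

import pandas as pd


def read_cdf_csv(client, name):
    """Return the first CDF file called `name` as a DataFrame.

    Hypothetical helper; `client` is assumed to be a CogniteClient.
    """
    # Listing returns file metadata only, not the file contents.
    matches = client.files.list(name=name, limit=1)
    if len(matches) == 0:
        raise FileNotFoundError(f"No file named {name!r} found in CDF")
    # Download the contents into memory and let pandas parse the bytes.
    raw = client.files.download_bytes(id=matches[0].id)
    return pd.read_csv(BytesIO(raw))
```

Usage would look like `file_df = read_cdf_csv(client, "file.csv")`. If several files share the same name, only the first match is read, so looking the file up by external ID may be safer.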
