Solved

Retrieving file sizes for CDF files

  • 21 December 2023
  • 6 replies
  • 170 views


Hi Team,

I am trying to get details of broken files uploaded to CDF. Is there any way to query the actual file size from CDF using the Python SDK without downloading the file? Please add a feature to query file size through the Python SDK so that corrupted or broken files can be identified.

 

Thanks in advance.


Best answer by Dilini Fernando 24 January 2024, 11:46



It hasn't helped.

Userlevel 4

Hi @Rajeev z ranjan,

I hope the above helped. As of now, I’m closing this thread. Please feel free to create a new post if you have any questions.

Best regards,

Dilini

 

Userlevel 4

Hi @Rajeev z ranjan,

Did the above help you?

Br,
Dilini

Userlevel 1

I also just learned that you can send HEAD requests to the download URLs, like this:

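def get_content_lengths(client, external_ids):
    # Fetch download URLs for all files, keyed by external ID.
    urls = client.files.retrieve_download_urls(external_id=external_ids)
    # A HEAD request returns the headers only, so no file content is transferred.
    # Note: _http_client is a private SDK attribute and may change between versions.
    return {
        external_id: client.files._http_client.request("HEAD", urls[external_id]).headers["Content-Length"]
        for external_id in external_ids
    }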

This might be a workaround if the files are, for some reason, not available in the Documents API. If they are in the Documents API, this should require the fewest requests.
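One thing to watch with the HEAD approach: the `Content-Length` values come back as header strings, not numbers. A minimal sketch for turning them into byte counts is below. The helper name `to_byte_sizes` is made up for illustration, and it assumes a missing header has been recorded as `None` (for example by using `.headers.get("Content-Length")` instead of indexing).

```python
from typing import Dict, Mapping, Optional


def to_byte_sizes(content_lengths: Mapping[str, Optional[str]]) -> Dict[str, Optional[int]]:
    """Convert raw Content-Length header values to integer byte counts.

    Values that are missing (None) or not parseable as an integer become None,
    so one odd response does not abort the whole run.
    """
    sizes: Dict[str, Optional[int]] = {}
    for external_id, raw in content_lengths.items():
        try:
            sizes[external_id] = int(raw)
        except (TypeError, ValueError):
            # TypeError: header was None; ValueError: header was not a number.
            sizes[external_id] = None
    return sizes
```

Something like `to_byte_sizes(get_content_lengths(client, external_ids))` should then give sizes you can compare or sort numerically.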

Hi,
You can get the size using the Documents API. The List documents endpoint returns the size (in bytes) in `sourceFile.size`.
You can also use the Python SDK: https://cognite-sdk-python.readthedocs-hosted.com/en/latest/documents.html#cognite.client._api.documents.DocumentsAPI.list
Example code:
 

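from cognite.client.data_classes import filters
from cognite.client.data_classes.documents import DocumentProperty

# `client` is an already-authenticated CogniteClient; YOUR_FILE_ID is a placeholder for the file's ID.
file_id_filter = filters.Equals(DocumentProperty.id, YOUR_FILE_ID)
docs = client.documents.list(filter=file_id_filter)
size_in_bytes = docs[0].source_file.size  # size in bytes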
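If you list many documents at once, a small helper can collect the sizes and point out files that look empty. This is only a sketch: the function names are hypothetical, and it relies on nothing more than each returned document having `id` and `source_file.size`, as in the example above. The stand-in objects at the bottom exist only so the snippet runs without a CDF connection.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List, Optional


def sizes_by_file_id(documents: Iterable) -> Dict[int, Optional[int]]:
    """Map each document's id to the size (in bytes) reported for its source file."""
    return {doc.id: getattr(doc.source_file, "size", None) for doc in documents}


def empty_or_unknown(sizes: Dict[int, Optional[int]]) -> List[int]:
    """Return the ids whose reported size is zero or missing, in ascending order."""
    return sorted(file_id for file_id, size in sizes.items() if not size)


# Stand-in objects for illustration only; in practice, pass the result of
# client.documents.list(...) instead.
demo_docs = [
    SimpleNamespace(id=3, source_file=SimpleNamespace(size=0)),
    SimpleNamespace(id=1, source_file=SimpleNamespace(size=2048)),
    SimpleNamespace(id=2, source_file=SimpleNamespace(size=None)),
]
demo_sizes = sizes_by_file_id(demo_docs)
print(empty_or_unknown(demo_sizes))
```

A zero or missing size does not prove a file is corrupted, but it is a cheap first filter before looking at individual files.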

 

Userlevel 1

Hi!

I can't see that the API has a way to get the file size without downloading the file.

It seems like a very useful feature to have, though.
If this is a one-time cleanup job, it might be acceptable, though time-consuming, to call len(client.files.download_bytes(id=file_id)) for each file ID.
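For what it's worth, the download-everything fallback could be sketched roughly like this. The helper `measure_sizes` is hypothetical; it takes the download function as a parameter so it does not depend on any particular SDK signature, and it records `None` when a download fails, since a broken file may well raise instead of returning bytes.

```python
from typing import Callable, Dict, Iterable, Optional


def measure_sizes(
    download: Callable[[int], bytes],
    file_ids: Iterable[int],
) -> Dict[int, Optional[int]]:
    """Download each file and record its length in bytes (None if the download fails)."""
    sizes: Dict[int, Optional[int]] = {}
    for file_id in file_ids:
        try:
            sizes[file_id] = len(download(file_id))
        except Exception:
            # Keep going: one unreadable file should not stop the cleanup job.
            sizes[file_id] = None
    return sizes
```

With the SDK this would be called as something like `measure_sizes(lambda fid: client.files.download_bytes(id=fid), file_ids)`, assuming `id=` is the keyword your SDK version expects. Every file is held in memory while it is measured, so this only makes sense for a one-off job on files of moderate size.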
 
