Skip to main content
Answer

Retrieving file sizes for CDF files

  • December 21, 2023
  • 6 replies
  • 197 views

Forum|alt.badge.img

Hi Team,

I am trying to get details of broken files uploaded in CDF . Is there any way to query actual file size from CDF using python sdk without downloading the file? Please add a feature to query file size using python sdk to get the corrupted/broken files.

 

Thanks in advance.

Best answer by Dilini Fernando

Hi @Rajeev z ranjan,

I hope the above helped. As of now, I’m closing this thread. Please feel free to create a new post if you have any questions.

Best regards,

Dilini

 

6 replies

Hi!

I cant see that the API has a way to get the file size without downloading the file. 

It seems like a very useful feature to have though. 
If this is a one time cleanup job, it would maybe be ok, though time consuming to do len(client.files.download_bytes(file_id=file_id)) for all the file ids?
 


Que Tran
Practitioner
Forum|alt.badge.img
  • Practitioner
  • January 3, 2024

Hi,
You can get the size using the Documents API. List documents gives you the size (in bytes) in `sourceFile`.`size`. 
You can also use the Python SDK. https://cognite-sdk-python.readthedocs-hosted.com/en/latest/documents.html#cognite.client._api.documents.DocumentsAPI.list
Example code:
 

from cognite.client.data_classes.documents import DocumentProperty
from cognite.client.data_classes import filters

file_id_filter = filters.Equals(DocumentProperty.id, YOUR_FILE_ID)
doc = client.documents.list(filter=file_id_filter)
doc[0].source_file.size

 


I also just now learned that you can send HEAD requests to the download urls, like this:

def get_content_lengths(client, external_ids):
urls = client.files.retrieve_download_urls(external_id=external_ids)
return {
external_id: client.files._http_client.request("HEAD", urls[external_id]).headers["Content-Length"]
for external_id in external_ids
}

This might be a workaround if the files for some reason are not available in the documents API. If they are in the documents API, this should require the fewest requests. 


Dilini Fernando
Seasoned Practitioner
Forum|alt.badge.img+2
  • Seasoned Practitioner
  • January 11, 2024

Hi @Rajeev z ranjan,

Did the above help you?

Br,
Dilini


Dilini Fernando
Seasoned Practitioner
Forum|alt.badge.img+2
  • Seasoned Practitioner
  • Answer
  • January 24, 2024

Hi @Rajeev z ranjan,

I hope the above helped. As of now, I’m closing this thread. Please feel free to create a new post if you have any questions.

Best regards,

Dilini

 


  • Committed
  • February 5, 2024

It hasn't helped.