
Hi Team,

I am trying to get details of broken files uploaded to CDF. Is there any way to query the actual file size from CDF using the Python SDK without downloading the file? Please add a feature to query file size through the Python SDK so that corrupted/broken files can be identified.

 

Thanks in advance.

It hasn't helped.


Hi @Rajeev z ranjan,

I hope the above helped. As of now, I’m closing this thread. Please feel free to create a new post if you have any questions.

Best regards,

Dilini

 


Hi @Rajeev z ranjan,

Did the above help you?

Br,
Dilini


I also just now learned that you can send HEAD requests to the download urls, like this:

def get_content_lengths(client, external_ids):
    urls = client.files.retrieve_download_urls(external_id=external_ids)
    return {
        external_id: client.files._http_client.request("HEAD", urls[external_id]).headers["Content-Length"]
        for external_id in external_ids
    }

This might be a workaround if the files for some reason are not available in the Documents API. If they are in the Documents API, querying it should require the fewest requests.
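Note that the `Content-Length` header comes back as a string, so it needs converting before any size check. A minimal sketch, assuming the dictionary returned by `get_content_lengths` above; `find_suspicious_files` is a hypothetical helper name, and treating a zero or unparseable length as "broken" is only an assumption, not a CDF rule:

```python
def find_suspicious_files(content_lengths):
    """Return external ids whose Content-Length is zero or not a valid integer."""
    suspicious = []
    for external_id, raw_length in content_lengths.items():
        try:
            size = int(raw_length)  # header values are strings, e.g. "1024"
        except (TypeError, ValueError):
            # Missing or malformed header: flag it for manual inspection.
            suspicious.append(external_id)
            continue
        if size == 0:
            suspicious.append(external_id)
    return suspicious
```

The result could then be passed to whatever cleanup or re-upload step you use.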


Hi,
You can get the size using the Documents API. List documents gives you the size (in bytes) in `sourceFile.size`.
You can also use the Python SDK. https://cognite-sdk-python.readthedocs-hosted.com/en/latest/documents.html#cognite.client._api.documents.DocumentsAPI.list
Example code:
 

from cognite.client.data_classes.documents import DocumentProperty
from cognite.client.data_classes import filters

# `client` is an already authenticated CogniteClient; YOUR_FILE_ID is the numeric file id.
file_id_filter = filters.Equals(DocumentProperty.id, YOUR_FILE_ID)
doc = client.documents.list(filter=file_id_filter)
doc[0].source_file.size  # size in bytes
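To check many files at once, the same lookup can be collected into a mapping. A minimal sketch, assuming `docs` is whatever `client.documents.list(...)` returned and that each entry exposes `id` and `source_file.size` as in the example above; `sizes_by_file_id` is a hypothetical helper name, not part of the SDK:

```python
def sizes_by_file_id(docs):
    """Map each document's id to its reported source file size in bytes.

    Entries without a source file (or without a size) map to None so they
    can be inspected separately.
    """
    sizes = {}
    for doc in docs:
        source_file = getattr(doc, "source_file", None)
        sizes[doc.id] = getattr(source_file, "size", None)
    return sizes
```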

 


Hi!

I can't see that the API has a way to get the file size without downloading the file.

It seems like a very useful feature to have though. 
If this is a one-time cleanup job, it might be acceptable, though time-consuming, to call len(client.files.download_bytes(id=file_id)) for each of the file ids?
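As a rough sketch of that one-time job, assuming `client` is an authenticated CogniteClient and `file_ids` is a list of numeric file ids (`downloaded_sizes` is a hypothetical helper name):

```python
def downloaded_sizes(client, file_ids):
    """Download each file and return {file_id: size in bytes}.

    Slow by design: every file is transferred in full and held in memory
    one at a time, so this only suits a one-off cleanup.
    """
    sizes = {}
    for file_id in file_ids:
        content = client.files.download_bytes(id=file_id)
        sizes[file_id] = len(content)
    return sizes
```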
 

