Implemented

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.)

Related products: API and SDKs

1 year ago
May 24, 2023
11 replies
181 views

Gayatri Babel
Practitioner

Request: When a file is uploaded, CDF should automatically generate key attributes related to file content and size, including but not limited to:

file size
file md5
number of pages in a file

Use Cases:

file md5 - to compare files and eliminate duplicates
file size - to identify broken files and auto-delete them
number of pages in a file - contextualization of diagrams with 50+ pages requires a different method (one that uses parse_diagrams) than contextualization of diagrams with fewer than 50 pages. The page count is needed to decide which contextualization method to use, and currently there is no way to get it (manually or automatically).

Currently these attributes are implemented manually, but it would be incredibly useful and more efficient to have this information automatically available in technical workflows. Furthermore, in the case of contextualization of 50+ pages, it will be necessary.
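The three use cases above can be sketched in a few lines of Python. This is only an illustration, assuming the attributes are already available as plain dictionaries: the record shape, the field names (`content_hash`, `size`, `page_count`) and the method labels are hypothetical and are not taken from any CDF API.

```python
from collections import defaultdict

# Threshold mentioned in the request: diagrams with 50+ pages need a different method.
PAGE_LIMIT = 50


def find_duplicates(records):
    """Group file names by content hash and keep only hashes seen more than once."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["content_hash"]].append(record["name"])
    return {h: names for h, names in by_hash.items() if len(names) > 1}


def find_broken(records, min_size=1):
    """Return names of files smaller than min_size bytes (e.g. empty uploads)."""
    return [r["name"] for r in records if r["size"] < min_size]


def pick_method(record):
    """Choose a (hypothetical) contextualization path based on page count."""
    return "large_diagram" if record["page_count"] >= PAGE_LIMIT else "standard"


if __name__ == "__main__":
    # Made-up sample records, purely for illustration.
    files = [
        {"name": "a.pdf", "content_hash": "h1", "size": 2048, "page_count": 3},
        {"name": "b.pdf", "content_hash": "h1", "size": 2048, "page_count": 3},
        {"name": "c.pdf", "content_hash": "h2", "size": 0, "page_count": 0},
        {"name": "d.pdf", "content_hash": "h3", "size": 9000, "page_count": 120},
    ]
    print(find_duplicates(files))
    print(find_broken(files))
    print([pick_method(f) for f in files])
```

What counts as a "broken" file is a judgment call; the size threshold here is just a parameter.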

Anita Hæhre
Head of Academy and Community
1 year ago
June 1, 2023

New→Gathering Interest


Tommy Thorsen
Practitioner
1 year ago
June 2, 2023

Hi Gayatri,

It is not so trivial to add this info to the Files API, but we have all of this information inside the document processing system that exposes the Documents API. The page count and the file size are already available in the Documents API. The hash is not there, but it would probably not be that hard to expose that as well. I can look into this.

Would it work for you to get this information from the Documents API? Bear in mind that the Documents API is eventually consistent with the Files API, and in some cases it can take some time from when you upload something to the Files API until it is available in the Documents API.
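One common way to cope with this kind of eventual consistency is to poll with a growing delay until the record shows up. The sketch below is generic and makes no CDF calls: `fetch` is a hypothetical callable supplied by the caller that returns the document's attributes, or `None` while the document is not yet visible.

```python
import time


def wait_until_available(fetch, attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    Waits `delay` seconds after the first miss and doubles the wait after
    each further miss. Returns None if every attempt comes back empty.
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(delay)
            delay *= 2
    return None
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies. Suitable values for `attempts` and `delay` depend on how long the lag actually is, which the thread does not specify.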

Noah Karsky
Data Engineer
1 year ago
June 2, 2023

Meh, it would work to have them in the Documents API, but with files being in the UI and in the SDKs, I would prefer to have them in the Files API. Nonetheless, if adding the hash to the Documents API can be done more quickly, it would be great to have that in the interim.

Tommy Thorsen
Practitioner
1 year ago
June 4, 2023

I guess it depends on what kind of UI you are using, but at least Fusion has switched from using the Files API to using the Documents API in order to get those extra bits of functionality that the Documents API provides. Look at the screenshot below for proof 😄

I’m with you on the SDK support, though. It’s a real shame we have not been able to add Python SDK support for the Documents API yet.

Noah Karsky
Data Engineer
1 year ago
June 4, 2023

ooohh I did not notice that! Thanks for showing! @Tommy Thorsen

Anita Hæhre
Head of Academy and Community
1 year ago
June 5, 2023

Gathering Interest→Planned for development


Bartosz Czernia
Practitioner
1 year ago
June 5, 2023

@Tommy Thorsen apologies if this is a basic question, but is the md5 of the file generated from the full contents of the file, and not just from the first MB of content that is available to preview?

Anita Hæhre
Head of Academy and Community
1 year ago
June 5, 2023

Hi @Bartosz Czernia, don't apologize for your question! It's great that you're reaching out to our community for help. Rest assured that we embrace all types of questions, no matter how basic they may seem. In fact, many others may have the same question but haven't asked yet. So by asking, you're not only helping yourself but also contributing to the collective knowledge of our community. We appreciate your engagement and encourage you to continue asking any questions you may have. We're here to support each other 🚀


Tommy Thorsen
Practitioner
1 year ago
June 5, 2023

@Bartosz Czernia the hash is made on the original binary content of the file, and not on the extracted plain text. Also, I don’t know if it matters to you, but the hash is not an md5 hash, it’s a SHA-256.
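For comparing a local copy against such a hash, a SHA-256 digest over the raw bytes of a file can be computed with Python's standard `hashlib`. This is a minimal sketch; it assumes the hash to compare against is a lowercase hex string, which the thread does not confirm.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file's full binary content."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:  # binary mode: hash the bytes, not decoded text
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the whole file is read, the digest covers the full contents rather than only a leading portion.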

Tommy Thorsen
Practitioner
1 year ago
October 10, 2023

The hash is now available from the Document Search API. See documentation here.

Tommy Thorsen
Practitioner
1 year ago
October 10, 2023

Planned for development→Implemented
