COGNITE HUB


Replies posted by Tommy Thorsen

7 Replies


Gayatri Babel (Practitioner)

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.) - Implemented

Planned for development → Implemented

6 months ago

Gayatri Babel (Practitioner)

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.) - Implemented

The hash is now available from the Document Search API. See documentation here.

6 months ago

rkannan (Active)

asked in Digitalization Community

Document insights using CDF

Hi Raj!

My name is Tommy and I’m the tech lead for the team that owns the Documents API. I’ll try to answer your questions as best I can:

"Can we go beyond the 1MB limit to allow OCR, index and search of any free text within the documents. Is this configurable?"

The 1MB limit is not on the source file; it is on the extracted plain text. The source files can be as large as 2GB, which I think should include most things we normally call documents.

The 1MB limit on the extracted plain text is also intended mainly to protect our systems against files that aren’t really documents. A regular document will almost never have more than 1MB of plain text in it.

Let’s try some really rough math: A common character count estimate for a page that contains nothing but text is 3000 characters per page. Looking a bit closer at the implementation, it seems our limit is not actually 1MB of text, but 1 million characters. (This can matter if you have many multi-byte characters in your documents.) Anyway, 1M c
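The rough math in the reply can be sketched in a few lines of Python. This only uses the two figures quoted above (1 million characters, about 3000 characters per text-only page). The function names and the choice of UTF-8 for the byte count are assumptions made for illustration; the reply does not say how the text is encoded.

```python
CHAR_LIMIT = 1_000_000   # limit on extracted plain text, in characters (from the reply)
CHARS_PER_PAGE = 3_000   # rough estimate for a text-only page (from the reply)


def pages_within_limit(char_limit=CHAR_LIMIT, chars_per_page=CHARS_PER_PAGE):
    """Return how many full text-only pages fit inside the character limit."""
    return char_limit // chars_per_page


def char_and_byte_counts(text):
    """Return (characters, UTF-8 bytes) for a string.

    UTF-8 is an assumption here; it is used only to show why a limit in
    characters differs from a limit in bytes for multi-byte text.
    """
    return len(text), len(text.encode("utf-8"))


print(pages_within_limit())                          # 333
print(char_and_byte_counts("abc"))                   # (3, 3)
print(char_and_byte_counts("\u00e6\u00f8\u00e5"))    # (3, 6): each letter takes 2 bytes
```

Under these estimates, a document would need more than roughly 333 pages of dense text before its extracted plain text reaches 1 million characters.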

10 months ago

robert.rill (Committed)

Interactive Engineering Diagrams - Approved Diagrams - Planned for development

Hi Robert!

I strongly agree with you, and I have been making the same argument myself recently. I already put an item on the backlog to make this change.

I don’t have a specific timeline for this yet, but your comment will certainly help get this prioritized higher.

Thanks 😊

10 months ago

Gayatri Babel (Practitioner)

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.) - Implemented

@Bartosz Czernia the hash is made on the original binary content of the file, and not on the extracted plain text. Also, I don’t know if it matters to you, but the hash is not an md5 hash; it’s a sha256.
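To compare a local file against a hash of this kind, the digest has to be computed over the raw bytes of the file rather than over any extracted text. A minimal sketch using Python's standard `hashlib` follows. The helper names are made up for illustration, and the hex-string output format is an assumption, since the reply does not say how the API represents the hash.

```python
import hashlib


def sha256_of_bytes(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's original binary content.

    The file is read in chunks so that large files (the thread mentions
    sources of up to 2GB) do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Both helpers give the same result for the same bytes, so `sha256_of_file(path)` can be compared against a stored value, assuming that value is a lowercase hex string.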

6 months ago

Gayatri Babel (Practitioner)

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.) - Implemented

I guess it depends what kind of UI you are using, but at least Fusion has switched from using the Files API to using the Documents API in order to get those extra bits of functionality that the Documents API provides. Look at the screenshot below for proof 😄

I’m with you on the SDK support, though. It’s a real shame we have not been able to add Python SDK support for the Documents API yet.

6 months ago

Gayatri Babel (Practitioner)

CDF Files: Adding File Size & Content Attributes (number of pages, file md5, file size, etc.) - Implemented

Hi Gayatri,

It is not so trivial to add this info to the Files API, but we have all of this information inside the document processing system that exposes the Documents API. The page count field and the file size are already available in the Documents API. The hash is not there, but it would probably not be that hard to expose that as well. I can look into this.

Would it work for you to get this information from the Documents API? Bear in mind that the Documents API is eventually consistent with the Files API, and in some cases it can take some time from when you upload something to the Files API until it is available in the Documents API.
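Because of the eventual consistency described above, a client that uploads a file and then immediately looks it up may need to retry for a while. The sketch below shows one generic way to do that. It does not call any real Cognite API: `fetch` is a hypothetical callable supplied by the caller that returns the document once it is visible and `None` until then, and the timeout and interval values are arbitrary.

```python
import time


def wait_until_available(fetch, timeout_s=60.0, interval_s=2.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns something other than None.

    fetch is a hypothetical, caller-supplied lookup (for example a wrapper
    around a Documents API query). Returns the fetched value, or None if
    the timeout expires first. fetch() is always tried at least once.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Usage with a stand-in lookup that succeeds on the third attempt.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return {"pageCount": 12} if attempts["count"] >= 3 else None


print(wait_until_available(fake_lookup, interval_s=0.0))  # {'pageCount': 12}
```

A fixed interval keeps the sketch short; a real client might prefer a backoff that grows between attempts.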

6 months ago
