Cognite offers powerful tools designed to parse and extract structured information from both structured and unstructured documents. Here’s an overview of our capabilities:
Diagram Parsing: Diagram parsing focuses on engineering diagrams like P&IDs (Piping and Instrumentation Diagrams). It automatically identifies and maps asset and file tags to resources in Cognite Data Fusion (CDF).
After contextualization, detected tags are visualized in our industrial tools, such as Search and Canvas, so users can explore and navigate data efficiently.
For vectorized files, diagram parsing also recognizes symbols, lines, and connections, and constructs a knowledge graph within CDF. This knowledge graph unlocks numerous use cases. For instance, users can query the graph to locate all valves connected to a specific piece of equipment, which facilitates isolation planning.
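The valve query mentioned above can be illustrated with a small, self-contained sketch. It does not use the CDF knowledge graph or any Cognite API. The tag names, the `NODE_TYPES` and `EDGES` structures, and the `isolating_valves` helper are all hypothetical, and a plain breadth-first search stands in for a real graph query.

```python
from collections import defaultdict, deque

# Hypothetical toy diagram: tag -> component type. Not a CDF data structure.
NODE_TYPES = {
    "P-101": "pump",
    "L-1": "line",
    "L-2": "line",
    "V-201": "valve",
    "V-202": "valve",
    "T-301": "tank",
    "V-203": "valve",
}

# Hypothetical undirected connections between components.
EDGES = [
    ("P-101", "L-1"),
    ("L-1", "V-201"),
    ("P-101", "L-2"),
    ("L-2", "V-202"),
    ("V-202", "T-301"),
    ("T-301", "V-203"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency mapping."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def isolating_valves(start, node_types, adjacency):
    """Return the valves reached from `start` without passing through another valve."""
    seen = {start}
    queue = deque([start])
    valves = set()
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            if node_types.get(neighbour) == "valve":
                valves.add(neighbour)  # a valve closes off this path, so stop here
            else:
                queue.append(neighbour)  # keep following lines and other equipment
    return sorted(valves)


if __name__ == "__main__":
    print(isolating_valves("P-101", NODE_TYPES, build_adjacency(EDGES)))
    # prints ['V-201', 'V-202']
```

The search stops at the first valve on each path because, in this simplified model, closing those valves is enough to isolate the starting equipment. A real isolation plan would likely need to account for flow direction and valve types as well.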
Learn more about Diagram Parsing for asset-centric data models
Learn more about Diagram Parsing for data modeling (Public Beta)
Document Summary: Document summary is an AI-powered assistant that helps you summarize up to 100 documents in PDF format. Upload the PDF files to Cognite Data Fusion (CDF), then use the Document summary API to create the summary.
Learn more about Document Summary
Document Question Answering: Document question answering is an AI-powered assistant that uses the Semantic search API to find information relevant to your question, then passes the retrieved information to an LLM, which interprets it into a natural language answer.
Learn more about Document Question Answering
Semantic Search API: The Semantic search API helps you build AI systems that use data from documents. Unlike keyword search, which matches exact terms, the API lets you search for text with a similar semantic meaning and find relevant text written in other languages.
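To make the difference between keyword search and semantic search concrete, here is a minimal sketch. It does not call the Semantic search API. The three-dimensional vectors are hand-written stand-ins for the embeddings a real model would produce, and `keyword_search` and `semantic_search` are hypothetical helper names. Ranking by cosine similarity is a common technique for this kind of search, assumed here for illustration only.

```python
import math
import re

# Hypothetical passages with hand-written "embeddings" (assumption: a real
# system would compute these vectors with an embedding model).
PASSAGES = {
    "The pump must be shut down before maintenance.": [0.90, 0.10, 0.00],
    "Stopp pumpen før vedlikehold.": [0.85, 0.15, 0.05],
    "Quarterly financial results improved.": [0.00, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_search(query, passages):
    """Return passages that share at least one whole word with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    return [
        text for text in passages
        if terms & set(re.findall(r"\w+", text.lower()))
    ]


def semantic_search(query_vector, passages, top_k=2):
    """Return the top_k passages ranked by cosine similarity to the query vector."""
    ranked = sorted(
        passages,
        key=lambda text: cosine_similarity(query_vector, passages[text]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    query = "stop machine ahead of servicing"
    query_vector = [0.88, 0.12, 0.02]  # hypothetical embedding of the query
    print(keyword_search(query, PASSAGES))         # no shared words, so no matches
    print(semantic_search(query_vector, PASSAGES))  # both pump passages rank first
```

In this toy data the query shares no words with any passage, so the keyword search returns nothing, while the vector ranking surfaces both the English and the Norwegian pump passages.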
Learn more about Semantic Search
Document Parsing (Early Adopter): This feature extracts data from documents with key-value representations such as datasheets, equipment specs, or reports into data model views. You can modify and verify the extracted data before you approve the parsing job. The approved data is ingested into Cognite Data Fusion (CDF) data model instances and becomes available for users to explore and analyze, for instance, for asset and equipment monitoring.
The input for the document parsing job must meet these criteria:
- PDF file format with English text and a maximum of 100 pages. Results improve with smaller files.
- Embedded text or scanned pages.
- Describes a single asset or piece of equipment.
- Represents data as key-value pairs.
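The criteria above could be checked before submitting a file, for example with a small pre-flight function like the sketch below. The `DocumentInfo` record and `check_parsing_input` function are hypothetical and are not part of any Cognite SDK. The sketch also assumes you already know the page count, the language, and how many assets the document describes.

```python
from dataclasses import dataclass

MAX_PAGES = 100  # page limit stated in the criteria above


@dataclass
class DocumentInfo:
    """Hypothetical metadata record for a candidate file; not a CDF type."""
    file_name: str
    language: str      # assumed to be an ISO 639-1 code such as "en"
    page_count: int
    asset_count: int   # number of assets or equipment items the document describes


def check_parsing_input(doc):
    """Return a list of problems; an empty list means the criteria are met."""
    problems = []
    if not doc.file_name.lower().endswith(".pdf"):
        problems.append("file is not a PDF")
    if doc.language.lower() != "en":
        problems.append("text is not English")
    if doc.page_count > MAX_PAGES:
        problems.append(f"{doc.page_count} pages exceeds the {MAX_PAGES}-page limit")
    if doc.asset_count != 1:
        problems.append("document must describe a single asset or piece of equipment")
    return problems


if __name__ == "__main__":
    datasheet = DocumentInfo("pump_datasheet.pdf", "en", 12, 1)
    print(check_parsing_input(datasheet))  # prints []
```

The key-value layout criterion is left out because it depends on the document's content and cannot be judged from metadata alone.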
Learn more about Document Parsing