Solved

Extracting Asset Hierarchy from PDF P&IDs in Cognite Data Fusion

October 29, 2024

Andre Alves
MVP

In a hypothetical scenario where we need to extract asset hierarchy directly from PDF-based P&ID documents, does Cognite provide a built-in library to support this? Or would we need to rely on external libraries like PyMuPDF or pdfplumber for data extraction—or even tools like Pytesseract in cases involving scanned images?

Thank you for any guidance on best practices or integrations for this use case!

Best answer by Que Tran

Hi @Andre Alves,

You can use the API (as in the script I shared earlier) to extract tags directly from P&IDs without pre-contextualizing the files. The API takes the patterns you want to extract as input, so the assets do not need to exist in CDF beforehand.

Best regards,

Que Tran

Elcio Cardoso da Silva
Committed
October 30, 2024

Two years ago we used the Python library Pytesseract in another project, not related to Cognite. However, we faced some challenges since the names of the elements we needed weren’t laid out in a straight line throughout the document. In some cases, the labels were inside a circle, with each part of the name in a separate semicircle. At that time, the results weren’t satisfactory, with a low success rate.

Andre Alves
Author
MVP
October 30, 2024

Thank you, @Elcio Cardoso da Silva. We’re aware of this project, but our main objective here is to determine whether there’s a built-in solution within Cognite that we could leverage.

Cognite Team, if you have any insights or suggestions regarding this, please share them with us.

Thanks again!

André Alves

Andre Alves
Author
MVP
November 1, 2024

Hi @Anita Hæhre

Could you help me connect with someone at Cognite regarding a question for a client proposal? I'm exploring whether we can leverage a Cognite OCR feature or something similar.

One of the source systems for building the asset hierarchy is based on P&ID documents, screenshots, and other documentation, as the client doesn’t have a consolidated asset hierarchy in a system like SAP.

Thank you very much for any guidance on this!

André Alves

Anita Hæhre
Head of Academy and Community
November 4, 2024

Hi @Andre Alves, apologies for the delayed response. I was away on a short holiday. I’ll update you here as soon as I’ve identified the best person to follow up on this. Thanks for your patience!

Anita Hæhre

Andre Alves
Author
MVP
November 5, 2024

No need to apologize, @Anita Hæhre! You’re always incredibly helpful, and I’m very grateful for your assistance. I’ll be waiting for your update, and thanks in advance for your help.

André Alves

Que Tran
Practitioner
November 8, 2024

Hi @Andre Alves ,
Thank you for the question :)

You can extract tags from engineering diagrams using the Diagram parsing service, currently in Beta, with `pattern_mode` enabled. In this mode you provide a list of sample tags, and the service identifies tags that follow the same patterns.

The `sample` field can be a string or a list of alternative strings, and each string defines a pattern. For example, 21-PT-1019 detects tags consisting of 2 digits, 2 letters, and 4 digits. Special characters are not required for detection, but they are included in the detected string. To mark part of a sample as a constant string, enclose it in square brackets. Within square brackets, a | character separates alternative constants, which must be either all digits or all letters.
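To make the sample rules above more concrete, here is a minimal local sketch that turns a sample into an approximate regular expression. This is only an illustration of the rules as described in this post, not how the Diagram parsing service is implemented. The function name `sample_to_regex` is hypothetical, and the choice to match uppercase letters only is an assumption.

```python
import re


def sample_to_regex(sample: str) -> re.Pattern:
    """Approximate a pattern-mode sample with a regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(sample):
        ch = sample[i]
        if ch == "[":
            # Square brackets mark constants; "|" separates alternatives.
            end = sample.index("]", i)
            alternatives = sample[i + 1:end].split("|")
            parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        elif ch.isdigit() or ch.isalpha():
            # A run of digits (or letters) matches any run of the same length.
            same_kind = str.isdigit if ch.isdigit() else str.isalpha
            char_class = r"\d" if ch.isdigit() else "[A-Z]"  # assumption: uppercase tags
            j = i
            while j < len(sample) and same_kind(sample[j]):
                j += 1
            parts.append(f"{char_class}{{{j - i}}}")
            i = j
        else:
            # Special characters are optional for detection.
            parts.append(re.escape(ch) + "?")
            i += 1
    return re.compile("".join(parts))


pattern = sample_to_regex("21-PT-1019")
print(bool(pattern.fullmatch("35-FT-2201")))  # same shape as the sample
print(bool(pattern.fullmatch("3-FT-2201")))   # only one leading digit
```

With a bracketed sample such as `[PT|FT]-1019`, the same sketch would accept only tags that start with PT or FT, followed by four digits.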

Here is example code that can be run in a Cognite Jupyter Notebook:

```python
from cognite.client import CogniteClient
from cognite.client.config import FusionNotebookConfig
from cognite.client.data_classes.contextualization import FileReference

# Instantiate Cognite SDK client:
client = CogniteClient(FusionNotebookConfig(api_subversion="20230101-beta"))

# Declare your patterns and resource types (optional)
entities = [
    {
        "sample": "PT-01-AB",
        "resourceType": "instrument",
    }
]

# Replace ... with the ID of the file you want to parse
file_references = [
    FileReference(file_id=..., first_page=1, last_page=1)
]

# Run detection in pattern mode
model = client.diagrams.detect(
    entities=entities,
    file_references=file_references,
    pattern_mode=True,
    partial_match=True,
)
model.result
```
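
Once the job finishes, you will probably want a clean list of the detected tags. The sketch below assumes the result is a dictionary with an `items` list, where each item has an `annotations` list whose entries carry `text` and `confidence` keys. That shape is an assumption here, so check it against your actual `model.result`. The helper name `collect_detected_tags` and the sample data are hypothetical.

```python
from __future__ import annotations


def collect_detected_tags(result: dict, min_confidence: float = 0.0) -> list[str]:
    """Return the unique detected tag texts, sorted alphabetically."""
    tags = set()
    for item in result.get("items", []):
        for annotation in item.get("annotations", []):
            text = annotation.get("text", "").strip()
            # Annotations without a confidence value are kept (assumption).
            if text and annotation.get("confidence", 1.0) >= min_confidence:
                tags.add(text)
    return sorted(tags)


# Hand-written stand-in for model.result, using the assumed shape.
sample_result = {
    "items": [
        {
            "fileId": 123,
            "annotations": [
                {"text": "21-PT-1019", "confidence": 0.95},
                {"text": "21-FT-2001", "confidence": 0.60},
                {"text": "21-PT-1019", "confidence": 0.90},
            ],
        }
    ]
}

print(collect_detected_tags(sample_result))
print(collect_detected_tags(sample_result, min_confidence=0.8))
```

Deduplicating at this step helps because the same tag often appears several times on one drawing.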

Hope it helps :)

Best regards,
Que Tran

Andre Alves
Author
MVP
November 12, 2024

Thanks, @Que Tran . We’re currently in discussions with some clients who face challenges in managing asset hierarchies. One approach we're exploring is using P&ID files to identify tag patterns and automatically build the hierarchy. Could you clarify if it’s possible to use the API without pre-contextualizing the files? The idea is to process these files (e.g., through OpenCV) to establish the hierarchy first, and then, once we have an asset hierarchy in place, begin the contextualization process.
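
As a rough illustration of the "hierarchy first" idea, here is a minimal sketch that groups extracted tags into a nested structure. It assumes a hypothetical tag convention of `<area>-<function>-<number>` (for example 21-PT-1019), which will differ from client to client. The function name `build_hierarchy` and the root name "Plant" are placeholders, not part of any Cognite API.

```python
from collections import defaultdict


def build_hierarchy(tags, root="Plant"):
    """Group tags as root -> area -> function -> [tags] (assumed convention)."""
    tree = defaultdict(lambda: defaultdict(list))
    for tag in sorted(set(tags)):
        parts = tag.split("-")
        if len(parts) != 3:
            # Skip tags that do not follow the assumed three-part convention.
            continue
        area, function, _number = parts
        tree[area][function].append(tag)
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {root: {area: dict(functions) for area, functions in tree.items()}}


tags = ["21-PT-1019", "21-PT-1020", "21-FT-2001", "35-PT-0001", "UNKNOWN"]
print(build_hierarchy(tags))
```

Each level of the resulting structure could later be written to CDF as an asset with its parent set accordingly, after which the usual contextualization can run against the new hierarchy.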

André Alves

Que Tran
Practitioner
Answer
November 13, 2024

Hi @Andre Alves,

Best regards,

Que Tran

Andre Alves
Author
MVP
November 14, 2024

Thanks again, @Que Tran. Your help has been fantastic.
We’ll definitely be using it, and we’ll let you know if we encounter any issues or challenges.
Have a great day!

André Alves
