Two years ago we used the Python library Pytesseract in another project, not related to Cognite. However, we faced some challenges since the names of the elements we needed weren’t laid out in a straight line throughout the document. In some cases, the labels were inside a circle, with each part of the name in a separate semicircle. At that time, the results weren’t satisfactory, with a low success rate.
Thank you, @Elcio Cardoso da Silva. We’re aware of this project, but our main objective here is to determine whether there’s a built-in solution within Cognite that we could leverage.
Cognite Team, if you have any insights or suggestions regarding this, please share them with us.
Thanks again!
Hi @Anita Hæhre
Could you help me connect with someone at Cognite regarding a question for a client proposal? I'm exploring whether we can leverage a Cognite OCR feature or something similar.
One of the source systems for building the asset hierarchy is based on P&ID documents, screenshots, and other documentation, as the client doesn’t have a consolidated asset hierarchy in a system like SAP.
Thank you very much for any guidance on this!
Hi @Andre Alves, apologies for the delayed response - I was away on a short holiday. I’ll update you here as soon as I’ve identified the best person to follow up on this. Thanks for your patience!
No need to apologize, @Anita Hæhre! You’re always incredibly helpful, and I’m very grateful for your assistance. I’ll be waiting for your update, and thanks in advance for your help.
Hi @Andre Alves ,
Thank you for the question :)
You can extract tags from engineering diagrams using the Diagram parsing service (in Beta) with `pattern_mode` enabled. This mode lets you provide a list of examples, and the service identifies tags that follow similar patterns.
The `sample` field can be a string or a list of alternative strings. Each string defines a pattern. For example, `21-PT-1019` enables detecting tags consisting of 2 digits, 2 letters and 4 digits. Special characters are not required for detection, but they are included in the detected string. You can mark parts of the sample as constant strings by enclosing them in square brackets. Within square brackets, a `|` character separates alternative constants. Alternative constants must be either all digits or all letters.
Here is example code that can be run in a Cognite Jupyter Notebook:
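To make the sample rules above more concrete, here is a rough sketch that translates a sample string into a regular expression. This is only an illustration of the rules as described, not how the Diagram parsing service works internally. The helper name `sample_to_regex` is hypothetical, and treating letters as `[A-Za-z]` and special characters as optional are my assumptions.

```python
import re


def sample_to_regex(sample: str) -> re.Pattern:
    """Illustrative translation of a pattern-mode style sample into a regex.

    Assumptions (not confirmed service behaviour):
    - each digit in the sample stands for "any digit"
    - each letter stands for "any ASCII letter"
    - special characters are optional in the matched text
    - [A|B] marks constant alternatives that must match literally
    """
    parts = []
    i = 0
    while i < len(sample):
        ch = sample[i]
        if ch == "[":
            # Constant section: keep the alternatives as literal text.
            end = sample.index("]", i)
            alternatives = sample[i + 1:end].split("|")
            parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
            continue
        if ch.isdigit():
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            # Special character: not needed for a match, kept if present.
            parts.append(re.escape(ch) + "?")
        i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    pattern = sample_to_regex("21-[PT|FT]-1019")
    for candidate in ["33-FT-0042", "33FT0042", "33-LT-0042"]:
        print(candidate, bool(pattern.fullmatch(candidate)))
```

With the sample `21-[PT|FT]-1019`, only tags whose middle part is exactly `PT` or `FT` would match in this sketch, while the digits on either side can vary.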
from cognite.client import CogniteClient
from cognite.client.config import FusionNotebookConfig
from cognite.client.data_classes.contextualization import FileReference

# Instantiate Cognite SDK client:
client = CogniteClient(FusionNotebookConfig(api_subversion="20230101-beta"))

# Declare your patterns and resource types (optional)
entities = [
    {
        "sample": "PT-01-AB",
        "resourceType": "instrument",
    }
]

# Files and page ranges to run detection on
file_references = [
    FileReference(file_id=..., first_page=1, last_page=1)
]

# Run detection with pattern mode enabled
model = client.diagrams.detect(
    entities=entities,
    file_references=file_references,
    pattern_mode=True,
    partial_match=True,
)

# Inspect the detection result
model.result
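Once you have the result, you will probably want a plain list of the detected tags. Below is a small sketch for that. It assumes the result is a dict with an `items` list, where each item has an `annotations` list and each annotation has a `text` field. Please check that shape against your actual `model.result` output. The helper name `detected_tags` and the mock data are made up for illustration.

```python
from typing import Any


def detected_tags(result: dict[str, Any]) -> list[str]:
    """Collect the unique detected tag texts from a detection result.

    Assumed shape (verify against your own output):
    {"items": [{"annotations": [{"text": "..."}, ...]}, ...]}
    """
    tags = {
        annotation["text"]
        for item in result.get("items", [])
        for annotation in item.get("annotations", [])
    }
    return sorted(tags)


if __name__ == "__main__":
    # Hypothetical result used only to exercise the helper.
    mock_result = {
        "items": [
            {
                "fileId": 1,
                "annotations": [
                    {"text": "PT-01-AB", "confidence": 0.9},
                    {"text": "FT-02-CD", "confidence": 0.8},
                    {"text": "PT-01-AB", "confidence": 0.7},
                ],
            }
        ]
    }
    print(detected_tags(mock_result))
```

In the notebook you would call `detected_tags(model.result)` instead of using the mock data.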
Hope it helps :)
Best regards,
Que Tran