
Solved

Issue with equipment-to-P&ID file contextualization

4 August 2022
9 replies


Kumar Varaganti Sharath
Committed

We have a piece of equipment named PI91111 on the P&ID, and the same equipment exists in the asset hierarchy.

However, this equipment is not being contextualized in this P&ID.

I have tried both the standard and the advanced model, with min tokens set to 1 and to 2.

The OCR output for this is below:

[Screenshot of the OCR output, not preserved]

Best answer by Ola Liabøtrø 9 August 2022, 14:12




Ola Liabøtrø
Practitioner
5 August 2022

Hi! The algorithm is deliberately hesitant to cherry-pick individual tokens from an OCR result, and that is what happens here. If you searched for PI-91111-9D instead, you would find it. Are there many instances like this in the project?


Kumar Varaganti Sharath
Author
Committed
5 August 2022

Yes, there are 400-500 instances like this that are not getting picked up. The match should be the combination of PI and 91111 (which is a substring of 91111(9D) in the OCR output).
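The matching rule described here (join "PI" with the neighbouring "91111(9D)" and ignore the parenthesised code) can be sketched as a post-processing step over OCR words. This is a minimal sketch, not the Cognite algorithm: it assumes the OCR output is available as a flat list of words with bounding boxes, and `OcrWord`, `normalize` and `match_split_tags` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical, simplified OCR word: its text plus an axis-aligned bounding box.
@dataclass
class OcrWord:
    text: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


# A trailing parenthesised code such as "(9D)" or "(11S)".
SUFFIX = re.compile(r"\([0-9A-Z]+\)$")


def normalize(text: str) -> str:
    """Upper-case, drop a trailing parenthesised code, remove separators."""
    text = SUFFIX.sub("", text.strip().upper())
    return re.sub(r"[\s\-_]", "", text)


def match_split_tags(words, asset_names):
    """Return (asset_name, bbox) for single words or adjacent word pairs
    whose normalized text equals a normalized asset name."""
    lookup = {normalize(name): name for name in asset_names}
    matches = []
    for i, word in enumerate(words):
        # Try the word on its own, then joined with the next word.
        candidates = [[word]]
        if i + 1 < len(words):
            candidates.append([word, words[i + 1]])
        for group in candidates:
            key = normalize("".join(w.text for w in group))
            if key in lookup:
                # The match box is the union of the boxes in the group.
                box = (
                    min(w.xmin for w in group),
                    min(w.ymin for w in group),
                    max(w.xmax for w in group),
                    max(w.ymax for w in group),
                )
                matches.append((lookup[key], box))
    return matches


words = [OcrWord("PI", 10, 10, 20, 18), OcrWord("91111(9D)", 10, 20, 40, 28)]
print(match_split_tags(words, ["PI91111"]))
# [('PI91111', (10, 10, 40, 28))]
```

Only neighbours in list order are joined, which keeps false positives down; real OCR output would likely also need a geometric proximity check between the two boxes. As noted later in the thread, the merged box still covers the "(9D)" characters.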


Ola Liabøtrø
Practitioner
5 August 2022

Is (9D) common throughout, or are there various other codes? I guess you are using the UI, since you mention the standard and advanced models, but maybe also the SDK, since you were able to get the OCR output? I think we can adjust the algorithm to cover this case without much risk of creating false positives elsewhere. Also, is the number always 5 digits long?


Kumar Varaganti Sharath
Author
Committed
5 August 2022

The (9D) suffix is not common across all files. Please see the screenshot of another instance below.

I usually run contextualization from the SDK. For this case I also tried the UI, changing the models and the token settings, but the equipment is still not picked up.


Ola Liabøtrø
Practitioner
5 August 2022

Is there any way to know in advance what the trailing code will be? If there is a limited number of possibilities, one option is to add multiple versions of the name to each entity. For example, you can transform

{id: 123, name: PI91111} into {id: 123, name: [PI91111, PI91111-9D, PI91111-6A… ]}

I would like to make this work regardless, but that may take some time, and the resulting bounding box would still cover (9D), since we don't control exactly where inside the OCR box each character is located.
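The name-variant workaround suggested above can be sketched as a small transform over the entity list. This is a minimal sketch, assuming entities are plain dicts with `id` and `name` keys as in the example, that the suffixes are known in advance, and that variants use a `-` separator; `expand_names` and the suffix list are hypothetical.

```python
def expand_names(entities, suffixes, sep="-"):
    """Return copies of the entities whose 'name' is a list holding the
    original name followed by one variant per known suffix."""
    expanded = []
    for entity in entities:
        base = entity["name"]
        variants = [base] + [f"{base}{sep}{s}" for s in suffixes]
        # Copy the entity so the input list is left unchanged.
        expanded.append({**entity, "name": variants})
    return expanded


entities = [{"id": 123, "name": "PI91111"}]
print(expand_names(entities, ["9D", "6A"]))
# [{'id': 123, 'name': ['PI91111', 'PI91111-9D', 'PI91111-6A']}]
```

Each entity gets one extra name per suffix, so this only stays practical while the suffix list is short.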


Kumar Varaganti Sharath
Author
Committed
5 August 2022

We are dealing with a very large dataset (30k files), so it is practically impossible to know the trailing codes in advance: there could be many different patterns, such as 9D, 6D, 3D, 4W, 11S… I initially considered adding name variants to the entities as well, but the number of patterns is huge, and it kept growing as I checked more files.
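Since the set of codes keeps growing, one hedged option is to inventory them from the OCR text itself instead of collecting them by hand. This is a minimal sketch, assuming the OCR text of each file is available as a plain string and that tags follow a hypothetical shape of 1-4 letters, an optional space or hyphen, digits, then a parenthesised code; the regex and `collect_suffixes` are assumptions, not something confirmed in this thread.

```python
import re
from collections import Counter

# Hypothetical tag shape: 1-4 letters, optional space or hyphen, digits,
# then a parenthesised code such as (9D), (4W) or (11S).
TAG_WITH_SUFFIX = re.compile(r"\b([A-Z]{1,4})[\s\-]?(\d+)\(([0-9A-Z]+)\)")


def collect_suffixes(ocr_texts):
    """Count how often each parenthesised code appears across OCR texts."""
    counts = Counter()
    for text in ocr_texts:
        for _prefix, _number, suffix in TAG_WITH_SUFFIX.findall(text.upper()):
            counts[suffix] += 1
    return counts


texts = ["PI 91111(9D) FT-20400(4W)", "PI 91112(9D) TT 30001(11S)"]
print(collect_suffixes(texts))
```

The counts show which codes actually occur and how often, which helps judge whether enumerating name variants is feasible or whether the matching itself needs to ignore the code.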

Also, when you create a ticket for this, could you assign it to the email addresses below so that we can track its status?

morten.nesvik@cognitedata.com
ben.petree@cognitedata.com
philippe.bettler@cognitedata.com