Solved

Issue with equipment-to-P&ID file contextualization

  • 4 August 2022
  • 9 replies
  • 124 views

We have a piece of equipment named PI91111 on the P&ID, and the same equipment exists in the asset hierarchy.

[Image: P&ID diagram]

 

This equipment is not getting contextualized for this P&ID.

I have tried both the standard model and the advanced model, with min tokens set to 1 and to 2.

The OCR output for this is below:

 

You should get the desired detections now.


https://cognitedata.atlassian.net/browse/CXT-728 (Forgot to paste in)


I created this task. I don't think I can assign multiple people, but I could optionally tag them.


Actually, we deal with a huge dataset (30k files). It is practically impossible to know the trailing codes in advance; there could be many different patterns, like 9D, 6D, 3D, 4W, 11S, … Initially I also thought of adding versions of the name to the entity, but the number of patterns is huge, and it keeps increasing as I check more files.

Also, when you create a ticket for this, could you assign it to the email addresses below so that we can track the status?

morten.nesvik@cognitedata.com
ben.petree@cognitedata.com
philippe.bettler@cognitedata.com


Is there any way for you to know in advance what the trailing number will be? If there is a limited number of possibilities, an option is to add multiple versions of the name to the entity. E.g. you can transform 

{id: 123, name: PI91111} into {id: 123, name: [PI91111, PI91111-9D, PI91111-6A, …]}

I would like to make it work anyway, but that may take some time, and the resulting box would also cover (9D), since we don't have control over exactly where inside the box the different letters are.
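For readers who want to try the name-variant workaround described above, here is a minimal sketch in plain Python. The helper `expand_entity_names`, the `-` separator, and the suffix list are assumptions made for illustration; they are not part of any SDK, and the suffixes that actually occur in your files have to be collected separately.

```python
def expand_entity_names(entity, suffixes, sep="-"):
    """Return a copy of `entity` whose "name" is a list of name variants.

    The list holds the base name followed by one variant per suffix.
    `suffixes` and `sep` are illustrative assumptions.
    """
    base = entity["name"]
    names = [base] + [f"{base}{sep}{suffix}" for suffix in suffixes]
    # Copy the entity so the caller's dictionary is left unchanged.
    return {**entity, "name": names}


entity = {"id": 123, "name": "PI91111"}
expanded = expand_entity_names(entity, ["9D", "6A"])
print(expanded)
```

This only stays manageable when the set of suffixes is small and known in advance, which is the limitation raised elsewhere in this thread.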

 

 


The (9D) is not common across all the files. Please see the screenshot below of another instance.

I usually run contextualization from the SDK. For this case I also tried from the UI, changing the models and tokens, but the tag is still not getting picked up.


Is (9D) common throughout, or are there various other codes? I guess you are using the UI, since you are referencing the standard and advanced models, but maybe also the SDK, since you are able to get the OCR output? I think we can adjust to cover this case without too much risk of creating false positives for other cases. Is the code always 5 digits long?


Yes, there are 400-500 instances like this where the tags are not getting picked up. The match should be the combination of PI and 91111 (which is a substring of 91111(9D) from the OCR).
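The match described here can be illustrated with a small local pre-check. This is only a sketch of the idea, not the detection algorithm the service uses. It assumes the OCR returns the tag as two adjacent tokens, `PI` and `91111(9D)`, and the helper names `strip_trailing_code` and `candidate_tags` are hypothetical.

```python
import re

# A trailing parenthesised code at the end of a token, e.g. "(9D)" or "(11S)".
TRAILING_CODE = re.compile(r"\(\w+\)$")


def strip_trailing_code(token):
    """Remove a trailing parenthesised code from a single OCR token."""
    return TRAILING_CODE.sub("", token)


def candidate_tags(tokens):
    """Join each pair of adjacent cleaned tokens into a candidate tag name."""
    cleaned = [strip_trailing_code(token) for token in tokens]
    return {first + second for first, second in zip(cleaned, cleaned[1:])}


ocr_tokens = ["PI", "91111(9D)"]  # assumed token split, for illustration only
print("PI91111" in candidate_tags(ocr_tokens))
```

Because the pattern strips any parenthesised alphanumeric code rather than a fixed list, it would also handle codes such as 6D, 4W, or 11S without enumerating them.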

 


Hi! The algorithm is a bit hesitant to cherry-pick tokens from an OCR result, and this is what happens here. If you looked for PI-91111-9D, you would find it. Are there many instances like this in this project?
 

