I also just now learned that you can send HEAD requests to the download URLs, like this:

    def get_content_lengths(client, external_ids):
        urls = client.files.retrieve_download_urls(external_id=external_ids)
        return {
            external_id: client.files._http_client.request("HEAD", urls[external_id]).headers["Content-Length"]
            for external_id in external_ids
        }

This might be a workaround if the files for some reason are not available in the documents API. If they are in the documents API, that route should require the fewest requests.
Hi! I can't see that the API has a way to get the file size without downloading the file. It seems like a very useful feature to have, though. If this is a one-time cleanup job, it might be acceptable, though time consuming, to do len(client.files.download_bytes(id=file_id)) for all the file ids.
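A minimal sketch of that one-off approach, assuming client is an authenticated CogniteClient and file_ids is a list of internal file ids you already have:

    # Downloads every file into memory just to measure it, so this is slow and
    # bandwidth-heavy; only reasonable as a one-time cleanup job.
    sizes_bytes = {
        file_id: len(client.files.download_bytes(id=file_id))
        for file_id in file_ids
    }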
There is the Cognite SDK documentation: https://cognite-sdk-python.readthedocs-hosted.com/en/latest/contextualization.html#engineering-diagrams

And the API docs: https://api-docs.cognite.com/20230101-beta/tag/Engineering-diagrams/operation/diagramDetect/

But they could be more thorough on how to use pattern_mode. You need to use api_subversion="beta" when creating your client_config, e.g.

    from cognite.client import CogniteClient, ClientConfig
    from cognite.client.credentials import OAuthInteractive
    from cognite.client.data_classes.contextualization import FileReference

    creds = OAuthInteractive(
        client_id=client_id,
        scopes=[f"https://{cluster}.cognitedata.com/.default"],
        authority_url=authority_uri,
    )
    client_config = ClientConfig(
        client_name="my-special-client",
        base_url=f"https://{cluster}.cognitedata.com",
        project=project,
        credentials=creds,
        api_subversion="beta",
    )
    client = CogniteClient(client_config)

You would do something like

    entities = [{"name": "CBP-BIQ-SBBQ-01", "id": 123}, ...]
    file_external_id = "something"
    detect_job = client.diagrams.detect(
        file_references=[
            FileReference(file_external_id=file_external_id, first_page=1, last_page=1)
        ],
        entities=entities,
        pattern_mode=True,
    )
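When the job completes, the detections can be read from the job result; a sketch, assuming the items/annotations structure from the API docs above:

    for item in detect_job.result["items"]:
        for annotation in item["annotations"]:
            print(annotation["text"], annotation["entities"])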
Hi! Is the table in a separate file, or in the same file? At least if it is in the same file, one could consider just detecting the tags in the table. Being able to navigate by clicking the numbers would be a plus, but especially since the tags seem to be very similar, I would think that a navigating user would be just as happy to see the table (with some extra information) as seeing the drawing in this case. E.g. how do they decide whether they want to check out 54 rather than 55 here?

That said, an attempt to do it could proceed as follows (see the sketch after this list):

1. Run regular diagram detect to detect known tags.
2. For each detected tag named ABC123 with id X, produce the sample "00 [ABC123]"; also add a sample "00".
3. Run diagram detect in beta with pattern_mode=True, min_tokens=1 and entities=[{"sample": sample, "id": X}]. The detections will now have names like "60 T-00XX", but also "60", "51" etc.
4. Finally, for each detection of a single two-digit number "XX" that has a corresponding pattern detection "XX <TAG>", link the bare number to the same resource as the tag.
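A rough sketch of steps 2 and 3, assuming the beta client from the previous reply; the detected_tags list and the use of search_field for pattern samples are assumptions here, not the exact API:

    # Illustrative (name, id) pairs from the first-pass detect in step 1.
    detected_tags = [("T-0054", 101), ("T-0055", 102)]

    # Step 2: one pattern sample per detected tag, plus a bare two-digit sample.
    patterns = [{"sample": f"00 [{name}]", "id": id_} for name, id_ in detected_tags]
    patterns.append({"sample": "00", "id": 0})

    # Step 3: pattern-mode detect against the samples (search_field is assumed).
    pattern_job = client.diagrams.detect(
        file_references=[FileReference(file_external_id=file_external_id)],
        entities=patterns,
        pattern_mode=True,
        min_tokens=1,
        search_field="sample",
    )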
OCR endpoint in V1 has been in our backlog for some time, but we still have higher priorities. We will likely provide a v1 endpoint with a beta header this fall, though. The first version will not trigger OCR jobs on files that have not yet been parsed; the plan is to add this eventually. If you would like to contribute and add it to your favorite SDK once the API is up, that would be very welcome!
Hi! Could you add some more context? Files can be linked to multiple assets via the assetIds field, or there could be several annotations in the document that link to different assets.
Hi! I have added your feedback to our backlog. Reviewing results manually might not scale so well to begin with, so it is important to be able to pick the work up later, or distribute it. This might require some bigger structural changes eventually.

I am interested in examples of false positives. It could be that we can get rid of more of them by improving the backend than by, e.g., enabling selection of multiple annotations for rejection to speed up that process.
Hi! I don't think there is a very good way to do that. You could try to count the number of characters and assume they all have roughly the same width. If there are then M characters to the left of your new x_min and N characters to the right, you would get new_x_min = (N*old_x_min + M*old_x_max) / (N+M). But you can see that the boxes differ a bit in the padding around the text, which will be an additional source of error.

What is the reason that you want to reduce the box? If the smaller box is what you get from the detect algorithm and you want to compare that with the XML boxes, I can see the point to some extent. For clickable functionality, I would think that having the whole box as a reference would not be a problem. The full text in this case contains both an id/line number for a line and some metadata for the line, i.e. its diameter is 18 inches, I believe. Logically, the whole text references the line, even though you only need the last part to identify it.
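A minimal sketch of that interpolation, assuming equal character widths and that you know how many characters fall left (m) and right (n) of the cut:

    def shrink_x_min(old_x_min: float, old_x_max: float, m: int, n: int) -> float:
        """Estimate a new x_min after dropping the first m of m + n characters."""
        return (n * old_x_min + m * old_x_max) / (n + m)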
Hi! The way to work with P&IDs going forward is to overlay them with annotations. An annotation denotes a region in a file and can link to e.g. assets. You can see the documentation here: https://docs.cognite.com/api/v1/#tag/Annotations

The convert-to-image flow will be deprecated. https://docs.cognite.com/api/v1/#tag/Engineering-diagrams/operation/diagramDetect is the easiest way to detect references to assets, but the output should be written to the annotations API.

The overlaying of annotations is handled by a unified file viewer component which is being rolled out to relevant applications. I am not sure which application is most ideal for the mentioned use cases, or if creating a custom app is the intention, but it should in any case be based on annotations.

Hope that helps!
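For example, a detect result can be written back as an asset-link annotation roughly like this; a sketch where the data fields follow the diagrams.AssetLink schema but should be double-checked against the API docs above, and asset_id/file_id are assumed to be known:

    from cognite.client.data_classes import Annotation

    annotation = Annotation(
        annotation_type="diagrams.AssetLink",
        data={
            "pageNumber": 1,
            "assetRef": {"id": asset_id},  # the asset this region links to
            "textRegion": {"xMin": 0.1, "xMax": 0.2, "yMin": 0.1, "yMax": 0.12},
            "text": "CBP-BIQ-SBBQ-01",
        },
        status="suggested",
        creating_app="my-contextualization-script",  # assumed app name
        creating_app_version="1.0.0",
        creating_user=None,
        annotated_resource_type="file",
        annotated_resource_id=file_id,  # internal id of the P&ID file
    )
    client.annotations.create(annotation)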
The usual way to handle these situations is to group together the entities that are related, the typical case being time series mapped to a sensor asset, and link to the object representing the group. I understand that there are obstacles in the way of doing that, and I think we should look into those. For now, I will elaborate on why we want to group entities together, as well as what the current possibilities are.

The way to get multiple links currently, without having a collection object to go via, is to create multiple annotations on top of each other; a sketch of this follows below. The UI works so that if you click on a spot with multiple annotations, you can page through them and see all the linked resources.

On why we want to group objects that belong together and then only point to the groups from the annotation: entities that belong together (e.g. different versions of a file) should be linked together whether they are referenced in a document or not. And if the grouping of entities is done once, it can be reused later.
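A minimal sketch of the overlap approach, reusing the diagrams.AssetLink shape from the previous reply; the region, text and asset ids are illustrative:

    # Two annotations over the same text region, each linking a different asset.
    region = {"xMin": 0.1, "xMax": 0.2, "yMin": 0.1, "yMax": 0.12}
    for asset_id in (sensor_asset_id, parent_asset_id):  # assumed related assets
        client.annotations.create(
            Annotation(
                annotation_type="diagrams.AssetLink",
                data={"pageNumber": 1, "assetRef": {"id": asset_id}, "textRegion": region, "text": "PI91111"},
                status="suggested",
                creating_app="my-contextualization-script",
                creating_app_version="1.0.0",
                creating_user=None,
                annotated_resource_type="file",
                annotated_resource_id=file_id,
            )
        )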
It has taken some time, but we now include comments. There is no flag; we always include comments/annotations. The old problem we had with certain PDFs vanished after updating the PDF renderer. SVGs generated later than today should include comments. For old files, it could still take some time before entities in the comments are detected. There is also a cache issue that we are working on.
This has come up a few times, and we have it in our backlog. Currently we have several other improvements with higher priority, but we will probably add it as an experimental feature eventually, behind a flag.
Hi! Thanks for reaching out about this. We are in fact explicitly removing annotations in the PDFs. We have had PDFs where rendering comments obscured things for the user rather than the other way around. This is the first time we hear about it being a problem in a production setting, although a few weeks ago someone trying to make a demo P&ID had the same problem. I will create a task on it; most likely the solution will be to introduce a flag to include annotations.
https://cognitedata.atlassian.net/jira/software/c/projects/CXT/boards/471/backlog?view=detail&selectedIssue=CXT-729&issueLimit=100 This one might be what we want to do, but some more details would be nice.
I understood as much, but it says that all detections in the document are false, so it would be nice to have some examples of good detections too. E.g. 14WF30 looks like a good detection to me.
Could you indicate some matches that would be correct? It would also be easier to help if the raw file and asset names were given.
You should get the desired detections now.
https://cognitedata.atlassian.net/browse/CXT-728 (Forgot to paste it in.)
I created this task. I don't think I can assign multiple people, but I could optionally tag them.
Is there any way for you to know in advance what the trailing number will be? If there is a limited number of possibilities, an option is to add multiple versions of the name to the entity. E.g. you can transform {id: 123, name: PI91111} into {id: 123, name: [PI91111, PI91111-9D, PI91111-6A… ]}. I would like to make it work anyway, but that may take some time, and the resulting box would also cover (9D) anyway, since we don't have control over exactly where inside the box the different letters are.
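A minimal sketch of that transformation; the suffix list is illustrative, and the entity shape assumes diagram detect accepts a list of names per entity:

    suffixes = ["-9D", "-6A"]  # hypothetical set of known trailing codes

    def with_variants(entity: dict) -> dict:
        name = entity["name"]
        return {**entity, "name": [name] + [name + s for s in suffixes]}

    entities = [with_variants(e) for e in [{"id": 123, "name": "PI91111"}]]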
Is (9D) common throughout, or are there various other codes? I guess you are using the UI, since you are referencing the standard and advanced models, but maybe also some SDK, since you are able to get the OCR? I think we can adjust to cover this case without too much risk of creating false positives for other cases. Is the code that is used always 5 digits long as well?
Hi! The algorithm is a bit hesitant to cherry-pick tokens from an OCR result, and this is what happens here. If you were to look for PI-91111-9D, you would find it. Are there many instances like this in this project?