P&ID Annotation Workflow using CDF Data Modeling [Cognite Official]

March 3, 2025

Jan Inge Bergseth
MVP

This how-to article describes a structured example template for automating the P&ID annotation process in Cognite Data Fusion (CDF). The process uses CDF Data Modeling and Workflows to automate annotation, linking P&ID documents to assets and to other related files within the data model.

Overview of the Workflow

The workflow consists of two automated functions scheduled and executed using CDF Workflows:

Metadata & Alias Processing: Updates or creates metadata and aliases for files and assets.
P&ID Annotation Processing: Uses the generated aliases to annotate P&ID documents automatically.

The final result is populated annotations in the data model, linking P&ID files to assets and interrelated P&ID diagrams, as illustrated below:

Key Features of the Workflow

1. Integrated Extraction Pipeline

Both functions are connected to a dedicated extraction pipeline.
The pipeline stores the overall documentation and configuration, and handles logging and notifications.

2. Metadata and Alias Processing

AI/LLM-generated metadata summaries: If no description exists, a summary is generated.
Tagging of documents: If a file is identified as a process diagram, the tag PID is added automatically.
Alias generation for filenames: Since full file names contain versions and revisions that are often absent in P&ID references, alias variations are created to improve matching.
Alias generation for assets/tags: System numbers are removed to enhance precision in asset matching.
State storage: A RAW table stores state information, preventing reprocessing of already annotated files.

Note: Naming conventions should be adjusted based on project-specific standards.
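The alias step can be illustrated with a small sketch. The function names get_file_alias_list and get_asset_alias_list come from the module, but the signatures, the regular expressions, and the example naming conventions below (a trailing "-rev2" style revision marker, a leading numeric system number such as "23-") are assumptions made for illustration only. The module's actual implementation may differ.

```python
import re


def get_file_alias_list(file_name: str) -> list[str]:
    """Return alias variations for a P&ID file name (hypothetical convention)."""
    # Drop the file extension: "PH-ME-P-0156-001-rev2.pdf" -> "PH-ME-P-0156-001-rev2"
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", file_name)
    aliases = [stem]
    # Assumption: revisions/versions appear as a trailing "-rev2", "_v3" or "-R01".
    without_rev = re.sub(r"[-_ ](?:rev|r|v)\d+$", "", stem, flags=re.IGNORECASE)
    if without_rev not in aliases:
        aliases.append(without_rev)
    return aliases


def get_asset_alias_list(tag: str) -> list[str]:
    """Return alias variations for an asset tag (hypothetical convention)."""
    aliases = [tag]
    # Assumption: the system number is a leading numeric prefix, e.g. "23-PT-92531".
    without_system = re.sub(r"^\d+-", "", tag)
    if without_system not in aliases:
        aliases.append(without_system)
    return aliases


if __name__ == "__main__":
    print(get_file_alias_list("PH-ME-P-0156-001-rev2.pdf"))
    print(get_asset_alias_list("23-PT-92531"))
```

Keeping the original name as the first alias means an exact match is still possible when a P&ID does reference the full file name or tag.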

Executing the P&ID Annotation Process

The workflow configuration aligns with the data modeling structure in CDF, including:

Instance spaces: Where data is stored.
Schema spaces: Definitions for the data model.
External ID for views/types: For extended Cognite Asset types.
Search properties: Defines how matching is performed.
Filtering properties: List of possible values (e.g., tag lists).
DEBUG mode: Enables detailed logging.
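To make the shape of such a configuration concrete, here is a minimal sketch of how the settings listed above could be represented and validated in Python. All key names (instanceSpace, schemaSpace, assetViewExternalId, searchProperty, filterProperty, filterValues, debug) and the default values are hypothetical; check the module's default.config.yaml and README.md for the real ones.

```python
from dataclasses import dataclass

# Hypothetical key names for the required settings.
REQUIRED_KEYS = ("instanceSpace", "schemaSpace", "assetViewExternalId")


@dataclass(frozen=True)
class AnnotationConfig:
    instance_space: str            # where the instances (data) are stored
    schema_space: str              # where the data model definitions live
    asset_view_external_id: str    # view/type for the extended Cognite Asset
    search_property: str = "aliases"          # property used for matching
    filter_property: str = "tags"             # property used for filtering
    filter_values: tuple[str, ...] = ("PID",)  # accepted filter values
    debug: bool = False            # enables detailed logging


def load_config(raw: dict) -> AnnotationConfig:
    """Validate a plain dict (e.g. parsed YAML) and build a typed config."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"Missing configuration keys: {', '.join(missing)}")
    return AnnotationConfig(
        instance_space=raw["instanceSpace"],
        schema_space=raw["schemaSpace"],
        asset_view_external_id=raw["assetViewExternalId"],
        search_property=raw.get("searchProperty", "aliases"),
        filter_property=raw.get("filterProperty", "tags"),
        filter_values=tuple(raw.get("filterValues", ["PID"])),
        debug=bool(raw.get("debug", False)),
    )
```

Failing early on missing keys keeps configuration mistakes out of the annotation run itself.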

Delete & Cleanup Functionality

If thresholds for automatic annotation approvals are changed, previous annotations should be deleted.
The process only removes annotations generated by this workflow, leaving manual annotations intact.
If old annotations are not deleted, their existing external IDs prevent duplicate annotations from being created.
A state store provides incremental processing, so only new or updated files are processed.

To clean up previous annotations, set cleanOldAnnotations = True in the configuration.
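The two ideas above (external IDs that prevent duplicates, and a cleanup that leaves manual annotations alone) can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the WORKFLOW_PREFIX marker, the Annotation class, and the way the external ID is derived are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical marker identifying annotations created by this workflow.
WORKFLOW_PREFIX = "pid_annotation_wf:"


def annotation_external_id(file_id: str, target_id: str, text: str, page: int) -> str:
    """Build a deterministic external ID: the same match always gives the same ID,
    so re-running the workflow updates the annotation instead of duplicating it."""
    key = f"{file_id}|{target_id}|{text}|{page}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return f"{WORKFLOW_PREFIX}{digest}"


@dataclass(frozen=True)
class Annotation:
    external_id: str
    file_id: str


def select_for_cleanup(existing: list[Annotation], clean_old_annotations: bool) -> list[Annotation]:
    """Return only the annotations this workflow created; manual ones are kept."""
    if not clean_old_annotations:
        return []
    return [a for a in existing if a.external_id.startswith(WORKFLOW_PREFIX)]


if __name__ == "__main__":
    generated = Annotation(annotation_external_id("file-1", "asset-9", "PT-92531", 1), "file-1")
    manual = Annotation("manual-annotation-42", "file-1")
    print(select_for_cleanup([generated, manual], clean_old_annotations=True))
```

With this layout, changing an approval threshold and re-running with cleanup enabled removes only the workflow's own annotations before new ones are written.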

Annotation Process

Filters can be applied to identify relevant Files and Assets.
Search properties are used to match P&IDs to corresponding files/assets.
If a batch processing error occurs:
- Retry up to 3 times.
- If failures persist, switch to individual file processing.
- If individual processing fails, log the error and skip the file.
Matches are stored in a RAW table for documentation.
A threshold-based system determines whether annotations are automatically approved or suggested.
Annotations are created using the CDF Data Modeling (DM) service.
Log status updates are stored in the extraction pipeline log.
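The error handling and threshold logic described above can be sketched in plain Python. This is an illustration only: process_batch and process_single are placeholders for the real annotation calls, "retry up to 3 times" is interpreted here as three batch attempts in total, and the threshold values and the choice to discard matches below the suggestion threshold are assumptions.

```python
from typing import Callable, Optional

MAX_BATCH_ATTEMPTS = 3  # assumption: "retry up to 3 times" means 3 attempts in total


def process_with_fallback(
    files: list[str],
    process_batch: Callable[[list[str]], None],
    process_single: Callable[[str], None],
    log: Callable[[str], None] = print,
) -> tuple[list[str], list[str]]:
    """Try the whole batch first; fall back to one file at a time on repeated failure.

    Returns (processed, skipped).
    """
    for attempt in range(1, MAX_BATCH_ATTEMPTS + 1):
        try:
            process_batch(files)
            return list(files), []
        except Exception as exc:
            log(f"Batch attempt {attempt}/{MAX_BATCH_ATTEMPTS} failed: {exc}")

    processed: list[str] = []
    skipped: list[str] = []
    for file_id in files:
        try:
            process_single(file_id)
            processed.append(file_id)
        except Exception as exc:
            # Log the error and skip the file so one bad file cannot block the rest.
            log(f"Skipping {file_id}: {exc}")
            skipped.append(file_id)
    return processed, skipped


def annotation_status(
    confidence: float,
    approve_threshold: float = 0.85,  # hypothetical value
    suggest_threshold: float = 0.50,  # hypothetical value
) -> Optional[str]:
    """Map a match confidence to an annotation status, or None to discard it."""
    if confidence >= approve_threshold:
        return "Approved"
    if confidence >= suggest_threshold:
        return "Suggested"
    return None
```

Falling back to individual processing costs more requests, but it isolates the failing file and lets the remaining files in the batch complete.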

Output & Visualization

The workflow generates annotations that are displayed on P&ID diagrams as:
- Purple boxes – Linking to assets.
- Orange boxes – Linking to files.
These annotations enhance data contextualization and improve traceability between P&IDs and related assets.

How to Use the Provided Code Example

The provided example is structured as a CDF Toolkit module for governing and deploying the annotation workflow within a CDF project. The content of the toolkit module looks like this:

[Screenshot: contents of the toolkit module]

In the README.md file you will find a description of each of the resources in the module.

Setup Instructions:

Ensure CDF Toolkit is set up: Follow the guide here.
Download the module code from GitHub:
- Repository: GitHub link
- Module name: cdf_p_and_id_annotation

Step 1: Enable External Libraries

Edit your project's cdf.toml and add:
```
[alpha_flags]
external-libraries = true

[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
checksum = "sha256:795a1d303af6994cff10656057238e7634ebbe1cac1a5962a5c654038a88b078"
```
This allows the Toolkit to retrieve official library packages.

Optionally, opt in to usage data collection to help improve the Deployment Pack:
```
cdf collect opt-in
```
Then initialize the modules:
```
cdf modules init . --clean
```
This opens the interactive module selection interface.

⚠️ Disclaimer: This command will overwrite existing modules. Commit your changes before running it, or use a fresh directory.

Integrate the module into your CDF project:
- Copy the module into your CDF Toolkit-based structure.
- Update the default.config.yaml file with project-specific configurations.
- Add the module name to the list of selected modules in your project configuration file.

Deploy the module using CDF Toolkit:

Run cdf build and then cdf deploy in the Toolkit, and check that both complete successfully.

After deployment, you should see the module's resources in the following areas of CDF:

- Staging (RAW database and tables)
- Extraction Pipelines
- Functions
- Data Workflows
- Access Management

Now that you have a working deployment, proceed to modify the module’s configurations and scripts to align with your project requirements.

Modify the module’s default configuration to fit your project

In your CDF Toolkit configuration YAML file, update:
- location_name: replace LOC with the name of the location or asset you are working on.
- source_name: replace SOURCE with the name of the source your P&ID files are extracted from.

Update the folder structure:

Rename files, replacing LOC and SOURCE where applicable.

Modify the alias functions in the module if necessary:
- Update get_file_alias_list
- Update get_asset_alias_list

Use the Toolkit to build and deploy the module to test your modifications. You can also test locally before deploying, as described in the module's README.md file.

Testing & Validation in CDF

Once deployed, trigger a workflow run to validate the P&ID annotation process:

Use CDF Search (in Industrial Tools) to verify that annotations are correctly applied.
Confirm that P&ID assets are correctly linked within the data model.

Next Level

The P&ID workflow is an important building block for a broader process around annotation. As shown in the illustration below, it covers:

Creating aliases
Reading the sources to be annotated
Detecting diagrams and annotating the P&IDs
Writing the mapped documents and assets to RAW tables

Combining this with P&ID edit functionality in a custom application or process yields high-quality annotations that can be reused across multiple applications and use cases.

[Illustration: overview of the annotation process]
