This how-to article describes a structured example template for automating the P&ID annotation process in Cognite Data Fusion (CDF). The process uses CDF Data Modeling and Workflows to annotate P&ID documents automatically, linking them to assets and other related files within the data model.
Overview of the Workflow
The workflow consists of two automated functions scheduled and executed using CDF Workflows:
- Metadata & Alias Processing: Updates or creates metadata and aliases for files and assets.
- P&ID Annotation Processing: Uses the generated aliases to annotate P&ID documents automatically.
The final result is populated annotations in the data model, linking P&ID files to assets and interrelated P&ID diagrams, as illustrated below:
Key Features of the Workflow
1. Integrated Extraction Pipeline
- Both functions are connected to a dedicated extraction pipeline.
- The pipeline stores the overall documentation and configuration, and handles logging and notifications.
2. Metadata and Alias Processing
- AI/LLM-generated metadata summaries: If no description exists, a summary is generated.
- Tagging of documents: If processing diagrams are found, the tag PID is automatically added.
- Alias generation for filenames: Since full file names contain versions and revisions that are often absent in P&ID references, alias variations are created to improve matching.
- Alias generation for assets/tags: System numbers are removed to enhance precision in asset matching.
- State storage: A RAW table stores state information, preventing reprocessing of already annotated files.
Note: Naming conventions should be adjusted based on project-specific standards.
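The two alias rules above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual implementation: the function names, the regular expressions, and the assumed naming conventions (a revision suffix such as `_rev2` at the end of a file name, and a two- or three-digit system number prefixing an asset tag) are all hypothetical and should be replaced with your project's standards.

```python
import re

# Hypothetical conventions, for illustration only:
#   file names end with an extension and an optional revision/version suffix,
#   e.g. "PH-ME-P-0156-001_rev2.pdf"
#   asset tags start with a system number, e.g. "23-PT-1001"
_FILE_EXT = re.compile(r"\.[A-Za-z0-9]{2,4}$")
_REV_SUFFIX = re.compile(r"[-_ ](?:rev|r|v)\.?\d+[A-Za-z]?$", re.IGNORECASE)
_SYSTEM_PREFIX = re.compile(r"^\d{2,3}-")


def file_aliases(file_name: str) -> list[str]:
    """Return the file name plus variants without extension and revision suffix."""
    aliases = [file_name]
    stem = _FILE_EXT.sub("", file_name)
    if stem not in aliases:
        aliases.append(stem)
    without_rev = _REV_SUFFIX.sub("", stem)
    if without_rev and without_rev not in aliases:
        aliases.append(without_rev)
    return aliases


def asset_aliases(tag: str) -> list[str]:
    """Return the tag plus a variant with the leading system number removed."""
    aliases = [tag]
    without_system = _SYSTEM_PREFIX.sub("", tag)
    if without_system and without_system not in aliases:
        aliases.append(without_system)
    return aliases
```

Keeping the original name as the first alias means an exact reference in a diagram still matches, while the shortened variants catch references that omit the revision or system number.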
Executing the P&ID Annotation Process
The workflow configuration aligns with the data modeling structure in CDF, including:
- Instance spaces: Where data is stored.
- Schema spaces: Definitions for the data model.
- External ID for views/types: For extended Cognite Asset types.
- Search properties: Defines how matching is performed.
- Filtering properties: List of possible values (e.g., tag lists).
- DEBUG mode: Enables detailed logging.
Delete & Cleanup Functionality
- If thresholds for automatic annotation approvals are changed, previous annotations should be deleted.
- The process only removes annotations generated by this workflow, leaving manual annotations intact.
- Without deletion, existing external IDs prevent duplicate annotations.
- A state store provides incremental processing, so that only new or updated files are processed.
To clean previous annotations, set cleanOldAnnotations = True in the configuration.
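The incremental behaviour can be pictured with a small in-memory stand-in for the RAW state table. This is a sketch under stated assumptions: the `StateStore` class, the `select_files` helper, and the use of a last-updated timestamp as the change marker are hypothetical, and resetting the state when the cleanup flag is set is an assumption about how cleanup and reprocessing fit together, not a description of the module's code.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """In-memory stand-in for the RAW state table (hypothetical structure)."""

    # Maps file external ID -> last-updated timestamp seen when it was annotated.
    processed: dict[str, int] = field(default_factory=dict)

    def needs_processing(self, external_id: str, last_updated: int) -> bool:
        """A file is due if it was never annotated or has changed since."""
        seen = self.processed.get(external_id)
        return seen is None or last_updated > seen

    def mark_done(self, external_id: str, last_updated: int) -> None:
        self.processed[external_id] = last_updated

    def reset(self) -> None:
        self.processed.clear()


def select_files(
    files: list[tuple[str, int]],
    state: StateStore,
    clean_old_annotations: bool = False,
) -> list[str]:
    """Return external IDs of files to annotate in this run.

    Assumption: when old annotations are cleaned, the state is reset so that
    every file is annotated again.
    """
    if clean_old_annotations:
        state.reset()
    return [ext_id for ext_id, updated in files if state.needs_processing(ext_id, updated)]
```

In a real deployment the dictionary would be replaced by reads and writes against the RAW table, but the decision logic stays the same.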
Annotation Process
- Filters can be applied to identify relevant Files and Assets.
- Search properties are used to match P&IDs to corresponding files/assets.
- If a batch processing error occurs:
  - Retry up to 3 times.
  - If failures persist, switch to individual file processing.
  - If individual processing fails, log the error and skip the file.
- Matches are stored in a RAW table for documentation.
- A threshold-based system determines whether annotations are automatically approved or suggested.
- Annotations are created using CDF Data Modeling (DM) service.
- Log status updates are stored in the extraction pipeline log.
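The error handling and the threshold rule described above can be sketched as follows. This is an illustrative outline only: the `annotate` callable stands in for whatever call performs the actual annotation, "retry up to 3 times" is read here as three batch attempts in total, and the two threshold parameters and the lowercase status labels are hypothetical names chosen for the example.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("pid_annotation")

MAX_BATCH_ATTEMPTS = 3  # assumption: three batch attempts in total


def annotate_with_fallback(
    file_ids: list[str],
    annotate: Callable[[list[str]], None],
) -> list[str]:
    """Annotate a batch; after repeated failures, process files one by one.

    Returns the IDs of files that were skipped because they also failed
    when processed individually.
    """
    for attempt in range(1, MAX_BATCH_ATTEMPTS + 1):
        try:
            annotate(file_ids)
            return []
        except Exception as exc:
            logger.warning("Batch attempt %d/%d failed: %s", attempt, MAX_BATCH_ATTEMPTS, exc)

    skipped: list[str] = []
    for file_id in file_ids:
        try:
            annotate([file_id])
        except Exception as exc:
            # Log and move on so one bad file does not block the rest.
            logger.error("Skipping %s: %s", file_id, exc)
            skipped.append(file_id)
    return skipped


def annotation_status(
    confidence: float,
    approve_threshold: float,
    suggest_threshold: float,
) -> Optional[str]:
    """Map a match confidence to a status label, or None if below both thresholds."""
    if confidence >= approve_threshold:
        return "approved"
    if confidence >= suggest_threshold:
        return "suggested"
    return None
```

Falling back to single-file processing isolates the file that causes the failure, so the remaining files in the batch are still annotated.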
Output & Visualization
- The workflow generates annotations that are displayed on P&ID diagrams as:
  - Purple boxes – Linking to assets.
  - Orange boxes – Linking to files.
- These annotations enhance data contextualization and improve traceability between P&IDs and related assets.
How to Use the Provided Code Example
The provided example is structured as a CDF Toolkit module for governing and deploying the annotation workflow within a CDF project. The content of the toolkit module looks like this:
In the README.md file you will find a description of each of the resources in the module.
Setup Instructions:
- Ensure CDF Toolkit is set up: Follow the guide here.
- Download the module code from GitHub:
  - Repository: GitHub link
  - Module name: cdf_p_and_id_annotation
- Integrate the module into your CDF project:
  - Copy the module into your CDF Toolkit-based structure.
  - Update the default.config.yaml file with project-specific configurations.
  - Add the module name to the list of selected modules in your project configuration file.
- Deploy the module using CDF Toolkit:
  - Run build & deploy in the Toolkit to ensure a successful deployment.
After deployment, you should have new resources in each of the following areas of your CDF project:
- Staging (RAW Database & Tables)
- Extraction Pipelines
- Functions
- Data Workflows
- Access Management Configurations
Now that you have a working deployment, proceed to modify the module’s configurations and scripts to align with your project requirements.
Modify the module’s default configuration to fit your project
- In your CDF Toolkit configuration YAML file, update:
  - location_name: replace LOC with the name of the location or asset you are working on.
  - source_name: replace SOURCE with the name of the source your P&ID files are extracted from.
- Update the folder structure:
  - Rename files, replacing LOC and SOURCE where applicable.
- Modify the alias functions in the module if necessary:
  - Update get_file_alias_list
  - Update get_asset_alias_list
Use the Toolkit to build & deploy to test your modifications. You can also test locally before deploying, as described in the module README.md file.
Testing & Validation in CDF
Once deployed, trigger the workflow run to validate the P&ID annotation process:
- Use CDF Search (in Industrial Tools) to verify that annotations are correctly applied.
- Confirm that P&ID assets are correctly linked within the data model.
Next Level
The P&ID workflow is an important building block for a broader process around annotation. As shown in the illustration below, it covers:
- Alias creation
- Reading the sources to be annotated
- Diagram detection (annotation of the P&IDs)
- Output of the mapped documents and assets to RAW tables
Combining this with P&ID Edit functionality in your custom application or process would provide high-quality annotations that can be reused across multiple settings, applications, and use cases.