Data Model-Centric Annotation Template for P&IDs and Multi-Page Documents

July 28, 2025

Jack Zhao
Practitioner

Automating the annotation of critical industrial documents, such as Piping and Instrumentation Diagrams (P&IDs), is essential for effective data contextualization. To make this process more robust and scalable, we present a template for annotating files within Cognite Data Fusion (CDF). It is already used in some of our largest projects, where it cut the annotation time for ~66,500 files from 16 days to less than half a day. Smaller Quickstart projects are adopting it as well, with no code changes.

This template uses a data model-centric approach that is designed to handle the evolving needs of complex projects. It provides a standardized and flexible way to manage the entire annotation lifecycle, from file selection to final reporting.

For the complete code, detailed deployment steps (including a video walkthrough), and advanced guides, please visit the official repository:

https://github.com/cognitedata/library/tree/main/modules/contextualization/cdf_file_annotation

Key Features

This annotation template is built to be both powerful out-of-the-box and easy to customize:

Configuration-Driven: The entire workflow is controlled by a single config.yaml file, allowing you to adapt to different data models and requirements without changing any code.
Large Document Support: Automatically handles files larger than 50 pages by breaking them into smaller chunks and processing them iteratively.
Ready for Parallel Execution: A robust optimistic locking mechanism prevents race conditions, ensuring stability when running multiple functions concurrently.
Detailed Reporting and Auditing: Annotation details are recorded in CDF RAW tables, function logs, and extraction pipeline runs, and are surfaced in a Streamlit dashboard included with the template, providing a clear audit trail.
Local Development and Debugging: The template includes a pre-configured setup for easy local testing and debugging within VS Code.
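
To make the large-document feature concrete, here is a minimal sketch of how a file could be split into page ranges of at most 50 pages. It is an illustration only, not the template's actual code: the function name `page_chunks` and the constant `MAX_PAGES_PER_CHUNK` are hypothetical, and only the 50-page limit comes from the description above.

```python
from typing import Iterator, Tuple

# The 50-page limit is taken from the post; the names are hypothetical.
MAX_PAGES_PER_CHUNK = 50


def page_chunks(
    total_pages: int, chunk_size: int = MAX_PAGES_PER_CHUNK
) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first_page, last_page) ranges covering a file."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    for first in range(1, total_pages + 1, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield first, min(first + chunk_size - 1, total_pages)


# A 120-page document is processed in three iterations.
print(list(page_chunks(120)))
# [(1, 50), (51, 100), (101, 120)]
```

Each range would then be submitted as its own annotation request, so a long document is processed iteratively rather than in a single call.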

How to Set Up the Template

Getting the template running in your CDF project is a streamlined process using Cognite Toolkit.

Integrate the Module: Add the module to your project and update the config.yaml file with your project-specific details, such as data model views and target entities.
Create an Environment File: Create a .env file in the root directory to hold your CDF project credentials and connection information.
Build and Deploy: Use the Cognite Toolkit to build and deploy the module to your CDF environment.

How to Use the Template

Once deployed, the template can be used in two primary ways: as an automated workflow in CDF or run locally for development and debugging.

Automated Workflow Execution: After deployment, the annotation process is managed by a workflow in CDF that orchestrates the Launch and Finalize functions. This workflow is automatically triggered based on the schedule defined in the Toolkit’s configuration file, continuously processing new files as they arrive. You can monitor the progress and logs of the functions in the annotation pipeline dashboard that comes with the template or in the function logs.
Local Development and Debugging: The template is configured for easy local execution directly within Visual Studio Code. By using the pre-configured launch.json file, you can run and debug both the Launch and Finalize functions on your local machine. This allows you to set breakpoints, inspect variables, and test your configuration before deploying.

How the Workflow Operates

The template orchestrates the annotation process through a workflow that consists of three main phases:

Prepare: This initial phase identifies new files that need to be annotated. It queries for files tagged for processing and, for each one, creates a corresponding AnnotationState instance in the data model to track that file's journey.
Launch: The launch function queries for all files ready for processing. It efficiently groups them, fetches the necessary context from the data model, and calls the Cognite Diagram Detect API to begin the annotation job.
Finalize: Once an annotation job is complete, the finalize function retrieves the results, applies the new annotations to the file, and updates the file's status to "Annotated" or "Failed". A summary report is then written to a CDF RAW table.
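
The three phases above can be pictured as a small state machine. The sketch below is a simplified, in-memory illustration and not the template's implementation: the statuses "Annotated" and "Failed" and the name AnnotationState come from the description above, while the "New" and "Processing" statuses, the field names, and the `start_job` callback (standing in for the call to the Diagram Detect API) are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class Status(Enum):
    NEW = "New"                # hypothetical intermediate status
    PROCESSING = "Processing"  # hypothetical intermediate status
    ANNOTATED = "Annotated"    # named in the post
    FAILED = "Failed"          # named in the post


@dataclass
class AnnotationState:
    file_id: str
    status: Status = Status.NEW
    job_id: Optional[str] = None


def prepare(tagged_file_ids: List[str], tracked: Dict[str, AnnotationState]) -> None:
    """Create a state record for every tagged file that is not tracked yet."""
    for file_id in tagged_file_ids:
        tracked.setdefault(file_id, AnnotationState(file_id))


def launch(
    tracked: Dict[str, AnnotationState],
    start_job: Callable[[List[str]], str],
) -> None:
    """Group all ready files into one job and mark them as in progress."""
    ready = [s for s in tracked.values() if s.status is Status.NEW]
    if not ready:
        return
    job_id = start_job([s.file_id for s in ready])  # stand-in for the API call
    for state in ready:
        state.status = Status.PROCESSING
        state.job_id = job_id


def finalize(
    tracked: Dict[str, AnnotationState], job_id: str, succeeded: bool
) -> List[Dict[str, str]]:
    """Set the final status for a finished job and return summary rows."""
    report = []
    for state in tracked.values():
        if state.job_id == job_id and state.status is Status.PROCESSING:
            state.status = Status.ANNOTATED if succeeded else Status.FAILED
            report.append({"file_id": state.file_id, "status": state.status.value})
    return report


tracked: Dict[str, AnnotationState] = {}
prepare(["pid-001.pdf", "pid-002.pdf"], tracked)
launch(tracked, start_job=lambda file_ids: "job-1")
print(finalize(tracked, "job-1", succeeded=True))
```

Because `prepare` skips files that already have a state record, running the cycle repeatedly on a schedule only picks up files that have not been tracked before.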

Built for Scale and Customization

This template was designed with two core principles in mind: addressing evolving project needs and balancing simple configuration with deep customization.

For most use cases, editing the config.yaml file is all you need to get started. However, when projects demand unique logic or performance optimizations, its interface-based architecture provides an "escape hatch". Developers can implement their own custom Python classes for specialized logic without altering the core template code.
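
As a rough illustration of what an interface-based "escape hatch" can look like, the sketch below defines an abstract interface and swaps in a custom implementation without touching the calling code. All names here (`FileSelector`, `TagSelector`, `SiteTagSelector`, `run_selection`, and the `tags` and `site` keys) are hypothetical and chosen for the example; the actual interfaces are documented in the repository's DEVELOPING.md.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

FileRecord = Dict[str, Any]


class FileSelector(ABC):
    """Hypothetical interface: decides which files should be annotated."""

    @abstractmethod
    def select(self, files: List[FileRecord]) -> List[FileRecord]:
        ...


class TagSelector(FileSelector):
    """Default-style behaviour: pick files that carry a given tag."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    def select(self, files: List[FileRecord]) -> List[FileRecord]:
        return [f for f in files if self.tag in f.get("tags", [])]


class SiteTagSelector(TagSelector):
    """Custom logic: same tag filter, additionally restricted to one site."""

    def __init__(self, tag: str, site: str) -> None:
        super().__init__(tag)
        self.site = site

    def select(self, files: List[FileRecord]) -> List[FileRecord]:
        return [f for f in super().select(files) if f.get("site") == self.site]


def run_selection(selector: FileSelector, files: List[FileRecord]) -> List[str]:
    """Core code depends only on the interface, not on a concrete class."""
    return [f["name"] for f in selector.select(files)]


files = [
    {"name": "pid-001.pdf", "tags": ["ToAnnotate"], "site": "north"},
    {"name": "pid-002.pdf", "tags": ["ToAnnotate"], "site": "south"},
    {"name": "manual.pdf", "tags": [], "site": "north"},
]
print(run_selection(TagSelector("ToAnnotate"), files))
print(run_selection(SiteTagSelector("ToAnnotate", "north"), files))
```

The design choice this illustrates is that specialized behaviour lives in a new class, while the core function stays unchanged and keeps working with the default implementation.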

This stateful, data model-driven approach ensures high reliability and performance, using built-in optimistic locking for concurrency and indexed queries to efficiently find files that need processing.
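
For readers unfamiliar with optimistic locking, the following sketch shows the general technique with an in-memory store: every record carries a version, and an update is accepted only if the caller's version still matches. This is a generic illustration under stated assumptions, not the template's code; `InMemoryStore`, `Record`, and `VersionConflict` are hypothetical names, and in a real deployment the compare-and-write step would have to be performed atomically by the backing service.

```python
from dataclasses import dataclass
from typing import Dict


class VersionConflict(Exception):
    """Raised when a record changed after the caller read it."""


@dataclass
class Record:
    status: str
    version: int = 0


class InMemoryStore:
    """Hypothetical stand-in for a versioned data store."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, status: str) -> None:
        self._records[key] = Record(status)

    def read(self, key: str) -> Record:
        current = self._records[key]
        return Record(current.status, current.version)  # return a snapshot copy

    def update(self, key: str, new_status: str, expected_version: int) -> None:
        current = self._records[key]
        if current.version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current.version}")
        self._records[key] = Record(new_status, current.version + 1)


def try_claim(store: InMemoryStore, key: str, snapshot: Record) -> bool:
    """Claim a file for processing; return False if another worker got there first."""
    try:
        store.update(key, "Processing", snapshot.version)
        return True
    except VersionConflict:
        return False


store = InMemoryStore()
store.put("pid-001.pdf", "New")

# Two workers read the same version before either one writes.
snapshot_a = store.read("pid-001.pdf")
snapshot_b = store.read("pid-001.pdf")

print(try_claim(store, "pid-001.pdf", snapshot_a))  # True: first writer wins
print(try_claim(store, "pid-001.pdf", snapshot_b))  # False: stale version is rejected
```

The losing worker simply skips the file instead of blocking, which is why this pattern suits many functions running concurrently: no lock is held while work is in progress, and conflicts are detected only at write time.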

Getting Started and Diving Deeper

The repository's README.md file provides a complete step-by-step guide to get you up and running.

To understand the full capabilities of this template, the repository includes several in-depth guides in the cdf_file_annotation/detailed_guides/ directory:

CONFIG.md: A detailed outline of the configuration file.
CONFIG_PATTERNS.md: Recipes for common operational tasks and performance tuning.
DEVELOPING.md: A guide for developers who wish to extend the template's functionality.

We encourage anyone interested to explore the repository, read the detailed guides, and deploy the template in their own CDF project.

Sofie Haug
Cognite Academy Instructor
July 28, 2025

This is a fantastic and practical solution to a widespread problem. The efficiency gains are impressive. Well done!

Sofie | Cognite Academy
