Overview
The Quickstart Deployment Pack (DP) is a comprehensive solution designed to bootstrap a Cognite Data Fusion (CDF) project with a robust, production-ready foundation. It provides a curated set of modules that handle everything from infrastructure setup and data modeling to data ingestion, contextualization, and quality reporting.
This guide explains how to install, configure, and use the Quickstart DP to accelerate your industrial digital twin development.
The Quickstart DP consists of several integrated modules. Below is a detailed guide to each major component.
1. CDF Common Module
Purpose: Establishes the foundational infrastructure required by all other modules. It ensures consistency across environments by provisioning shared resources.
Key Components:
- Data Spaces: Provisions dedicated spaces for data instances (demo.Space) and function execution (functions.Space).
- Datasets: Creates the demo.DataSet used to organize transformations, functions, and workflows.
- RAW Databases: Sets up source.DataBase for raw ingestion and contextualization_state.DataBase for tracking process states.
- Direct Relation Writer Function: A specialized CDF Function that converts approved annotations into permanent direct relations in the data model (see the sketch after this list).
- Extraction Pipeline: Manages the configuration for the relation writer job, including mappings between file annotations and target entities.
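A minimal sketch of what writing such a direct relation looks like with the Cognite Python SDK; the space, view, version, and property names below are assumptions, not the function's actual configuration:
from cognite.client import CogniteClient
from cognite.client.data_classes.data_modeling import NodeApply, NodeOrEdgeData, ViewId

client = CogniteClient()  # assumes credentials are configured in the environment

# Point a file node's "assets" property at the entity approved during annotation.
node = NodeApply(
    space="sp_enterprise_instance",
    external_id="file_pid_001",  # placeholder file node
    sources=[
        NodeOrEdgeData(
            source=ViewId("sp_enterprise_process_industry", "Enterprise_File", "v1"),  # assumed view and version
            properties={"assets": [{"space": "sp_enterprise_instance", "externalId": "ast_23-PT-92531"}]},
        )
    ],
)
client.data_modeling.instances.apply(nodes=[node])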
2. QS Enterprise Data Model Module
Purpose: Provides a comprehensive enterprise data model that extends the standard Cognite Data Model (CDM) with organization-specific views and containers, offering complete coverage for process industry use cases including 3D visualization, work management, and asset hierarchy.
Key Components:
- Containers: Define the physical storage for 38+ entity types including Assets, Equipment, TimeSeries, WorkOrders, CAD/PointCloud/360Image models, Maintenance Orders, Notifications, and more.
- Views: Provide a queryable interface for all entity types, extending CDM base types (e.g., CogniteAsset, CogniteEquipment) with custom enterprise views (e.g., Asset, Equipment, Enterprise_TimeSeries). A query sketch follows this list.
- Spaces: Three dedicated spaces for data organization:
  - Schema Space (sp_enterprise_process_industry): Stores all data model definitions
  - Enterprise Instance Space (sp_enterprise_instance): Stores enterprise-wide instance data
  - Site Instance Space (sp_site_instance): Stores site-specific instance data
- Quickstart Enterprise Data Model: A unified schema (ORGProcessIndustries) combining all CDM interfaces and custom enterprise views into a single queryable model.
- Enterprise Search Data Model: An optimized search model (ORGProcessIndustriesSearch) with a subset of views tailored for search functionality across Assets, Equipment, TimeSeries, Files, and WorkOrders.
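To see how the schema and instance spaces fit together, here is a minimal, hedged query sketch using the Cognite Python SDK; the view version ("v1") is an assumption and should be checked against the deployed model:
from cognite.client import CogniteClient
from cognite.client.data_classes.data_modeling import ViewId

client = CogniteClient()

# Views live in the schema space; instances live in the instance spaces.
asset_view = ViewId(space="sp_enterprise_process_industry", external_id="Asset", version="v1")  # version assumed

assets = client.data_modeling.instances.list(
    instance_type="node",
    sources=asset_view,
    space="sp_enterprise_instance",  # or "sp_site_instance" for site-specific data
    limit=10,
)
for node in assets:
    print(node.external_id)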
3. CDF Ingestion Module
Purpose: Orchestrates the entire data population and contextualization lifecycle through automated workflows.
Key Components:
- Ingestion Workflow: A multi-phase orchestrator that runs transformations in the correct dependency order (for example, ensuring Assets exist before linking TimeSeries to them). A sketch for triggering it programmatically follows this list.
- Auth Groups: Define specific security permissions for users and service accounts to execute automated runs safely.
- Phase 1 (Population): Tasks that move data from RAW tables into the Data Model for entities like assets, equipment, and files.
- Phase 2 (Contextualization): Tasks that create relationships between the populated entities.
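If you want to start the ingestion workflow programmatically instead of from the Data Workflows UI, a sketch along these lines should work with a recent cognite-sdk; the workflow external ID and version are assumptions, and older SDK versions expose trigger() instead of run():
from cognite.client import CogniteClient

client = CogniteClient()

# Start one execution; the workflow definition itself enforces that Phase 1
# (population) tasks complete before Phase 2 (contextualization) tasks run.
execution = client.workflows.executions.run(
    workflow_external_id="ingestion",  # assumed external ID; check the deployed workflow
    version="v1",                      # assumed version
)
print(execution.id, execution.status)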
4. Source System Modules (SAP & PI)
Purpose: Handle the initial ingestion of master data and time-series metadata into CDF.
SAP Assets Components:
- Population Transformations: SQL logic that transforms SAP functional locations and equipment into the CDF Data Model.
- Connection Transformation: Links equipment to their parent assets based on the SAP hierarchy.
PI (TimeSeries) Components:
- Population Transformation: Converts PI tag metadata into TimeSeries instances and extracts asset tags from naming conventions into the sysTagsFound property (see the sketch after this list).
- Extraction Pipeline: Manages the configuration for PI data extractors.
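The module ships its own extraction SQL; purely to illustrate the idea behind sysTagsFound, here is a hedged Python sketch in which the tag pattern and PI names are made up:
import re

# Hypothetical PI tag names; real naming conventions vary per source system.
pi_tags = ["VAL_23-PT-92531:X.Value", "VAL_45-FT-11245:TEMP"]

# Assumed pattern for asset tags such as "23-PT-92531" embedded in a PI tag name.
ASSET_TAG = re.compile(r"\d{2}-[A-Z]{2,4}-\d{4,6}")

for name in pi_tags:
    sys_tags_found = ASSET_TAG.findall(name)
    print(name, "->", sys_tags_found)
# VAL_23-PT-92531:X.Value -> ['23-PT-92531']
# VAL_45-FT-11245:TEMP -> ['45-FT-11245']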
5. CDF Connection SQL Module
Purpose: Automates relationship creation using declarative SQL logic based on tag matching.
Key Components:
- TimeSeries to Asset/Equipment: SQL transformations that match sysTagsFound on TimeSeries to the names of Assets or Equipment (a conceptual sketch follows this list).
- Maintenance Order to Asset: Links work orders to physical assets by scanning order metadata for asset tags.
- Activity to TimeSeries: Connects work activities to specific data streams via pattern matching.
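Conceptually, these transformations join sysTagsFound on each TimeSeries against Asset and Equipment names. A hedged Python rendering of that rule; the data and external IDs are illustrative, and the real logic is SQL:
# A TimeSeries is linked to an Asset whose name appears in its sysTagsFound list.
assets = {"23-PT-92531": "ast_23-PT-92531", "45-FT-11245": "ast_45-FT-11245"}  # name -> external ID
timeseries = [
    {"externalId": "pi_VAL_23-PT-92531", "sysTagsFound": ["23-PT-92531"]},
    {"externalId": "pi_VAL_99-XX-00000", "sysTagsFound": ["99-XX-00000"]},  # no matching asset
]

for ts in timeseries:
    matches = [assets[tag] for tag in ts["sysTagsFound"] if tag in assets]
    if matches:
        print(f"link {ts['externalId']} -> {matches[0]}")
    else:
        print(f"no asset match for {ts['externalId']}")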
6. CDF File Annotation Module
Purpose: An advanced framework for automating the identification and linking of entities within documents (such as P&IDs).
Key Components:
- Prepare Function: Identifies new or reset files that require annotation.
- Launch Function: Groups files by site and submits jobs to the Cognite Diagram Detect API (see the sketch after this list).
- Finalize Function: Processes API results, applies confidence thresholds, and creates edges in the data model.
- Promote Function: Automatically resolves “pattern-mode” annotations (for example, text matches) by finding the correct entity in the data model.
- Link Assets: Transformation to directly link the files with annotated assets.
- Annotation Workflow: Orchestrates the sequence of the four functions above.
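Under the hood, the Launch function submits files to the Diagram Detect API. A stand-alone, hedged sketch of that call with the Cognite Python SDK; the entity list, file ID, and parameter values are placeholders rather than the module's actual configuration:
from cognite.client import CogniteClient

client = CogniteClient()

# Entities the detect job should look for in the P&ID (normally built from the data model).
entities = [
    {"name": "23-PT-92531", "externalId": "ast_23-PT-92531"},  # illustrative asset tag
]

job = client.diagrams.detect(
    entities=entities,
    search_field="name",   # match against each entity's "name" value
    partial_match=True,
    min_tokens=2,
    file_ids=[123456789],  # placeholder internal ID of an uploaded P&ID
)
print(job.result)  # detected annotations with bounding boxes and confidence scores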
7. CDF Entity Matching Module
Purpose: Provides AI-powered and rule-based matching to link TimeSeries data to Assets.
Key Components:
- Entity Matching Function: Combines regex rules, expert manual mappings, and machine learning to associate sensors with equipment (see the sketch after this list).
- Metadata Update Function: Optimizes and enriches entity metadata (such as NORSOK discipline classification) to improve searchability.
- Annotation Workflow: Triggers matching and metadata updates as new data arrives.
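For the machine-learning part specifically, a minimal hedged sketch of Cognite's entity-matching API; the fields, feature type, and sample data are illustrative, and the module combines this with regex rules and manual mappings:
from cognite.client import CogniteClient

client = CogniteClient()

sources = [{"id": 1, "name": "VAL_23-PT-92531"}]  # e.g. TimeSeries
targets = [{"id": 10, "name": "23-PT-92531"}]     # e.g. Assets or Equipment

# Fit a string-similarity model; "bigram" is one of the built-in feature types.
model = client.entity_matching.fit(
    sources=sources,
    targets=targets,
    match_fields=[("name", "name")],
    feature_type="bigram",
)
job = model.predict(sources=sources, targets=targets)
print(job.result)  # candidate matches with confidence scores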
8. Open Industrial Data (OID) Sync Module
Purpose: Simulates real-time industrial data by fetching historical data from a public project and time-shifting it to the present.
Key Components:
- OID Sync Function: Fetches historical data, applies a time offset (for example, +1 week), and inserts it into the target project (see the sketch after this list).
- Smart Backfill Strategy: A logic gate that ensures all TimeSeries get real-time updates while progressively backfilling 12 weeks of history.
- Scheduled Trigger: Automatically runs the sync every 10 minutes.
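The module contains the actual function code; as a rough sketch of the core time-shifting idea with the Cognite Python SDK (the external ID, the retrieval window, and the one-week offset are assumptions, and client configuration is omitted):
from datetime import timedelta
from cognite.client import CogniteClient

# Two clients, each configured (credentials omitted here) against its own project.
source = CogniteClient()  # the public Open Industrial Data project
target = CogniteClient()  # your Quickstart project

OFFSET_MS = int(timedelta(weeks=1).total_seconds() * 1000)  # assumed one-week forward shift

dps = source.time_series.data.retrieve(
    external_id="pi_163657",  # illustrative OID time series external ID
    start="2d-ago",
    end="now",
)
shifted = [(ts + OFFSET_MS, value) for ts, value in zip(dps.timestamp, dps.value)]
target.time_series.data.insert(shifted, external_id="pi_163657")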
9. CDF Quality Reports Module
Purpose: Provides automated monitoring and governance of the contextualization process.
Key Components:
- Quality Transformations: A suite of SQL scripts that calculate “link rates” (the percentage of entities successfully connected) across various types (see the example after this list).
- Contextualization Rate Workflow: Runs all quality reports in sequence to generate a unified snapshot of data health.
- Report Table: A dedicated RAW table that stores historical metrics for trend analysis and gap identification.
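As a concrete reading of the metric, a link rate is simply the share of entities that have at least one connection, expressed as a percentage; the counts here are illustrative:
# Link rate = linked entities / total entities, per entity type.
total_timeseries = 400   # illustrative: total TimeSeries in the model
linked_timeseries = 312  # illustrative: TimeSeries with an asset or equipment relation

link_rate = 100.0 * linked_timeseries / total_timeseries if total_timeseries else 0.0
print(f"TimeSeries link rate: {link_rate:.1f}%")  # -> TimeSeries link rate: 78.0%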

Prerequisites
Before you start, ensure you have the following:
- You already have a Cognite Toolkit project set up locally.
- Your project contains the standard cdf.toml file.
- You have valid authentication to your target CDF environment.
- cognite-toolkit >= 0.7.33.
- The data plugin enabled in your cdf.toml file.
Getting Started
Follow the steps below to get started with the Quickstart Deployment Pack. We recommend starting with a new or clean project.
Step 1: Enable External Libraries
Edit your project’s cdf.toml and add:
[alpha_flags]
external-libraries = true
[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
checksum = "sha256:f73006968f1fc6d7e02cd7021b11ef67d5a682c0849268165268d5572e7fefee"This allows the Toolkit to retrieve official library packages, including the Quickstart DP package.
📝 Note: Replacing the Default Library
By default, a Cognite Toolkit project contains a [library.toolkit-data] section pointing to https://github.com/cognitedata/toolkit-data/.... These two library sections cannot coexist. To use this Deployment Pack, you must replace the toolkit-data section with library.cognite:
- Remove: [library.toolkit-data] (github.com/cognitedata/toolkit-data/...)
- Add: [library.cognite] (github.com/cognitedata/library/...)
The library.cognite package includes all Deployment Packs developed by the Value Delivery Accelerator team (RMDM, RCA agents, Context Quality Dashboard, etc.).
⚠️ Checksum Warning
When running cdf modules add, you may see a warning like:
WARNING [HIGH]: The provided checksum sha256:... does not match downloaded file hash sha256:...
Please verify the checksum with the source and update cdf.toml if needed.
This may indicate that the package content has changed.
This is expected behavior. The checksum in this documentation may be outdated because the package is updated with every release. The package will still download successfully despite the warning.
To resolve the warning: Copy the new checksum value shown in the warning message and update your cdf.toml with it. For example, if the warning shows sha256:da2b33d60c66700f..., update your config to:
[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
checksum = "sha256:da2b33d60c66700f..."
Step 2: Add the Module
Run:
cdf modules init . --clean
⚠️ Disclaimer: This command will overwrite existing modules. Commit changes before running, or use a fresh directory.
This opens the interactive module selection interface.
Step 3: Select the Quickstart Deployment Pack
From the menu, select:
Quickstart Deployment Pack: Quickstart deployment pack for CDF that helps you showcase CDF capabilities in a quick and easy way.
Step 4: Verify Folder Structure
After installation, your project should contain:
modules
├── accelerators
│ ├── cdf_common
│ ├── cdf_ingestion
│ ├── contextualization
│ │ ├── cdf_file_annotation
│ │ ├── cdf_entity_matching
│ │ └── cdf_connection_sql
│ ├── industrial_tools
│ │ └── cdf_search
│ └── open_industrial_data_sync
├── sourcesystem
│ ├── cdf_pi
│ ├── cdf_sap_assets
│ ├── cdf_sap_events
│ └── cdf_sharepoint
├── dashboards
│ └── rpt_quality
└── models
  └── qs_enterprise_dm
If you want to add more modules, continue with yes (Y); otherwise choose no (N).
Proceed with creation (Y) to generate the folder structure and files for your selected modules.
Step 5: Update the Configuration File
Once the files are available, update the configuration file config.<env>.yaml for any variables that are not set. Environment variables must also be updated with Client ID and Client Secret values for the different data sources (such as Aveva PI and SAP), which can be found under the sourcesystem modules' variable declarations. For now, the Client ID and Client Secret values for these sources use IDP_CLIENT_ID and IDP_CLIENT_SECRET.
Update the following variables in the configuration file:
- <my-project-env>: Replace with your CDF project name for that environment.
- <GROUP_SOURCE_ID>: Replace with your designated Group Object ID. For testing, you can use your TK service principal group's SourceId, but this is not recommended for production.
- <RUN_WORKFLOW_USER_ID>: Replace with your email address or a user who can trigger workflows.
- OPEN_ID_CLIENT_SECRET: You can get this from the hub page on Open Industrial Data (OID).
- Update the following variables under the cdf_entity_matching module:
  - AssetFilterValues: from root:WMT to root:ast_VAL
  - AssetSearchProperty: from name to aliases
⚠️ Note: It is strongly recommended to keep Client IDs and Client Secrets as environment variables rather than in your configuration files. By default, these variables are read from the .env file and will cause build or deploy errors if missing.
Step 6: Enable File Annotation mode in Asset Transformation
In the transformation file asset.Transformation.sql, found under modules > sourcesystem > cdf_sap_assets > transformations > population, there are two modes: COMMON MODE and FILE_ANNOTATION MODE. Uncomment the FILE_ANNOTATION MODE block and comment out the COMMON MODE block. The FILE_ANNOTATION MODE uses the ast_<id> format for external IDs, creates a root asset ast_VAL (representing the Valhall platform), and sets the top-level asset to have ast_VAL as its parent, while other assets reference their ancestor via WMT_TAG_ID_ANCESTOR. This mode also populates the aliases property with name variations and adds the 'DetectInDiagrams' tag to assets whose names contain two or more dashes, enabling diagram detection matching.
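To make those rules concrete, here is a hedged Python rendering of the alias and tag logic described above; the module implements this in SQL, and the exact alias variations are assumptions:
def derive_annotation_fields(wmt_tag_name: str) -> dict:
    # Sketch of the FILE_ANNOTATION MODE enrichment for one SAP asset.
    aliases = [wmt_tag_name]
    # Assumed alias variation: the tag without its site prefix (e.g. "VAL_23-PT-92531" -> "23-PT-92531").
    if "_" in wmt_tag_name:
        aliases.append(wmt_tag_name.split("_", 1)[1])
    # Assets with two or more dashes in the name are flagged for diagram detection.
    tags = ["DetectInDiagrams"] if wmt_tag_name.count("-") >= 2 else []
    return {"externalId": f"ast_{wmt_tag_name}", "aliases": aliases, "tags": tags}

print(derive_annotation_fields("VAL_23-PT-92531"))
# {'externalId': 'ast_VAL_23-PT-92531', 'aliases': ['VAL_23-PT-92531', '23-PT-92531'], 'tags': ['DetectInDiagrams']}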
Testing the Quickstart Package
The Quickstart Deployment Pack includes test data in the sourcesystem module. Using this data, you can quickly deploy and test the Quickstart modules. The result is a set of annotated P&ID files, and contextualization rates can be checked in the deployed Streamlit app.
- Update the creator field in modules/accelerators/contextualization/cdf_file_annotation/streamlit/file_annotation_dashboard.Streamlit.yaml with your Email ID.
- Deploy to CDF:
  - Populate variables in resource YAML files from the configuration file by running cdf build. If warnings appear, resolve them and rebuild using the same command.
    ⚠️ Note: Ignore warnings such as WARNING [LOW]: Module 'cdf_pi' has non-resource directories: ['upload_data']...
  - Optionally dry-run the deployment using cdf deploy --dry-run to validate everything before deploying to your CDF instance.
  - Deploy to your CDF instance using cdf deploy.
- Uploading data through cdf deploy in Toolkit will be deprecated in version 0.8 and later. The new Deployment Pack supports this change, and all synthetic data is stored under the upload_data directory in the respective modules. The new way to upload data is using the data plugin's upload command.
  - In your cdf.toml file, verify that the data plugin is enabled:
    [plugins]
    data = true
  - Run the following commands to upload the synthetic data:
cdf data upload dir modules/sourcesystem/cdf_pi/upload_data
cdf data upload dir modules/sourcesystem/cdf_sap_assets/upload_data
cdf data upload dir modules/sourcesystem/cdf_sap_events/upload_data
cdf data upload dir modules/sourcesystem/cdf_sharepoint/upload_data
cdf data upload dir modules/accelerators/contextualization/cdf_entity_matching/upload_data
cdf data upload dir modules/accelerators/contextualization/cdf_file_annotation/upload_data
Verify the data upload in Integrate > Staging in CDF.
⚠️ Note:
- These upload_data directories contain Manifest.yaml files with hardcoded table and database names. If you change table or database names in the configuration file, update the corresponding Manifest.yaml files as well.
- If you are maintaining your modules under the organization directory, then add the organization directory name to the start of the upload_data directory path.
- Once data is uploaded, trigger the following workflow files in this order from the Data Workflows UI in CDF:
  - ingestion to populate the data model and build connections between assets, equipment, work orders, etc.
  - wf_file_annotation to annotate uploaded files. Test files are located in the sourcesystem/cdf_sharepoint/files/ directory and can be viewed in your Industrial Tools Search App.
  - EntityMatching to run the entity matching workflow. Results can be found in the dm:context:timeseries:entity_matching function's run logs.
  - wf_contextualization_rate to check file annotation. This populates tables in the db_file_annotation database. You can review annotation coverage in the files within the Search App (Industrial Tools) or in the File Annotation Dashboard under Custom Apps (Industrial Tools).
Support
For troubleshooting or deployment issues:
- Refer to the Cognite Documentation
- Contact your Cognite support team
- For any queries related to the deployment pack, write to us on our slack channel #topic-deployment-packs.