Overview
The Quickstart Deployment Pack (DP) is a comprehensive solution designed to bootstrap a Cognite Data Fusion (CDF) project with a robust, production-ready foundation. It provides a curated set of modules that handle everything from infrastructure setup and data modeling to data ingestion, contextualization, and quality reporting.
This guide explains how to install, configure, and use the Quickstart DP to accelerate your industrial digital twin development.
The Quickstart DP consists of several integrated modules. Below is a detailed guide to each major component.

1. CDF Common Module
Purpose: Establishes the foundational infrastructure required by all other modules. It ensures consistency across environments by provisioning shared resources.
Key Components:
- Data Spaces: Provisions dedicated spaces for data instances (demo.Space) and function execution (functions.Space).
- Datasets: Creates the demo.DataSet used to organize transformations, functions, and workflows.
- RAW Databases: Sets up source.DataBase for raw ingestion and contextualization_state.DataBase for tracking process states.
- Direct Relation Writer Function: A specialized CDF Function that converts approved annotations into permanent direct relations in the data model.
- Extraction Pipeline: Manages the configuration for the relation writer job, including mappings between file annotations and target entities.
2. QS Enterprise Data Model Module
Purpose: Provides a comprehensive enterprise data model that extends the standard Cognite Data Model (CDM) with organization-specific views and containers, offering complete coverage for process industry use cases including 3D visualization, work management, and asset hierarchy.
Key Components:
- Containers: Define the physical storage for 38+ entity types, including Assets, Equipment, TimeSeries, WorkOrders, CAD/PointCloud/360Image models, Maintenance Orders, Notifications, and more.
- Views: Provide a queryable interface for all entity types, extending CDM base types (e.g., CogniteAsset, CogniteEquipment) with custom enterprise views (e.g., Asset, Equipment, Enterprise_TimeSeries).
- Spaces: Three dedicated spaces for data organization:
  - Schema Space (sp_enterprise_process_industry): Stores all data model definitions.
  - Enterprise Instance Space (sp_enterprise_instance): Stores enterprise-wide instance data.
  - Site Instance Space (sp_site_instance): Stores site-specific instance data.
- Quickstart Enterprise Data Model: A unified schema (ORGProcessIndustries) combining all CDM interfaces and custom enterprise views into a single queryable model.
- Enterprise Search Data Model: An optimized search model (ORGProcessIndustriesSearch) with a subset of views tailored for search across Assets, Equipment, TimeSeries, Files, and WorkOrders.
Note: The QS Enterprise Data Model is a separate deployment pack. To learn more, refer to the hub article How to Get Started with Quick Start Enterprise Data Model.
3. CDF Ingestion Module
Purpose: Orchestrates the entire data population and contextualization lifecycle through automated workflows.
Key Components:
- Ingestion Workflow: A multi-phase orchestrator that runs transformations in the correct dependency order (for example, ensuring Assets exist before linking TimeSeries to them).
- Auth Groups: Define specific security permissions for users and service accounts to execute automated runs safely.
- Phase 1 (Population): Tasks that move data from RAW tables into the Data Model for entities like assets, equipment, and files.
- Phase 2 (Contextualization): Tasks that create relationships between the populated entities.
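The phased ordering above can be pictured as a small dependency graph: contextualization tasks declare which population tasks they depend on, and the orchestrator only runs them once those are done. Here is a minimal sketch using Python's standard-library topological sorter; the task names are hypothetical, not the module's actual task identifiers.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: linking must wait for both population tasks,
# mirroring the Phase 1 (Population) -> Phase 2 (Contextualization) ordering.
deps = {
    "populate_assets": set(),
    "populate_timeseries": set(),
    "link_timeseries_to_assets": {"populate_assets", "populate_timeseries"},
}

# static_order() yields a valid execution order respecting the dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # the linking task always comes after both population tasks
```

The actual workflow definition in CDF expresses the same idea declaratively via task dependencies in the workflow YAML.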
4. Source System Modules (SAP & PI)
Purpose: Handle the initial ingestion of master data and time-series metadata into CDF.
SAP Assets Components:
- Population Transformations: SQL logic that transforms SAP functional locations and equipment into the CDF Data Model.
- Connection Transformation: Links equipment to their parent assets based on the SAP hierarchy.
PI (TimeSeries) Components:
- Population Transformation: Converts PI tag metadata into TimeSeries instances and extracts asset tags from naming conventions into the sysTagsFound property.
- Extraction Pipeline: Manages the configuration for PI data extractors.
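To make the sysTagsFound idea concrete, here is a minimal Python sketch of tag extraction from a PI tag name. The naming convention and regular expression are assumptions for illustration only; the real extraction logic lives in the module's population transformation SQL.

```python
import re

# Hypothetical asset-tag convention: two-digit unit, 2-4 letter function code,
# and a 4-5 digit loop number (e.g. "23-PT-92531"). This pattern is an
# assumption, not the pack's actual convention.
TAG_PATTERN = re.compile(r"\d{2}-[A-Z]{2,4}-\d{4,5}")

def extract_asset_tags(pi_tag_name: str) -> list[str]:
    """Return candidate asset tags found inside a PI tag name."""
    return TAG_PATTERN.findall(pi_tag_name)

print(extract_asset_tags("VAL_23-PT-92531:X.Value"))  # ['23-PT-92531']
```

Tags extracted this way are stored on the TimeSeries instance (in sysTagsFound) so that later connection transformations can join on them.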
5. CDF Connection SQL Module
Purpose: Automates relationship creation using declarative SQL logic based on tag matching.
Key Components:
- TimeSeries to Asset/Equipment: SQL transformations that match sysTagsFound on TimeSeries to the names of Assets or Equipment.
- Maintenance Order to Asset: Links work orders to physical assets by scanning order metadata for asset tags.
- Activity to TimeSeries: Connects work activities to specific data streams via pattern matching.
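The tag-matching join these SQL transformations perform can be sketched in Python: a TimeSeries is linked to an Asset when one of its sysTagsFound values equals the asset's name. The data shapes and names below are hypothetical; the module expresses the same logic declaratively in SQL.

```python
# Sketch of the sysTagsFound-to-name matching join (assumed data shapes).
def match_timeseries_to_assets(
    timeseries: list[dict], assets: list[dict]
) -> list[tuple[str, str]]:
    """Return (timeseries_external_id, asset_external_id) link pairs."""
    asset_by_name = {a["name"]: a["externalId"] for a in assets}
    links = []
    for ts in timeseries:
        for tag in ts.get("sysTagsFound", []):
            if tag in asset_by_name:
                links.append((ts["externalId"], asset_by_name[tag]))
    return links

assets = [{"externalId": "ast_1", "name": "23-PT-92531"}]
ts = [{"externalId": "pi_001", "sysTagsFound": ["23-PT-92531"]}]
print(match_timeseries_to_assets(ts, assets))  # [('pi_001', 'ast_1')]
```

In SQL the equivalent is a join between the TimeSeries tag array and the Asset name column, writing direct relations for each match.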
6. CDF File Annotation Module
Purpose: An advanced framework for automating the identification and linking of entities within documents (such as P&IDs).
Key Components:
- Prepare Function: Identifies new or reset files that require annotation.
- Launch Function: Groups files by site and submits jobs to the Cognite Diagram Detect API.
- Finalize Function: Processes API results, applies confidence thresholds, and creates edges in the data model.
- Promote Function: Automatically resolves "pattern-mode" annotations (for example, text matches) by finding the correct entity in the data model.
- Link Assets: Transformation that directly links files with their annotated assets.
- Annotation Workflow: Orchestrates the sequence of the four functions above.

7. CDF Entity Matching Module
Purpose: Provides AI-powered and rule-based matching to link TimeSeries data to Assets.
Key Components:
- Entity Matching Function: Combines regex rules, expert manual mappings, and machine learning to associate sensors with equipment.
- Metadata Update Function: Optimizes and enriches entity metadata (such as NORSOK discipline classification) to improve searchability.
- Annotation Workflow: Triggers matching and metadata updates as new data arrives.
Note: Entity Matching is a separate deployment pack. To learn more, refer to the hub article How To: CDF Entity Matching Module.
8. Open Industrial Data (OID) Sync Module
Purpose: Simulates real-time industrial data by fetching historical data from a public project and time-shifting it to the present.
Key Components:
- OID Sync Function: Fetches historical data, applies a time offset (for example, +1 week), and inserts it into the target project.
- Smart Backfill Strategy: A logic gate that ensures all TimeSeries get real-time updates while progressively backfilling 12 weeks of history.
- Scheduled Trigger: Automatically runs the sync every 10 minutes.
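The core time-shifting idea is simple: move each historical datapoint forward by a fixed offset so it lands in the present. A minimal sketch, assuming a one-week offset and a simple (timestamp, value) datapoint shape; the actual function works against the CDF datapoints API.

```python
from datetime import datetime, timedelta, timezone

# Assumed offset; the real sync applies whatever offset brings the
# public project's history up to "now".
OFFSET = timedelta(weeks=1)

def shift_datapoints(
    datapoints: list[tuple[datetime, float]]
) -> list[tuple[datetime, float]]:
    """Shift every datapoint's timestamp forward by OFFSET, keeping values."""
    return [(ts + OFFSET, value) for ts, value in datapoints]

src = [(datetime(2024, 1, 1, tzinfo=timezone.utc), 42.0)]
print(shift_datapoints(src))  # timestamps move from Jan 1 to Jan 8
```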
9. CDF Quality Reports Module
Purpose: Provides automated monitoring and governance of the contextualization process.
Key Components:
- Quality Transformations: A suite of SQL scripts that calculate "link rates" (the percentage of entities successfully connected) across various entity types.
- Contextualization Rate Workflow: Runs all quality reports in sequence to generate a unified snapshot of data health.
- Report Table: A dedicated RAW table that stores historical metrics for trend analysis and gap identification.
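A link rate is just the share of entities that have at least one connection, expressed as a percentage. A one-function sketch of the metric the SQL scripts compute (the rounding and zero-handling here are assumptions):

```python
def link_rate(linked_count: int, total_count: int) -> float:
    """Percentage of entities successfully contextualized."""
    if total_count == 0:
        return 0.0  # avoid division by zero for empty entity types
    return round(100.0 * linked_count / total_count, 2)

print(link_rate(870, 1000))  # 87.0
```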
Prerequisites
Before you start, ensure you have the following:
- Ensure your local CDF and Cognite Toolkit projects are configured, with cognite-toolkit package version 0.7.33 or later. If they aren't, follow these setup instructions before continuing.
- If your project is missing the standard cdf.toml file, generate it by executing cdf init and choosing the Create toml file (required) option.
- Ensure you have valid authentication for your target CDF environment. If you haven't authenticated yet, run cdf init and select Authentication, or use the cdf auth init and cdf auth verify commands. For detailed guidance, refer to the Toolkit authentication documentation.
- Ensure the data plugin is enabled in your cdf.toml file.
- Ensure the following configuration exists in your cdf.toml to point to the correct library repository:
[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
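Putting the two cdf.toml requirements from this guide together, the relevant fragment looks like the following. Any other sections generated by cdf init should be left as they are; this is only the portion this guide depends on.

```toml
# Enable the Toolkit data plugin (needed later for `cdf data upload`).
[plugins]
data = true

# Point the Toolkit at the Cognite library repository for module packages.
[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
```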
Getting Started
Follow the steps below to get started with the Quickstart Deployment Pack. We recommend starting with a new or clean project.
Step 1: Add the Module
Run:
cdf modules init . --clean
⚠️ Disclaimer: This command will overwrite existing modules. Commit changes before running, or use a fresh directory.
This opens the interactive module selection interface.
Step 2: Select the Quickstart Deployment Pack
From the menu, select:
Quickstart Deployment Pack: The Quickstart deployment pack for CDF helps you showcase CDF capabilities in a quick and easy way.
Step 3: Verify Folder Structure
After installation, your project should contain:
modules
├── accelerators
│ ├── cdf_common
│ ├── cdf_ingestion
│ ├── contextualization
│ │ ├── cdf_file_annotation
│ │ ├── cdf_entity_matching
│ │ └── cdf_connection_sql
│ ├── industrial_tools
│ │ └── cdf_search
│ └── open_industrial_data_sync
├── sourcesystem
│ ├── cdf_pi
│ ├── cdf_sap_assets
│ ├── cdf_sap_events
│ └── cdf_sharepoint
├── dashboards
│ └── rpt_quality
└── models
└── qs_enterprise_dm
If you want to add more modules, continue with yes (Y); otherwise choose no (N).
Proceed with creation (Y) to generate the folder structure and files for your selected modules.
Step 4: Update the Configuration File
Once the files are available, update the configuration files config.<env>.yaml for any variables that are not set. Environment variables must also be updated with Client ID and Client Secret values for the different data sources (such as Aveva PI and SAP), which can be found under the sourcesystem module's variable declarations. For now, the Client ID and Client Secret values for these sources use IDP_CLIENT_ID and IDP_CLIENT_SECRET.
Update the following variables in the configuration file:
- <my-project-env>: Replace with your CDF project name for that environment.
- Set GROUP_SOURCE_ID to your designated Group Object ID and OPEN_ID_CLIENT_SECRET in the .env file. You can get the Open ID Client Secret from the hub page on Open Industrial Data (OID). You can keep the same group for all the auth groups or use different groups depending on your use case. If you decide to use different groups, use different variable names in the .env file and update them in config.<env>.yaml as well.
GROUP_SOURCE_ID=<group_source_id>
OPEN_ID_CLIENT_SECRET=<open_id_client_secret>
- Update the following variables under the cdf_entity_matching module:
  - targetViewFilterValues: from root:WMT to root:ast_VAL
  - targetViewSearchProperty: from name to aliases
  - AssetViewExternalId: from YourOrgAsset to Asset
  - TimeSeriesViewExternalId: from YourOrgTimeSeries to Enterprise_TimeSeries
  - targetViewExternalId: from YourOrgAsset to Asset
  - entityViewExternalId: from YourOrgTimeSeries to Enterprise_TimeSeries
- Update the ApplicationOwner variable in the cdf_file_annotation module by replacing <APPLICATION_OWNER> with the email ID(s) of the person or group responsible for the Streamlit app.
⚠️ Note:
- It is strongly recommended to keep Client IDs and Client Secrets as environment variables rather than in your configuration files. By default, these variables are read from the .env file and will cause build or deploy errors if missing.
- Review and update the cron expressions within your configuration file. Currently, they are placeholders set for February 29th and should be adjusted to your desired frequency before deployment.
Step 5: Enable File Annotation mode in Asset Transformation
In the transformation file asset.Transformation.sql, found under modules > sourcesystem > cdf_sap_assets > transformations > population, there are two modes: COMMON MODE and FILE_ANNOTATION MODE. Uncomment the FILE_ANNOTATION MODE and comment out the COMMON MODE. The FILE_ANNOTATION MODE uses the ast_<id> format for external IDs, creates a root asset ast_VAL (representing the Valhall platform), and sets the top-level asset to have ast_VAL as its parent, while other assets reference their ancestor via WMT_TAG_ID_ANCESTOR. This mode also populates the aliases property with name variations and adds the 'DetectInDiagrams' tag to assets whose names contain two or more dashes, enabling diagram detection matching.
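The dash-count rule for the tags property can be sketched in a few lines of Python. This only illustrates the rule described above; the authoritative logic is the SQL in asset.Transformation.sql.

```python
# Assets whose names contain two or more dashes get the 'DetectInDiagrams'
# tag, marking them as candidates for P&ID diagram detection.
def diagram_tags(asset_name: str) -> list[str]:
    return ["DetectInDiagrams"] if asset_name.count("-") >= 2 else []

print(diagram_tags("23-PT-92531"))  # ['DetectInDiagrams']
print(diagram_tags("VAL"))          # []
```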
Testing the Quickstart Package
The Quickstart Deployment Pack includes test data in the sourcesystem module. Using this data, you can quickly deploy and test the Quickstart modules. The result will be P&ID files being annotated, and contextualization rates can be checked in the deployed Streamlit app.
- Deploy to CDF:
  - Populate variables in resource YAML files from the configuration file by running cdf build. If warnings appear, resolve them and rebuild using the same command.
    ⚠️ Note: Ignore warnings such as WARNING [LOW]: Module 'cdf_pi' has non-resource directories: ['upload_data']...
  - Optionally dry-run the deployment using cdf deploy --dry-run to validate everything before deploying to your CDF instance.
  - Deploy to your CDF instance using cdf deploy.
- Upload data:
  - Uploading data through cdf deploy in Toolkit will be deprecated in version 0.8 and later. The new Deployment Pack supports this change, and all synthetic data is stored under the upload_data directory in the respective modules. The new way to upload data is to use the data plugin's upload command.
  - In your cdf.toml file, verify that the data plugin is enabled:
    [plugins]
    data = true
  - Run the following commands to upload the synthetic data:
cdf data upload dir modules/sourcesystem/cdf_pi/upload_data
cdf data upload dir modules/sourcesystem/cdf_sap_assets/upload_data
cdf data upload dir modules/sourcesystem/cdf_sap_events/upload_data
cdf data upload dir modules/sourcesystem/cdf_sharepoint/upload_data
cdf data upload dir modules/accelerators/contextualization/cdf_entity_matching/upload_data
cdf data upload dir modules/accelerators/contextualization/cdf_file_annotation/upload_data
Verify the data upload in Integrate > Staging in CDF.
⚠️ Note:
- These upload_data directories contain Manifest.yaml files with hardcoded table and database names. If you change table or database names in the configuration file, update the corresponding Manifest.yaml files as well.
- If you are maintaining your modules under the organization directory, add the organization directory name to the start of the upload_data directory path.
- Recent versions of the Cognite Toolkit include mandatory project name verification during data uploads. To bypass this check, include the --skip-verify-cdf-project flag with your upload command:
cdf data upload dir modules/sourcesystem/cdf_pi/upload_data --skip-verify-cdf-project
cdf data upload dir modules/sourcesystem/cdf_sap_assets/upload_data --skip-verify-cdf-project
cdf data upload dir modules/sourcesystem/cdf_sap_events/upload_data --skip-verify-cdf-project
cdf data upload dir modules/sourcesystem/cdf_sharepoint/upload_data --skip-verify-cdf-project
cdf data upload dir modules/accelerators/contextualization/cdf_entity_matching/upload_data --skip-verify-cdf-project
cdf data upload dir modules/accelerators/contextualization/cdf_file_annotation/upload_data --skip-verify-cdf-project
- Once data is uploaded, trigger the following workflow files in this order from the Data Workflows UI in CDF:
- ingestion to populate the data model and build connections between assets, equipment, work orders, etc.
- wf_file_annotation to annotate uploaded files. Test files are located in the sourcesystem/cdf_sharepoint/files/ directory and can be viewed in your Industrial Tools Search App.
- EntityMatching to run the entity matching workflow. Results can be found in the dm:context:timeseries:entity_matching function's run logs.
Post-Deployment Verification & Monitoring
- Verify File Annotations: Navigate to Industrial Tools → Search App and select the Files tab. Locate your uploaded P&IDs to confirm they have been successfully annotated with linked assets.
- Monitor Pipeline Health: Use the dashboards under the Custom Apps tab to track the performance and quality of the File Annotation Pipeline.
- Analyze Contextualization Metrics: Execute the wf_contextualization_rate workflow to consolidate results into the tbl_contextualization_rate_report table (located in the db_quality_reports database). This data is optimized for:
  - External Visualization: Building trend dashboards in CDF Charts, Grafana, or Power BI via the RAW API.
  - Ad-hoc Analysis: Direct querying via the RAW API for custom tool integration.
Support
For troubleshooting or deployment issues:
- Refer to the Cognite Documentation.
- Contact your Cognite support team.
- For any queries related to the deployment pack, write to us on our Slack channel #topic-deployment-packs.