Atlas AI Property Extractor - Intelligent Data Ingestion for Cognite Data Fusion

  • February 12, 2026

 Overview

 

The Atlas AI Property Extractor is a specialized ingestion tool designed to bridge the gap between unstructured documents and structured industrial data within Cognite Data Fusion (CDF). Leveraging Atlas AI agents, it automates the extraction of key technical information, such as equipment tags, metadata, and relationships, from complex documents, engineering reports, and technical manuals.

This module simplifies the process of turning static documentation into a dynamic, queryable knowledge graph, enabling faster search and better data contextualization for Subject Matter Experts (SMEs).

Typical scenarios include:

  • Extracting discipline, priority, or category from notification descriptions
  • Populating structured fields from unstructured comments or notes
  • Creating AI-generated summaries alongside original source data
  • Enriching data models with values that would otherwise require manual entry
  • Accumulating tags or keywords over time (append mode)
  • Re-extracting properties after LLM upgrades (overwrite mode)

 

 


Key Features

  • Automated Tag and Alias Extraction: Identifies and extracts asset tags, aliases, and equipment identifiers directly from technical documents.
  • Atlas AI Integration: Works seamlessly with the Atlas AI workspace for agentic reasoning and advanced industrial search.

 Deployment (Cognite Toolkit)

 

Prerequisites

Before you start, ensure you have:

  • A Cognite Toolkit project set up locally
  • The standard cdf.toml file in your project root
  • Valid authentication and credentials for your target CDF project
  • cognite-toolkit >= 0.6.61

Access: You need appropriate permissions in your CDF project to enable feature flags. Contact your CDF administrator if you don't have access.
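As a quick sanity check, you can confirm the Toolkit version and your authentication before proceeding. This is a sketch: it assumes the Toolkit is installed via pip and that the auth verification subcommand is available in your Toolkit version.

```shell
# Install or upgrade the Toolkit; this module requires >= 0.6.61
pip install --upgrade "cognite-toolkit>=0.6.61"

# Confirm the installed CLI version
cdf --version

# Check that your credentials can reach the target CDF project
cdf auth verify
```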

 

Step 1: Enable External Libraries

 

Edit your project's cdf.toml and add:

 

[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
checksum = "sha256:795a1d303af6994cff10656057238e7634ebbe1cac1a5962a5c654038a88b078"

This allows the Toolkit to retrieve the official library packages, including the data model deployment resources.

 📝 Note: Replacing the Default Library

By default, a Cognite Toolkit project contains a [library.toolkit-data] section pointing to https://github.com/cognitedata/toolkit-data/.... This provides core modules like Quickstart, SourceSystem, Common, etc.

These two library sections cannot coexist. To use this Deployment Pack, you must replace the toolkit-data section with library.cognite:

Replace this                                With this
[library.toolkit-data]                      [library.cognite]
github.com/cognitedata/toolkit-data/...     github.com/cognitedata/library/...

The library.cognite package includes all Deployment Packs developed by the Value Delivery Accelerator team (RMDM, RCA agents, Context Quality Dashboard, etc.).
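Putting the replacement together, the relevant portion of cdf.toml ends up looking like the sketch below (the commented-out lines show the default section being removed; the toolkit-data URL is truncated as in the note above):

```toml
# Remove (or comment out) the default section:
# [library.toolkit-data]
# url = "https://github.com/cognitedata/toolkit-data/..."

# And use the cognite library instead:
[library.cognite]
url = "https://github.com/cognitedata/library/releases/download/latest/packages.zip"
checksum = "sha256:795a1d303af6994cff10656057238e7634ebbe1cac1a5962a5c654038a88b078"
```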

 

 

 

 

Step 2 (Optional but Recommended): Enable Usage Tracking

 

To help improve the Deployment Pack:

cdf collect opt-in 

 

Step 3: Add the Module

Run:

cdf modules add

This opens the interactive module selection interface.

 

⚠️ NOTE: If you use cdf modules init . --clean instead, the command will overwrite existing modules. Commit your changes before running it, or use a fresh directory.
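If you do use the clean initialization, a safe sequence is to snapshot your work first, for example:

```shell
# Commit the current state so the overwrite can be undone
git add -A
git commit -m "Checkpoint before cdf modules init --clean"

# Re-initialize the module structure (overwrites existing modules)
cdf modules init . --clean
```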

 

Step 4: Select the AI Property Extractor Function

(NOTE: use Space bar to select module, confirm with Enter)

 

From the menu, select:

Atlas AI Deployment Pack: Deploy all Atlas AI modules in one package (select with “Enter”)

  └── AI Property Extractor Function (Select with “Space bar”, confirm with “Enter”)

 

Step 5: Verify Folder Structure

 

After installation, your project should now contain:

modules
└── atlas_ai
    └── ai_extractor

When prompted to add more modules, answer yes ('y') to continue selecting or no ('N') to finish.

Then confirm creation with yes ('Y'); the Toolkit creates a folder structure in your destination containing all the files from your selected modules.
 

Step 6: Deploy to CDF

 

NOTE: Update your config.dev.yaml file. See README.md for the specific configuration parameters.
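For orientation, the environment section of a config.dev.yaml typically looks like the sketch below. The project value is a placeholder, the selected path matches the folder structure from Step 5, and field names can vary slightly between Toolkit versions, so treat the module's README.md as the authoritative reference:

```yaml
environment:
  name: dev
  project: <your-cdf-project>          # placeholder: your CDF project name
  validation-type: dev
  selected:
    - modules/atlas_ai/ai_extractor    # build and deploy only this module
```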

 

Build deployment structure:

cdf build 

 

Optional dry run:

cdf deploy --dry-run 

 

Deploy module to your CDF project

cdf deploy 

 

  • Note that the deployment uses a set of CDF capabilities, so you might need to add these capabilities to the CDF security group used by the Toolkit to deploy.
  • This will create/update spaces, containers, views, the composed data model, dataset, RAW resources, transformations, and workflows defined by this module.

 

Usage

 

This module is standalone and does not require other Deployment Pack modules as prerequisites.

However, it does require:

  • An existing Data Model and View: The target view must already exist in CDF with the properties you want to extract and populate
  • Atlas AI / Agents capability: Your CDF project must have Atlas AI agents enabled

Configuration for the module is done in the AI Property Extractor pipeline (see README.md).

Key elements of the configuration are the source view and, optionally, a target view. The target view is used with append or overwrite mode but can be the same as your source view. The aiTimestampProperty enables efficient processing by preventing the same record from being processed multiple times.
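As an illustration only, those elements could be combined in a pipeline configuration along the lines of the sketch below. Only view, targetView, and aiTimestampProperty are named in this guide; every identifier and the exact schema are hypothetical, so use README.md for the real parameters:

```yaml
# Hypothetical sketch; all identifiers are placeholders.
view:                                  # source view to read records from
  space: my_space
  externalId: Notification
  version: v1
targetView:                            # optional; may be the same as the source view
  space: my_space
  externalId: Notification
  version: v1
aiTimestampProperty: aiProcessedAt     # stamps processed records to avoid re-processing
```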

 

The extraction runs automatically on the configured Workflow schedule, but the workflow can also be triggered manually for testing.

 

 

Verify your AI extraction in the Function log.

Support: For troubleshooting or deployment issues:

  • Refer to the Cognite Documentation.
  • Check the README.md within the module folder for specific configuration parameters.
  • Contact your Cognite support team or Customer Business Executive.
  • For a configuration example and testing instructions, see TESTING.md.

 

