
How to Choose the Right Extractor for Your CDF Integration: An Overview [Cognite Official]

  • June 5, 2025
  • 1 reply
  • 105 views

Gaetan Helness
MVP

Cognite Data Fusion (CDF) offers a comprehensive suite of extractors designed to pull data from various industrial and IT systems, making it a central hub for operational data. These extractors fall into two main categories based on their deployment and management: Prebuilt Extractors deployed on customer infrastructure and CDF-Hosted Scalable Extractors.

You can find more information about all our extractors at https://docs.cognite.com/cdf/integration/concepts/extraction/

Here's a detailed overview of these two main extractor types and the most commonly used connectors in each category:

1. Prebuilt Extractors Deployed on Customer Infrastructure

These extractors are installed and run within the customer environment, typically on a server or virtual machine with connectivity to the data sources. They are designed to connect to specific systems using their native protocols or APIs. Cognite provides prebuilt extractors for a wide range of common industrial data sources:

  • OSI PI: The Cognite PI Extractor connects to the OSIsoft PI Data Archive to stream time series data into CDF in near real-time. It also handles historical data backfilling. The Cognite PI AF Extractor retrieves metadata from the OSIsoft PI Asset Framework, building a hierarchical representation of assets in CDF.
  • OPC UA: The Cognite OPC UA Extractor connects to OPC UA servers to extract time series data, events, and asset information. It maps the OPC UA node hierarchy to CDF assets and streams data and events as time series and events in CDF.
  • Database: The Cognite DB Extractor is a general-purpose database extractor that can connect to any database supporting ODBC to query and load data into CDF. It supports both one-time and scheduled extractions, as well as incremental loading.
  • Documentum: The Cognite Documentum Extractor connects to OpenText Documentum and Documentum D2 to extract documents and their metadata into CDF's Files service and staging area, respectively.
  • File: The Cognite File Extractor can connect to, for example, SharePoint Online document libraries to upload files into the CDF Files service.
  • SAP: The Cognite SAP Extractor connects to SAP systems via OData endpoints to extract data into the CDF staging area. It supports full and incremental loads.
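To make the incremental-loading idea above concrete, here is a minimal sketch of the "watermark" pattern that extractors like the DB Extractor use on scheduled runs: each run fetches only rows changed since the previous run, then advances the watermark. The real DB Extractor connects via ODBC and is driven by a YAML configuration; this sketch uses Python's built-in sqlite3 with illustrative table and column names so it runs anywhere.

```python
import sqlite3

# Stand-in source database; the real extractor would connect via ODBC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (id INTEGER, value REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [(1, 20.5, "2025-06-01T00:00:00"),
     (2, 21.0, "2025-06-02T00:00:00"),
     (3, 19.8, "2025-06-03T00:00:00")],
)

def extract_incremental(conn, watermark):
    """Fetch only rows changed since the last run, then advance the watermark."""
    rows = conn.execute(
        "SELECT id, value, updated_at FROM sensor_readings "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

# First scheduled run: picks up only the two rows newer than the watermark.
rows, wm = extract_incremental(conn, "2025-06-01T12:00:00")
print(len(rows), wm)  # 2 2025-06-03T00:00:00
```

A second run with the advanced watermark returns no rows, which is exactly what keeps scheduled extractions cheap compared with re-reading the full table.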

2. CDF-Hosted Scalable Extractors

These extractors are managed and run within the Cognite Data Fusion cloud environment, reducing the need for on-premises infrastructure and management. Customers and delivery teams manage the extractor configuration directly within the CDF user interface, through the toolkit, or via the API/SDK. These extractors scale to handle high data volumes and fluctuating data rates, making them ideal for real-time data ingestion and integration with cloud services.

  • Azure Event Hub: CDF provides a hosted extractor to consume data from Azure Event Hub, a scalable event streaming platform that allows for real-time ingestion of data into CDF.
  • MQTT: The Cognite MQTT Extractor is a hosted service that subscribes to MQTT topics and ingests the received data into CDF, primarily for time series data from IoT devices.
  • Kafka: Similar to Azure Event Hub, CDF offers a hosted extractor to consume data from Kafka, a distributed streaming platform, enabling scalable real-time data ingestion.
  • RESTful APIs: CDF provides a mechanism to build hosted extractors that can pull data from various systems exposing RESTful APIs. This allows for flexible integration with a wide range of modern applications and services.
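The streaming extractors above all share the same core job: map an incoming message into CDF time series datapoints. The sketch below shows that mapping for a hypothetical JSON payload; the payload shape and the external-ID naming convention are assumptions for illustration, not the hosted extractors' fixed format.

```python
import json

def message_to_datapoints(topic: str, payload: bytes):
    """Turn one JSON message (e.g. from an MQTT topic or Kafka partition)
    into (external_id, timestamp_ms, value) rows ready for CDF ingestion.
    The 'topic:device:measurement' external-ID scheme is illustrative."""
    body = json.loads(payload)
    device = body["device"]
    ts = body["timestamp"]  # epoch milliseconds
    return [
        (f"{topic}:{device}:{measurement}", ts, value)
        for measurement, value in body["measurements"].items()
    ]

# Example message from an assumed IoT device publishing two measurements.
msg = json.dumps({
    "device": "pump-42",
    "timestamp": 1717575600000,
    "measurements": {"temperature": 71.3, "pressure": 4.2},
}).encode()

rows = message_to_datapoints("plant/west", msg)
print(rows)
```

One message fans out into one datapoint row per measurement, which is why these extractors are a natural fit for high-frequency IoT feeds: the mapping is stateless and parallelizes across topics and partitions.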


1 reply

Sofie Haug
Seasoned Practitioner
  • Cognite Academy Instructor
  • June 5, 2025

Amazing overview! 🚀