Solved

Clarification on Supported OPC UA → MQTT Architecture (Managed Broker in CDF)

  • March 10, 2026
  • 4 replies
  • 36 views

Andre Alves
MVP

Hi Cognite Team,

We are currently defining an integration pattern for a partner who will provide data from an OPC UA source into Cognite Data Fusion.

Our preference is to use a managed MQTT broker in CDF and avoid deploying any self-hosted MQTT broker or additional bridge components on-prem.

Based on the documentation, we understand that the OPC UA Extractor supports an mqtt: output mode, but it also references a requirement for the MQTT-CDF Bridge and a configured broker host/port.

To ensure we define the correct and supported architecture, could you please clarify:

  1. Is it supported to configure the OPC UA Extractor to publish directly to the Cognite-hosted MQTT broker in CDF?

  2. If using the Cognite-managed MQTT broker, is the MQTT-CDF Bridge still required?

  3. Is a self-hosted MQTT broker mandatory when using the mqtt: configuration in the OPC UA Extractor?

  4. What is the recommended architecture today for a partner delivering OPC UA data into CDF using MQTT?

Our intended target architecture is:

OPC UA → Cognite OPC UA Extractor → Cognite Hosted MQTT Broker → CDF

without deploying:

  • A self-hosted MQTT broker

  • The MQTT-CDF Bridge on-prem

We would appreciate confirmation of the supported and recommended approach for new integrations.

 


4 replies

  • Practitioner
  • March 11, 2026

The MQTT pushing feature in the OPC-UA extractor does not do what you think it does. It is deprecated and will be removed shortly, so it can unfortunately not be used in this way. It was built for a different use case a long time ago, but it hasn’t really been kept up to date.

 

If this is something you need, then it should be considered a feature request, and not something we currently support.


Andre Alves
MVP
  • Author
  • MVP
  • March 11, 2026

Hi @Einar Omang,

 

Thanks for your reply. In the OPC UA Extractor documentation, I found the following about MQTT:

mqtt
Global parameter. Push data to CDF one-way over MQTT. This requires that the MQTT–CDF Bridge application is running somewhere with access to CDF.

Based on your message, should we avoid using MQTT because it’s deprecated (or no longer recommended)?

If so, what is the CDF team’s current recommendation for sending data from an OPC UA Server into CDF in a high-availability scenario?

Should we go with OPC UA → Cognite API directly, or is there another supported/recommended architecture?

If you have recommendations based on what is currently supported, we’d really appreciate your guidance.

Regards,
Andre


 


  • Practitioner
  • March 12, 2026

The extractor is built to avoid losing data, so yes, I would recommend using the extractor to write to CDF directly.


Andre Alves
MVP
  • Author
  • MVP
  • Answer
  • March 12, 2026

Thanks very much, @Einar Omang.

Just to give more context for the community:

The Cognite OPC UA Extractor already performs batching automatically. The extractor-utils documentation explains that making one request per datapoint would be very costly. Instead, it uses upload queues that accumulate data and trigger batch uploads when a condition is met — either based on a time interval (max_upload_interval) or on the number of items in the queue (max_queue_size).

In practice, the extractor:

  • Accumulates datapoints in memory in an internal queue

  • When it reaches a configured threshold (e.g., every 1–5 seconds or every N thousand points), it flushes the data

  • If there is too much data for a single request, the queue automatically splits it into multiple requests and executes them in parallel

This already provides efficient batching, controlled memory usage, and parallelization — which are key elements for good throughput.
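As a rough sketch of this pattern (illustrative only, not the actual extractor-utils code; the class name and the `upload_fn` callback standing in for the CDF datapoints API are hypothetical):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class UploadQueue:
    """Illustrative time/size-triggered upload queue.

    Not the real extractor-utils implementation: `upload_fn` stands in
    for the call that writes one batch of datapoints to CDF.
    """

    def __init__(self, upload_fn, max_upload_interval=5.0,
                 max_queue_size=1000, max_batch=1000):
        self.upload_fn = upload_fn
        self.max_upload_interval = max_upload_interval  # time trigger (seconds)
        self.max_queue_size = max_queue_size            # size trigger (items)
        self.max_batch = max_batch                      # max items per request
        self._buffer = []
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._last_flush = time.monotonic()

    def add(self, datapoint):
        # Accumulate in memory; flush when either condition is met.
        with self._lock:
            self._buffer.append(datapoint)
            by_size = len(self._buffer) >= self.max_queue_size
            by_time = time.monotonic() - self._last_flush >= self.max_upload_interval
        if by_size or by_time:
            self.flush()

    def flush(self):
        with self._lock:
            batch, self._buffer = self._buffer, []
            self._last_flush = time.monotonic()
        # Too much data for one request? Split into chunks and upload in parallel.
        chunks = [batch[i:i + self.max_batch]
                  for i in range(0, len(batch), self.max_batch)]
        for future in [self._executor.submit(self.upload_fn, c) for c in chunks]:
            future.result()
```

In the real extractor these thresholds correspond to the `max_upload_interval` and `max_queue_size` settings mentioned above.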

It is quite common in high-availability architectures to introduce a message broker such as Kafka or RabbitMQ. Brokers are a well-established architectural choice when high throughput and high availability are required, because they:

  • Decouple producers and consumers, allowing each side to scale independently

  • Provide buffering and backpressure handling during traffic spikes

  • Enable horizontal scaling through partitions (Kafka) or distributed queues

  • Offer durability and persistence, reducing the risk of data loss

  • Support replay and fault recovery mechanisms

In scenarios with multiple producers, multiple downstream consumers, or strict durability and replay requirements, a broker layer can significantly improve resilience and scalability.
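The buffering-and-backpressure point above can be illustrated with a bounded in-process queue (a toy stand-in for a broker topic; a real deployment would use Kafka or RabbitMQ client libraries):

```python
import queue
import threading

# A bounded queue: put() blocks when the queue is full, which is the
# backpressure mechanism -- a fast producer is throttled to the
# consumer's pace instead of exhausting memory.
buffer = queue.Queue(maxsize=100)
consumed = []

def producer(n):
    for i in range(n):
        buffer.put(i)      # blocks while the queue is full
    buffer.put(None)       # sentinel: end of stream

def consumer():
    while (item := buffer.get()) is not None:
        consumed.append(item)   # stand-in for writing to CDF

threads = [threading.Thread(target=producer, args=(1000,)),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer and consumer run at independent rates; the queue's capacity bounds memory use during spikes.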

However, in this case, since the Cognite extractor already implements batching, queuing, and parallel uploads internally, we will rely on the extractor itself for now and avoid adding extra architectural complexity.

Let’s see how it performs in practice.

I am closing this thread for now.