Solved

Clarification on Supported OPC UA → MQTT Architecture (Managed Broker in CDF)

  • March 10, 2026
  • 4 replies
  • 36 views

Andre Alves
MVP

Hi Cognite Team,

We are currently defining an integration pattern for a partner who will provide data from an OPC UA source into Cognite Data Fusion.

Our preference is to use a managed MQTT broker in CDF and avoid deploying any self-hosted MQTT broker or additional bridge components on-prem.

Based on the documentation, we understand that the OPC UA Extractor supports an mqtt: output mode, but it also references a requirement for the MQTT-CDF Bridge and a configured broker host/port.

To ensure we define the correct and supported architecture, could you please clarify:

  1. Is it supported to configure the OPC UA Extractor to publish directly to the Cognite-hosted MQTT broker in CDF?

  2. If using the Cognite-managed MQTT broker, is the MQTT-CDF Bridge still required?

  3. Is a self-hosted MQTT broker mandatory when using the mqtt: configuration in the OPC UA Extractor?

  4. What is the recommended architecture today for a partner delivering OPC UA data into CDF using MQTT?

Our intended target architecture is:

OPC UA → Cognite OPC UA Extractor → Cognite Hosted MQTT Broker → CDF

without deploying:

  • A self-hosted MQTT broker

  • The MQTT-CDF Bridge on-prem

We would appreciate confirmation of the supported and recommended approach for new integrations.

 


4 replies

  • Practitioner
  • March 11, 2026

The MQTT pushing feature in the OPC-UA extractor does not do what you think it does. It is deprecated and will be removed shortly, so it can unfortunately not be used in this way. It was built for a different use case a long time ago, but it hasn’t really been kept up to date.

 

If this is something you need, then it should be considered a feature request, and not something we currently support.


Andre Alves
MVP
  • Author
  • MVP
  • March 11, 2026

Hi @Einar Omang,

 

Thanks for your reply. In the OPC UA Extractor documentation, I found the following about MQTT:

mqtt
Global parameter. Push data to CDF one-way over MQTT. This requires that the MQTT–CDF Bridge application is running somewhere with access to CDF.

Based on your message, should we avoid using MQTT because it’s deprecated (or no longer recommended)?

If so, what is the CDF team’s current recommendation for sending data from an OPC UA Server into CDF in a high-availability scenario?

Should we go with OPC UA → Cognite API directly, or is there another supported/recommended architecture?

If you have recommendations based on what is currently supported, we’d really appreciate your guidance.

Regards,
Andre


 


  • Practitioner
  • March 12, 2026

The extractor is built to avoid losing data, so yes, I would recommend using the extractor to write to CDF directly.


Andre Alves
MVP
  • Author
  • MVP
  • Answer
  • March 12, 2026

Thanks very much, @Einar Omang.

Just to give more context for the community:

The Cognite OPC UA Extractor already performs batching automatically. The extractor-utils documentation explains that making one request per datapoint would be very costly. Instead, it uses upload queues that accumulate data and trigger batch uploads when a condition is met — either based on a time interval (max_upload_interval) or on the number of items in the queue (max_queue_size).

In practice, the extractor:

  • Accumulates datapoints in memory in an internal queue

  • When it reaches a configured threshold (e.g., every 1–5 seconds or every N thousand points), it flushes the data

  • If there is too much data for a single request, the queue automatically splits it into multiple requests and executes them in parallel

This already provides efficient batching, controlled memory usage, and parallelization — which are key elements for good throughput.
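As a rough sketch of this pattern (illustrative only, not the actual extractor-utils code; the class name and the `upload_fn` callback standing in for the CDF datapoints API are hypothetical):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class UploadQueue:
    """Illustrative time/size-triggered upload queue.

    Not the real extractor-utils implementation: `upload_fn` stands in
    for the call that writes one batch of datapoints to CDF.
    """

    def __init__(self, upload_fn, max_upload_interval=5.0,
                 max_queue_size=1000, max_batch=1000):
        self.upload_fn = upload_fn
        self.max_upload_interval = max_upload_interval  # time trigger (seconds)
        self.max_queue_size = max_queue_size            # size trigger (items)
        self.max_batch = max_batch                      # max items per request
        self._buffer = []
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._last_flush = time.monotonic()

    def add(self, datapoint):
        # Accumulate in memory; flush when either condition is met.
        with self._lock:
            self._buffer.append(datapoint)
            by_size = len(self._buffer) >= self.max_queue_size
            by_time = time.monotonic() - self._last_flush >= self.max_upload_interval
        if by_size or by_time:
            self.flush()

    def flush(self):
        with self._lock:
            batch, self._buffer = self._buffer, []
            self._last_flush = time.monotonic()
        # Too much data for one request? Split into chunks and upload in parallel.
        chunks = [batch[i:i + self.max_batch]
                  for i in range(0, len(batch), self.max_batch)]
        for future in [self._executor.submit(self.upload_fn, c) for c in chunks]:
            future.result()
```

In the real extractor these thresholds correspond to the `max_upload_interval` and `max_queue_size` settings mentioned above.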

It is quite common in high-availability architectures to introduce a message broker such as Kafka or RabbitMQ. Brokers are a well-established architectural choice when high throughput and high availability are required, because they:

  • Decouple producers and consumers, allowing each side to scale independently

  • Provide buffering and backpressure handling during traffic spikes

  • Enable horizontal scaling through partitions (Kafka) or distributed queues

  • Offer durability and persistence, reducing the risk of data loss

  • Support replay and fault recovery mechanisms

In scenarios with multiple producers, multiple downstream consumers, or strict durability and replay requirements, a broker layer can significantly improve resilience and scalability.
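The buffering-and-backpressure point above can be illustrated with a bounded in-process queue (a toy stand-in for a broker topic; a real deployment would use Kafka or RabbitMQ client libraries):

```python
import queue
import threading

# A bounded queue: put() blocks when the queue is full, which is the
# backpressure mechanism -- a fast producer is throttled to the
# consumer's pace instead of exhausting memory.
buffer = queue.Queue(maxsize=100)
consumed = []

def producer(n):
    for i in range(n):
        buffer.put(i)      # blocks while the queue is full
    buffer.put(None)       # sentinel: end of stream

def consumer():
    while (item := buffer.get()) is not None:
        consumed.append(item)   # stand-in for writing to CDF

threads = [threading.Thread(target=producer, args=(1000,)),
           threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The producer and consumer run at independent rates; the queue's capacity bounds memory use during spikes.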

However, in this case, since the Cognite extractor already implements batching, queuing, and parallel uploads internally, we will rely on the extractor itself for now and avoid adding extra architectural complexity.

Let’s see how it performs in practice.

I am closing this thread for now.