Question

Do Immutable Streams contain duplicate entries?

  • May 27, 2026
  • 1 reply
  • 20 views

Hi Team,

Our observation while storing duplicate entries in Cognite Streams:


Mutable Streams - It discards the duplicate entries consistently.

Immutable Streams - It sometimes allows exact duplicate entries (same externalId and other fields) and sometimes it doesn’t.

When we checked with the Cognite AI, it said that duplicate entries are usually not allowed but can be accepted in some rare scenarios.

Could you please confirm the expected behavior? If duplicates are allowed in rare scenarios, please let us know the exact scenarios.

Thanks,
Rahul

1 reply

Everton Colling
Expert ⭐️⭐️⭐️⭐️
  • May 29, 2026

Hi @rkumar87!

Here's how duplicate handling works across the two stream types:

  • Mutable streams do not allow duplicate record identifiers (space + externalId). An ingest with an existing identifier is rejected, and an upsert updates the existing record. This is why you see consistent behavior.
  • Immutable streams can hold multiple records with the same externalId. A duplicate here is defined more strictly: same space, same container definitions in the sources array, and all property names and values identical. When such an exact duplicate is detected, it is discarded. This deduplication exists so that ingestion retries don't create multiple copies of the same record.
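To make the "exact duplicate" definition concrete, here is a minimal Python sketch that computes a fingerprint over a whole record, so two records compare equal only when the space, the container definitions in sources, and every property name and value match. This is an illustration only, not how the service is implemented: `record_fingerprint` is a hypothetical helper, the record layout is an assumed shape, and including externalId in the comparison is also an assumption.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Hypothetical helper: hash a canonical JSON form of the full record.

    Sorting dictionary keys makes the result independent of key order.
    The order of items inside the sources list is kept as-is.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Assumed record shape, for illustration only.
original = {
    "space": "my_space",
    "externalId": "rec-1",
    "sources": [
        {
            "source": {"type": "container", "space": "my_space", "externalId": "sensor"},
            "properties": {"temperature": 21.5, "unit": "C"},
        }
    ],
}

# Same content, different key order: an exact duplicate.
retry = {
    "externalId": "rec-1",
    "space": "my_space",
    "sources": [
        {
            "properties": {"unit": "C", "temperature": 21.5},
            "source": {"externalId": "sensor", "space": "my_space", "type": "container"},
        }
    ],
}

# Same externalId, one property value changed: not a duplicate.
changed = json.loads(json.dumps(original))
changed["sources"][0]["properties"]["temperature"] = 22.0

is_exact_duplicate = record_fingerprint(original) == record_fingerprint(retry)
is_changed_duplicate = record_fingerprint(original) == record_fingerprint(changed)
```

Under these assumptions, `retry` counts as an exact duplicate of `original`, while `changed` is a separate record even though it shares the same externalId.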

The reason you see deduplication "sometimes but not always" comes down to how the underlying storage is organized:

  • Deduplication is only guaranteed when the duplicate records land in the same data partition. Guaranteeing it across data partitions would be prohibitively expensive, so it isn't done.
  • Data partitions rotate periodically. If you re-ingest the exact same payload within a short window (same partition still active), the duplicate is discarded. If you re-ingest it farther apart, after the data partition has rotated, both records will be kept.

So the practical rule is:

  • Same externalId, different content -> both records always kept
  • Exact same payload re-ingested within the active data partition window -> duplicate discarded
  • Exact same payload re-ingested after a data partition rotation -> duplicate accepted
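The three cases above can be reproduced with a small toy model in Python. This is only a sketch of the behavior described here, not the actual storage engine: `ImmutableStreamModel`, `ingest`, and `rotate` are hypothetical names, rotation is triggered by hand instead of by time, and the record layout is simplified.

```python
import json


class ImmutableStreamModel:
    """Toy model: duplicates are only detected inside the active partition."""

    def __init__(self) -> None:
        self.partitions = [[]]       # every partition, oldest first
        self._active_keys = set()    # payload keys seen in the active partition

    @staticmethod
    def _key(record: dict) -> str:
        # Canonical form of the whole payload; key order does not matter.
        return json.dumps(record, sort_keys=True)

    def ingest(self, record: dict) -> bool:
        """Return True if the record was stored, False if it was discarded."""
        key = self._key(record)
        if key in self._active_keys:
            return False
        self._active_keys.add(key)
        self.partitions[-1].append(record)
        return True

    def rotate(self) -> None:
        """Start a new partition; older partitions are no longer checked."""
        self.partitions.append([])
        self._active_keys = set()

    def count(self) -> int:
        return sum(len(p) for p in self.partitions)


stream = ImmutableStreamModel()
payload = {"space": "my_space", "externalId": "rec-1", "properties": {"value": 1}}
updated = {"space": "my_space", "externalId": "rec-1", "properties": {"value": 2}}

first_ingest = stream.ingest(payload)           # stored
same_window = stream.ingest(dict(payload))      # exact duplicate, same partition
different_content = stream.ingest(updated)      # same externalId, new content
stream.rotate()                                 # partition rotates between runs
after_rotation = stream.ingest(dict(payload))   # exact duplicate, new partition
total_records = stream.count()
```

In this model the retry inside the same window is discarded, while the identical payload sent after the rotation is stored again, which matches the "sometimes but not always" behavior from the question.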

If your workflows re-process the same data on a recurring schedule (for example daily), they can produce duplicates because the relevant data partition has likely rotated between runs. If you need strict uniqueness guarantees, a mutable stream is the better fit.