Hi,
Context:
We currently organize our data in CDF using a per-country partitioning strategy, where each country has its own space.
This approach was chosen primarily to restrict data access by country in a fine-grained manner.
On top of that, we expose data models grouped by Business Object, such as Well Architecture, Cost Model, etc.
Each Business Object model aggregates several data objects under a common business theme, which also allows us to control access by business domain in addition to country-based access.
We are now planning to integrate a large amount of historical data, which will likely increase our model size by around 4x.
These historical datasets are rarely queried, but we want to make sure that adding them does not degrade performance for operational data, in either query latency or data ingestion throughput.
We are evaluating two potential strategies:
- Keep everything in the same spaces, adding an indexed attribute (e.g., is_legacy = true) to distinguish legacy records.
- Create dedicated legacy spaces per country (e.g., fr_legacy, dz_legacy) to isolate historical data.
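To make the comparison concrete, here is a rough sketch of how we imagine the query filters would differ between the two options, written as plain Python dictionaries in the shape of the data modeling filter syntax. The space name business_objects, the view identifier WellArchitecture/v1, and both helper functions are hypothetical and only for illustration; they are not our actual schema.

```python
# Hypothetical names for illustration only (not our real schema).
MODEL_SPACE = "business_objects"   # assumed space holding the view definitions
VIEW = "WellArchitecture/v1"       # assumed view external id and version


def attribute_filter(country_space: str, include_legacy: bool = False) -> dict:
    """Option 1: one space per country, legacy records flagged by is_legacy."""
    in_space = {"equals": {"property": ["node", "space"], "value": country_space}}
    if include_legacy:
        # No flag filter: operational and legacy records are both returned.
        return in_space
    not_legacy = {
        "equals": {"property": [MODEL_SPACE, VIEW, "is_legacy"], "value": False}
    }
    return {"and": [in_space, not_legacy]}


def space_filter(country_space: str, include_legacy: bool = False) -> dict:
    """Option 2: legacy records live in a sibling <country>_legacy space."""
    spaces = [country_space]
    if include_legacy:
        spaces.append(f"{country_space}_legacy")
    return {"in": {"property": ["node", "space"], "values": spaces}}


if __name__ == "__main__":
    print(attribute_filter("fr"))
    print(space_filter("fr", include_legacy=True))
```

With option 1, every operational query carries the extra is_legacy predicate; with option 2, operational queries only name the country space, and legacy data is opted into by adding the sibling space.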
Could you please advise whether separating the legacy data into dedicated spaces would actually provide a measurable performance improvement (in terms of query scope or ingestion throughput)?
In other words:
- Does space-level separation help reduce query latency compared to filtering on an indexed is_legacy attribute?
- Are there any best practices or known trade-offs in CDF regarding space-based vs. attribute-based partitioning, especially for large-scale data models?
Thank you for your insights and any recommendations you can share on this topic.