Data Size Limitation

  • 9 February 2024
  • 4 replies


Does Cognite copy all data from the data sources to the staging area?

If yes, is there any limit on the number of data points or the size of the data?

Does the size of the data affect the performance of CDF or any other functionality?



Best answer by Glen Sykes 20 February 2024, 14:03



Hi @Ankit Kothawade 

It's crucial to understand that not all data from sources needs to be copied to the staging area. Techniques like data streaming and selective data extraction can minimize unnecessary data movement, optimizing performance. Additionally, incremental ingestion, focusing on the data required for specific use cases, is highly advantageous. Cognite offers robust support for incremental ingestion capabilities.
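To illustrate what incremental ingestion means in practice, here is a minimal sketch of a watermark-based extraction loop. It is not Cognite's implementation or API: the `Row` type, the `last_updated` field, and the `extract_incremental` function are all hypothetical names chosen for the example, and the "source" is just an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical source record; field names are assumptions."""
    key: str
    last_updated: int  # epoch milliseconds of the last change at the source
    value: float


def extract_incremental(source_rows, watermark):
    """Return only the rows changed after `watermark`, plus the new watermark."""
    changed = [r for r in source_rows if r.last_updated > watermark]
    # If nothing changed, keep the old watermark so no data is skipped later.
    new_watermark = max((r.last_updated for r in changed), default=watermark)
    return changed, new_watermark


# First run: nothing has been ingested yet, so every row is picked up.
source = [Row("a", 100, 1.0), Row("b", 200, 2.0), Row("c", 300, 3.0)]
first_batch, wm = extract_incremental(source, watermark=0)

# A new row appears at the source; the second run moves only that row.
source.append(Row("d", 400, 4.0))
second_batch, wm2 = extract_incremental(source, watermark=wm)
```

The point of the design is that each run moves only what changed since the previous run, so the volume copied tracks the rate of change rather than the total size of the source.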

Determining the most suitable ingestion approach often involves gathering use case requirements from data consumers and working backward. Key questions to consider include:

  • What are the business needs driving the data ingestion?
  • What are the expectations for the data quality, granularity, and timeliness?
  • When does the data need to meet these expectations to support business objectives effectively?

The last time I read about Cognite's performance, the largest Cognite Data Fusion® Time Series cluster was reported to store around 15 trillion data points. It consistently handled ingesting 40 million data points per second and reading 200 million data points per second, and even higher scalability is expected. However, it's important to consider your actual needs, as managing huge data volumes can become quite expensive.
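To put those figures next to your own needs, a rough back-of-the-envelope calculation can help. The sketch below is only arithmetic: the function names are hypothetical, the 10,000 series at 1 Hz are made-up example inputs, and the 40 million data points per second is simply the ingest figure quoted above, not a guaranteed limit.

```python
SECONDS_PER_DAY = 86_400

# Ingest rate quoted in this thread (data points per second); an assumption,
# not a documented service limit.
REPORTED_INGEST_RATE = 40_000_000


def datapoints_per_day(num_series, sample_hz):
    """Data points produced per day by `num_series` series sampled at `sample_hz`."""
    return num_series * sample_hz * SECONDS_PER_DAY


def share_of_reported_ingest(num_series, sample_hz, reported_rate=REPORTED_INGEST_RATE):
    """Fraction of the quoted ingest rate that this workload would use."""
    return (num_series * sample_hz) / reported_rate


# Example workload (made-up numbers): 10,000 time series sampled once per second.
daily = datapoints_per_day(10_000, 1.0)        # 864,000,000 data points per day
share = share_of_reported_ingest(10_000, 1.0)  # 0.00025, i.e. 0.025% of the quoted rate
```

For a workload like this, throughput is unlikely to be the constraint; the more relevant question is how much of that data your use cases actually need to keep.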

I hope this helps with your decision.


Hi @Ankit Kothawade 

Adding a piece of information: Time series data typically flows directly into CDF and bypasses the RAW stage. The Cognite team is welcome to correct me if I am mistaken. If you agree, I'd like you to mark this topic as answered.


You are correct, Andre. When Time Series data points are stored in CDF via our OPC-UA or PI extractors, they flow directly from the source system into the Time Series API.

The same is true for OPC-UA event types, which are typically written directly to the Events API by the OPC-UA extractor.


Thanks, @Glen Sykes !

I'm always learning from you guys. I really appreciate your assistance.