Summary
This How-To outlines the critical role of performance testing in optimizing Cognite Data Fusion (CDF) projects. It details key measurable factors, such as data ingestion rates, query performance, and API response times, that are crucial for assessing CDF's operational health. Cognite provides a set of notebooks with example code for testing operations including data ingestion, query performance, and data modeling, with a strong recommendation to use test/development projects and to monitor CDF quota usage. Performance testing supports project delivery success, production readiness, and development efficiency, leading to tangible benefits such as faster dashboards, reliable ingestion, and significant cost savings.
Monitoring Cognite Data Fusion (CDF) performance involves tracking several key factors that ensure the platform operates efficiently and effectively. Based on the documentation from Cognite, typical measurable factors include:
- Data Ingestion Rates: Monitoring the speed and volume at which data is ingested into CDF from various sources.
- Data Processing Latency: Measuring the time taken for CDF to process incoming data, including transformations and contextualization.
- Query Performance: Assessing the speed and efficiency of data retrieval queries executed against the CDF platform.
- Pipeline Execution Times: Monitoring the duration and success rates of data pipelines and workflows within CDF.
- Error and Warning Rates: Monitoring the frequency and types of errors or warnings encountered during data processing and transformations.
- Data Quality Metrics: Evaluating metrics related to data completeness, accuracy, consistency, and timeliness.
- API Response Times: Measuring the responsiveness of CDF's API endpoints when interacting with external applications and services.
- System Availability: Monitoring uptime and availability metrics to ensure CDF is accessible and operational according to service-level agreements (SLAs).
- Integration Health: Assessing the status and performance of integrations with external systems and applications.
These factors help ensure that Cognite Data Fusion maintains high performance, reliability, and efficiency in managing and analyzing industrial data.
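Several of the factors above (query performance, API response times, processing latency) reduce to timing an operation repeatedly and summarizing the samples. The sketch below is a minimal, generic timing harness, not part of the Cognite notebooks: the `operation` callable is a placeholder for whatever CDF call you want to profile.

```python
import time
import statistics
from typing import Callable

def measure_latency(operation: Callable[[], object], runs: int = 20) -> dict:
    """Time repeated executions of an operation and summarize the samples.

    `operation` stands in for any call you want to profile, e.g. a query
    against a CDF API endpoint (hypothetical placeholder here).
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "runs": runs,
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (runs - 1))],  # nearest-rank p95
        "max_ms": samples[-1],
    }

# Example: profile a stand-in workload (replace with a real CDF query).
stats = measure_latency(lambda: sum(range(10_000)), runs=10)
print(f"median={stats['median_ms']:.2f} ms, p95={stats['p95_ms']:.2f} ms")
```

Reporting median and p95 rather than the mean is deliberate: tail latency is usually what SLAs and end users notice.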
To get started with performance testing, Cognite has created a set of notebooks with example code for each operation category: setting up and generating test data in your CDF project, running the tests, and visualizing the results. The tested categories are:
| Operation Category | What We Test | Project Impact |
|---|---|---|
| Data Ingestion | Time series batch/streaming ingestion | Data pipeline throughput & reliability |
| Query Performance | Range, aggregate, multi-series queries | Application response times |
| Data Modeling | Schema ops, instance CRUD, relationships | Data model scalability |
| RAW Operations | Bulk insert/query, table management | Data lake performance |
| File Operations | Upload/download, metadata handling | Asset management efficiency |
| Transformations | Pipeline execution, resource usage | Data processing workflows |
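The Data Ingestion category needs realistic test data to push into your project. Below is a minimal sketch of a synthetic datapoint generator; the function name and signal shape are illustrative, but the `(timestamp_ms, value)` tuple format matches what the Cognite Python SDK accepts for datapoint insertion.

```python
import math
import random
from datetime import datetime, timedelta, timezone

def generate_test_datapoints(n, start, interval_s=1.0, seed=42):
    """Generate (timestamp_ms, value) pairs resembling a noisy sensor signal.

    A fixed seed makes runs reproducible, so repeated benchmark runs
    ingest identical data.
    """
    rng = random.Random(seed)
    points = []
    for i in range(n):
        ts = int((start + timedelta(seconds=i * interval_s)).timestamp() * 1000)
        value = 20.0 + 5.0 * math.sin(i / 50.0) + rng.gauss(0, 0.5)
        points.append((ts, value))
    return points

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch = generate_test_datapoints(1000, start)
print(len(batch))  # 1000 datapoints, one per second
```

Varying `n` and `interval_s` lets you test both large batch inserts and dense streaming-style workloads.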
The notebooks can be found and downloaded from the CDF Library GitHub repository: https://github.com/cognitedata/library/tree/main/modules/notebooks/CDF%20Perfromace%20Testing
Before you run the notebooks, note the following:
- All notebooks include automatic cleanup functions
- Test data is created in your CDF project - cleanup is essential
- Use test/development projects for performance testing when possible
- Monitor your CDF quota usage during large-scale tests
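Because the tests create real resources in your project, cleanup must run even when a test fails. A minimal sketch of that pattern is shown below; `StubCDFClient` is a hypothetical stand-in used only to illustrate the `try`/`finally` structure, while the actual notebooks use the Cognite Python SDK.

```python
class StubCDFClient:
    """Hypothetical stand-in for a CDF client, used only to
    illustrate the cleanup pattern."""
    def __init__(self):
        self.resources = set()

    def create(self, external_id):
        self.resources.add(external_id)

    def delete(self, external_id):
        self.resources.discard(external_id)

def run_test_with_cleanup(client, test_ids, test_fn):
    """Create test resources, run the test, and always clean up,
    even if the test raises."""
    created = []
    try:
        for xid in test_ids:
            client.create(xid)
            created.append(xid)
        return test_fn(client)
    finally:
        for xid in created:
            client.delete(xid)  # runs on success and on failure alike

client = StubCDFClient()
result = run_test_with_cleanup(client, ["perf_test_ts_1"],
                               lambda c: len(c.resources))
print(result, client.resources)  # prints: 1 set()
```

Tracking only the IDs that were actually created (the `created` list) avoids trying to delete resources that never existed when setup itself fails partway.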
Why Performance Testing is Critical for CDF Projects
1. Project Delivery Success
- Meet SLA Requirements: Ensure applications meet response time commitments
- Scale Validation: Verify system can handle projected data volumes
- User Experience: Prevent slow dashboards and frustrated end users
- Cost Optimization: Identify inefficient operations that waste CDF quota
2. Production Readiness
- Avoid Go-Live Issues: Catch performance bottlenecks before production
- Capacity Planning: Size infrastructure correctly for data loads
- System Stability: Prevent timeouts and failures under load
- Optimization Opportunities: Find areas for significant performance gains
3. Development Efficiency
- Early Detection: Identify performance regressions during development
- Design Validation: Verify data model and architecture decisions
- Tuning Guidance: Get specific recommendations for optimization
- Baseline Establishment: Track performance improvements over time
Real-World Project Benefits
Before Performance Testing:
❌ "Dashboard takes 30 seconds to load"
❌ "Data ingestion pipeline fails with large batches"
❌ "Users complain about slow search results"
❌ "Hitting CDF quota limits unexpectedly"
After Performance Testing:
✅ "Dashboard loads in <3 seconds"
✅ "Ingestion handles 10,000 datapoints/second reliably"
✅ "Search results return in <1 second"
✅ "Optimized operations reduce CDF costs by 40%"
The README.md file included in the project contains detailed documentation and all the steps you need to get started with this module.
Example output from testing
Data Model Performance Testing
Transformation Performance Testing
RAW Performance Testing
File Performance Testing
Time Series Ingestion Performance Testing
Time Series Query Performance Testing
Next steps
Start by running the tests as-is, then expand them to your project's needs: use real data instead of generated data, and test your own transformations and data model so the tests stay relevant for your project. If relevant, move the code to run on a schedule so you always keep track of performance, recording the results in separate performance time series in the CDF project.
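Tracking results over time turns one-off benchmarks into regression detection. The sketch below keeps results in an in-memory list for illustration; in a scheduled setup you would instead write each datapoint to a dedicated performance time series in CDF (external ID of your choosing). The function names and the 1.5x threshold are assumptions, not part of the notebooks.

```python
import time

def record_result(history, test_name, metric_ms, timestamp=None):
    """Append one test run to a result history.

    In production, write this as a datapoint to a performance
    time series in CDF instead of an in-memory list.
    """
    point = {
        "test": test_name,
        "timestamp": timestamp if timestamp is not None else time.time(),
        "latency_ms": metric_ms,
    }
    history.append(point)
    return point

def detect_regression(history, test_name, threshold=1.5):
    """Flag a regression if the latest run is `threshold`x slower
    than the mean of all earlier runs for the same test."""
    runs = [p["latency_ms"] for p in history if p["test"] == test_name]
    if len(runs) < 2:
        return False
    baseline = sum(runs[:-1]) / len(runs[:-1])
    return runs[-1] > threshold * baseline

history = []
for ms in (100.0, 105.0, 98.0, 210.0):
    record_result(history, "ts_query_p95", ms, timestamp=0.0)
print(detect_regression(history, "ts_query_p95"))  # prints: True
```

A scheduled job running this after each test pass gives you the baseline-tracking benefit described above with very little extra code.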
By monitoring these measurable factors, organizations can gain insights into the operational health, data quality, and overall effectiveness of their Cognite Data Fusion implementation.