Workflow Failures Require an Alerting Mechanism
Scenario:
- Business users find incorrect data
- Data team rallies to investigate
- Investigation finds that CDF Workflows have been failing consecutively across numerous runs: failing transformations could not create new assets, and data models could not be updated.
To rely on these workflows, we need a mechanism that makes us aware of a processing failure before the business reports the data quality problem that results from it. Without this, trust degrades with each incident.
In this case, it was determined that we had hit the Instance Limit due to Data Model Spaces.
It’s not unreasonable for these things to occur. What is unreasonable is for failures to continue silently, for data to skew, and for the business to be the alerting mechanism.
Within an enterprise that staffs a dedicated data team, the most important thing is maintaining trust in the data, regardless of the tooling. We cannot provide that guarantee with Cognite as our platform today without more proactive alerting mechanisms.
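
As a rough illustration of the kind of check being asked for, here is a minimal Python sketch that looks at a workflow's recent run history and produces an alert message once the latest runs have failed back to back. Everything in it is an assumption made for illustration: the `Execution` record, its field names, and the `"failed"` status string are hypothetical and are not the Cognite SDK's own types. Fetching the run history and delivering the message (email, chat, paging) are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


# Hypothetical record shape for one workflow run; not a Cognite SDK type.
@dataclass(frozen=True)
class Execution:
    workflow_id: str
    status: str            # assumed values: "completed" or "failed"
    finished_at: datetime
    reason: str = ""       # failure reason, if the platform reports one


def trailing_failures(executions: Iterable[Execution]) -> list[Execution]:
    """Return the unbroken run of failed executions ending at the most recent one."""
    ordered = sorted(executions, key=lambda e: e.finished_at, reverse=True)
    streak = []
    for execution in ordered:
        if execution.status != "failed":
            break  # a non-failed run ends the streak
        streak.append(execution)
    return streak


def build_alert(executions: Iterable[Execution], threshold: int = 1) -> Optional[str]:
    """Return an alert message if the latest `threshold` runs all failed, else None."""
    streak = trailing_failures(executions)
    if len(streak) < threshold:
        return None
    latest, oldest = streak[0], streak[-1]
    return (
        f"{latest.workflow_id}: {len(streak)} consecutive failed run(s) "
        f"since {oldest.finished_at:%Y-%m-%d %H:%M}; "
        f"latest reason: {latest.reason or 'unknown'}"
    )
```

With `threshold=1` the very first failed run produces a message, which matches the goal of the data team hearing about a failure before the business does. A higher threshold trades earlier warning for less noise from one-off failures.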