Root Cause Analysis (RCA) is a method used to identify the underlying causes of equipment failures. The goal of RCA is to understand why failures occur so that measures can be taken to prevent their recurrence.
The Industrial Canvas is well suited to RCA because it provides a structured environment for gathering data and identifying the underlying causes of equipment failures. The Cause Map agent guides the retrieval of relevant data and suggests a cause map aligned with ISO 14224. It is especially useful for personnel such as site reliability engineers and has proven to significantly reduce the time spent on RCA.
This how-to article shows how to implement an Atlas AI agent that can be adapted and extended to suit the particular needs of customers.
Requirements:
- Atlas AI must be enabled in your Cognite Data Fusion environment. Contact your Cognite Customer Business Executive or contact@cognite.com for more information.
- Equipment data must be available in Cognite Data Fusion data modeling.
- Access to create and publish agents and agent tools.
- Access to deploy a Cognite Function with session credentials.
Overview of the agent
When configured, the new agent will be available in the list of agents under the Atlas icon at the top left of the Industrial Canvas. A user can add it to both new and existing Canvases.
The agent's purpose is:
- To help the user retrieve relevant data for the equipment that has failed
- To render a cause map of potential causes for the way in which it failed
In ISO 14224 terms, the cause map is suggested based on the Equipment’s Equipment Class and its possible Failure Mechanisms, given the Failure Mode that instigated the root cause analysis process in the first place.
As a starting point, the agent's goals, instructions and tools laid out in this article are designed to help the user find the relevant data in the Core Data Model (CDM) and Process Industries Data Model (IDM). Where data is unavailable, the agent will ask the user for guidance.
Note:
The agent uses a Cognite Function (source code provided) to retrieve a qualified cause map. The function currently supports the equipment classes Heat Exchangers and Pumps. For other equipment classes, the agent can use the large language model (LLM) to suggest a cause map. That result depends on the language model and may contain mistakes, so domain experts must review it before the RCA is concluded. The LLM fallback can be disabled by changing the instructions.
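The behavior described in the note can be sketched as follows. This is a minimal illustration, not the published function: it assumes the `handle(data, client)` entry-point convention of Cognite Functions, the `status` and `message` response keys are hypothetical, and the entries in `CAUSE_MAPS` are placeholders rather than real ISO 14224 content. Only the `cause_map` key and the two supported class codes come from this article.

```python
# Hypothetical sketch; the function published in the Cognite library may differ.
SUPPORTED_CLASSES = {"PU": "Pumps", "HE": "Heat Exchangers"}

# Placeholder content only: the real cause maps ship with the function itself.
CAUSE_MAPS = {
    ("PU", "VIB"): {"<failure mechanism>": {"<possible cause>": {}}},
}


def handle(data, client=None):
    """Return a qualified cause map, or report that none is available."""
    equipment_class = (data.get("equipment_class") or "").strip().upper()
    failure_mode = (data.get("failure_mode") or "").strip().upper()

    if equipment_class not in SUPPORTED_CLASSES:
        # The agent instructions decide whether the LLM fallback is used here.
        return {
            "status": "unsupported_equipment_class",
            "message": f"No qualified cause map for class '{equipment_class}'.",
        }

    cause_map = CAUSE_MAPS.get((equipment_class, failure_mode))
    if cause_map is None:
        return {
            "status": "unsupported_failure_mode",
            "message": f"No cause map for failure mode '{failure_mode}'.",
        }

    return {"status": "ok", "cause_map": cause_map}
```

Returning an explicit status instead of raising an error lets the agent tell a missing qualified cause map apart from a failed function call.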
Configuring the agent
- Switch to the Atlas AI workspace in Cognite Data Fusion.
- Create a new agent and enter the following information, adjusting it to your needs:
Field | Example/Instruction |
---|---|
Agent Name | Cause Map Agent |
Description | An agent that helps the user find data related to Equipment and draws a cause map based on failure mode |
Sample questions | |
Language model | azure/gpt-4o or similar |
Goal | The goal is to retrieve the right equipment and its data and then populate the canvas with a root cause analysis cause-map fetched from the RCA function on a particular equipment provided by the user. |
Instructions | -Converse with the user to check whether they need an RCA cause map for any equipment. -Ask for the equipment name. -Use the find equipment tool. -Retrieve and provide the equipment details. -Ask the user whether they also want to see the time series data or want an RCA cause map for the equipment. -If they want to see the time series data, use the find time series tool to find the time series linked to the equipment, then ask whether they want to see an RCA cause map. -If they say yes, ask for the equipment class and failure mode. -Tell the user that the currently available equipment classes are Pumps (PU) and Heat Exchangers (HE), and that the currently available failure modes are AIR, BRD, ELP, ELU, ERO, FTS, HIO, INL, LOO, NOI, OHE, PDE, PLU, SER, STD, UST, VIB. These failure codes stand for: Abnormal Instrument Reading, Breakdown, External leakage - process medium, External leakage - utility medium, Erratic output, Failure to start on demand, High output, Internal leakage, Low output, Noise, Overheating, Parameter deviation, Plugged / Choked, Minor in-service problems, Structural deficiency, Spurious stop, Vibration. -Once the user has chosen the equipment class and failure mode, pass the equipment class code (PU or HE) and the failure mode as input to the RCA function. -If the function returns a cause_map, use the "cause_map" content to build a cause map and add it automatically to the canvas. -When you build the cause map, use all of the data returned by the function. -The first level of the map must be the failure mode (for example, AIR). From level 2 onward, use the data received from the function and follow its hierarchy as given by the indentation. -Do not take shortcuts, even when you see repeating patterns in the data. Always follow the JSON hierarchy. -Do not overlap the items on the map. -Always use the add_cause_map_to_canvas function to add the map to the canvas. |
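The hierarchy rule in the instructions (failure mode at level 1, function data from level 2 onward) can be illustrated with a short sketch. The failure mode codes and descriptions are taken from the instructions above. The nested dict/list shape of the cause map and the `flatten_cause_map` helper are assumptions made for illustration; the actual function response may be structured differently.

```python
# Failure mode codes and descriptions as listed in the agent instructions.
FAILURE_MODES = {
    "AIR": "Abnormal Instrument Reading",
    "BRD": "Breakdown",
    "ELP": "External leakage - process medium",
    "ELU": "External leakage - utility medium",
    "ERO": "Erratic output",
    "FTS": "Failure to start on demand",
    "HIO": "High output",
    "INL": "Internal leakage",
    "LOO": "Low output",
    "NOI": "Noise",
    "OHE": "Overheating",
    "PDE": "Parameter deviation",
    "PLU": "Plugged / Choked",
    "SER": "Minor in-service problems",
    "STD": "Structural deficiency",
    "UST": "Spurious stop",
    "VIB": "Vibration",
}


def flatten_cause_map(failure_mode, cause_map):
    """Return (level, label, parent) rows; level 1 is the failure mode itself.

    Hypothetical helper: assumes cause_map is nested dicts and/or lists.
    """
    rows = [(1, failure_mode, None)]

    def walk(node, parent, level):
        if isinstance(node, dict):
            for label, children in node.items():
                rows.append((level, label, parent))
                walk(children, label, level + 1)
        elif isinstance(node, list):
            for item in node:
                if isinstance(item, (dict, list)):
                    walk(item, parent, level)  # nested containers stay at this level
                else:
                    rows.append((level, str(item), parent))
        elif node not in (None, ""):
            rows.append((level, str(node), parent))  # scalar leaf

    walk(cause_map, failure_mode, 2)
    return rows
```

Every node is visited exactly once in depth-first order, so repeated sub-trees are emitted in full rather than abbreviated, which mirrors the "no shortcuts" rule.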
- Add the following Tools to the agent:
Find Time Series | |
Tool Name | Find Time Series |
Tool Instructions | -when querying time series, filter on equipment space and external id. |
Call Function | |
Tool Name | RCA Function |
Tool Instructions | -Call the function when the user explicitly asks for Root cause analysis or cause map on a particular equipment. -Provide the below JSON input to the function by replacing values for equipment_class and failure_mode with the details fetched from the user. Replace the canvas_name and canvas_external_id values with the name and external id of the canvas on which the agent is being used. { "equipment_class": "", "canvas_name": "", "canvas_external_id": "", "failure_mode": "" } -After the function executes, summarize what has been done based on the function response. |
Function name | Cause Map Agent Function |
Max polling time in minutes | 1 |
Schema | { "type": "object", "properties": { "equipment_class": { "type": "string", "description": "The type of the equipment. This field is optional." }, "canvas_name": { "type": "string", "description": "The name of the canvas on which the agent is being used. Use an empty string if not available." }, "canvas_external_id": { "type": "string", "description": "The external ID of the canvas on which the agent is being used. Use an empty string if not available." }, "failure_mode": { "type": "string", "description": "The failure mode for root cause analysis. This field is optional." } } } |
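For testing the function outside the agent, the input described by the schema can be assembled with a small helper. This is a sketch: `build_payload` is a hypothetical name, and upper-casing the equipment class and failure mode is an assumption about what the function expects. The four field names and the empty-string convention for unknown canvas values come from the schema above.

```python
def build_payload(equipment_class="", failure_mode="", canvas_name="", canvas_external_id=""):
    """Build the function input; every field in the schema is a string."""
    payload = {
        "equipment_class": equipment_class,
        "canvas_name": canvas_name,
        "canvas_external_id": canvas_external_id,
        "failure_mode": failure_mode,
    }
    for key, value in payload.items():
        if not isinstance(value, str):
            raise TypeError(f"{key} must be a string, got {type(value).__name__}")

    # Assumption: codes such as PU and VIB are matched in upper case.
    payload["equipment_class"] = equipment_class.strip().upper()
    payload["failure_mode"] = failure_mode.strip().upper()
    return payload
```

For example, `build_payload("pu", "vib", canvas_name="Pump RCA")` leaves `canvas_external_id` as an empty string, as the schema description asks when the value is not available.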
- Upload the Cognite Function
Manual method:
- Download the function folder from https://github.com/cognitedata/library/tree/main/modules/root_cause_analysis/rca_with_atlas/functions/rca_canvas_builder and create a Zip file from it.
- Upload the Zip file to Cognite Data Fusion. Go to Data Management → Build Solutions → Functions.
- Add the Zip file in the interface
- Enter the following settings, then click Upload
Dataset | For example <rca-agent-function-dataset> |
Function name | Cause Map Agent Function |
ExternalId | Match external id from Function tool setup |
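The Zip step of the manual method can also be scripted. The sketch below uses only the Python standard library. The `zip_function_folder` helper and the default archive name are hypothetical, and the check for `handler.py` assumes the downloaded folder uses the default Cognite Functions entry-point file name; pass `entry_point=None` to skip it.

```python
import shutil
from pathlib import Path


def zip_function_folder(folder, archive_stem="rca_canvas_builder", entry_point="handler.py"):
    """Zip the folder's contents so that its files sit at the archive root."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")

    # Assumption: the function's entry point is a file at the folder root.
    if entry_point is not None and not (folder / entry_point).is_file():
        raise FileNotFoundError(f"{entry_point} not found in {folder}")

    # root_dir makes archive paths relative to the folder itself.
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=str(folder))
    return Path(archive)
```

Write the archive outside the folder being zipped, otherwise the Zip file may end up including itself.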
Toolkit method:
- Download the module from https://github.com/cognitedata/library/tree/main/modules/root_cause_analysis/rca_with_atlas and add it to your modules folder.
- Use Toolkit to deploy the function to your environment.
Using the Agent
- Open an existing canvas or create a new one, and give it an appropriate name.
- Click the Atlas icon. The sidebar opens. Find the agent in the list and click it.



The agent should now be able to help the user retrieve data related to the failed equipment and draw a cause map.