Gathering Interest

Storage for Intermediate Workflow tasks

Related products: Data Workflows, Data Staging (RAW), Functions, Integrations, Transformations

June 23, 2026
2 replies
75 views


Lucas Rosa Alves
Seasoned ⭐️⭐️

Summary

When building workflows in Cognite Data Fusion to populate views in Data Models, there is often a need for intermediate, curated datasets between raw source data and the final target data model.

Today, this intermediate data can be stored in RAW tables, but that requires customers to manage temporary tables, cleanup logic, naming conventions, and lifecycle handling manually. A native, workflow-managed temporary storage layer would make workflow development cleaner, reduce repetitive transformation logic, and simplify the overall implementation.

Business Context

We are using CDF workflows to populate an Asset Hierarchy data model.

The source data comes from multiple systems:

SAP
- Functional locations
- Equipment
AVEVA PI
- Tag metadata

The target asset hierarchy data model contains the following views:

Site
Area
Line
Equipment
System
Subsystem
Tag

The source metadata arrives without the required treatment, standardization, or contextualization. Before writing to the final data model, the data needs to be cleaned, normalized, enriched, and structured according to the target hierarchy.

Current Challenge

In practice, the workflow needs several intermediate transformation steps before writing to the final views.

For example, the workflow may need to transform SAP functional locations into a cleaned and standardized structure before deriving sites, areas, and lines.

Example flow for Site:

tb_functionalLocation

-> tb_functionalLocation_curated

-> tb_site_curated

-> Site view

Example flow for Area:

tb_functionalLocation

-> tb_functionalLocation_curated

-> tb_area_curated

uses tb_site_curated for contextualization

-> Area view

Example flow for Line:

tb_functionalLocation

-> tb_functionalLocation_curated

-> tb_line_curated

uses tb_area_curated for contextualization

-> Line view

Example flow for Equipment:

tb_equipment

-> tb_equipment_curated

-> tb_equipment_contextualized

uses tb_line_curated / tb_area_curated for hierarchy mapping

-> Equipment view

Example flow for System and Subsystem:

tb_functionalLocation + tb_equipment

-> curated functional location and equipment tables

-> tb_system_curated

-> System view

tb_functionalLocation + tb_equipment

-> curated functional location and equipment tables

-> tb_subsystem_curated

-> Subsystem view

Example flow for Tag:

tb_tag

-> tb_tag_curated

-> tb_tag_contextualized

uses equipment/system/subsystem curated data

-> Tag view
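Taken together, the flows above form a dependency graph between intermediate tables. As a rough sketch of how a run order could be derived from it, the example below uses Python's standard `graphlib` module. The table names come from the flows; the exact inputs of `tb_tag_contextualized` are one reading of "equipment/system/subsystem curated data" and are an assumption, as is the helper name `build_order`.

```python
from graphlib import TopologicalSorter

# Each key is an intermediate table; the value is the set of tables it reads.
# The inputs of tb_tag_contextualized are an assumed reading of the Tag flow.
DEPENDENCIES = {
    "tb_functionalLocation_curated": {"tb_functionalLocation"},
    "tb_site_curated": {"tb_functionalLocation_curated"},
    "tb_area_curated": {"tb_functionalLocation_curated", "tb_site_curated"},
    "tb_line_curated": {"tb_functionalLocation_curated", "tb_area_curated"},
    "tb_equipment_curated": {"tb_equipment"},
    "tb_equipment_contextualized": {
        "tb_equipment_curated", "tb_line_curated", "tb_area_curated",
    },
    "tb_system_curated": {"tb_functionalLocation_curated", "tb_equipment_curated"},
    "tb_subsystem_curated": {"tb_functionalLocation_curated", "tb_equipment_curated"},
    "tb_tag_curated": {"tb_tag"},
    "tb_tag_contextualized": {
        "tb_tag_curated", "tb_equipment_curated",
        "tb_system_curated", "tb_subsystem_curated",
    },
}


def build_order(dependencies):
    """Return the tables in an order where every input precedes its consumer."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(DEPENDENCIES)
for table in order:
    print(table)
```

Every table in this graph except the three source tables is intermediate state that has to live somewhere between steps, which is what the rest of this post is about.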

These intermediate curated tables are useful because they allow the workflow to:

Reuse cleaning and standardization logic across multiple transformations.
Avoid duplicating the same transformation logic in every step.
Avoid using the final data model views as inputs to transformation logic.
Keep the workflow logic easier to understand and maintain.
Separate raw source data, intermediate workflow state, and final modeled data.

However, this intermediate data can be transient. It is only needed while the workflow is running, and it can be deleted after the workflow finishes successfully. There is also value in optionally letting users view these intermediate datasets to analyze and debug the quality of the contextualization.

Today, we can use RAW tables as intermediate storage, but this creates additional responsibilities for the customer:

Creating and maintaining temporary RAW tables.
Cleaning intermediate tables before or after each workflow run.
Preventing stale intermediate data from being reused accidentally.
Managing naming conventions for temporary workflow data.
Adding cleanup steps to the workflow.
Handling failed workflow runs where temporary data may be left behind.
Writing additional code that is not part of the actual business transformation.
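To illustrate the kind of boilerplate the list above describes, here is a minimal sketch of a Site flow with a hand-written naming convention and cleanup. `InMemoryTableStore`, `temp_name`, and `run_site_flow` are hypothetical names, the store is an in-memory stand-in rather than the Cognite SDK, and the cleaning and deduplication logic is a placeholder.

```python
class InMemoryTableStore:
    """Hypothetical stand-in for a staging store; not the Cognite SDK."""

    def __init__(self):
        self.tables = {}

    def write(self, name, rows):
        self.tables[name] = list(rows)

    def read(self, name):
        return self.tables[name]

    def drop(self, name):
        self.tables.pop(name, None)


def temp_name(run_id, table):
    """Naming convention that keeps one run's temporary tables apart from another's."""
    return f"tmp_{run_id}_{table}"


def run_site_flow(store, run_id, functional_locations):
    """Curate functional locations, derive sites, then clean up by hand."""
    created = []
    try:
        curated = temp_name(run_id, "tb_functionalLocation_curated")
        # Placeholder cleaning step: trim whitespace and upper-case the names.
        store.write(curated, [loc.strip().upper() for loc in functional_locations])
        created.append(curated)

        sites = temp_name(run_id, "tb_site_curated")
        # Placeholder derivation step: deduplicate the curated names.
        store.write(sites, sorted(set(store.read(curated))))
        created.append(sites)

        return store.read(sites)
    finally:
        # Cleanup the customer has to remember, including for failed runs.
        for name in created:
            store.drop(name)


store = InMemoryTableStore()
result = run_site_flow(store, "run42", [" oslo ", "Oslo", "bergen"])
print(result, store.tables)
```

Only the two `store.write` lines carry business logic here; the naming helper, the `created` bookkeeping, and the `finally` block are the non-business code this post would like CDF to take over.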

Product Idea

Introduce a native workflow-managed temporary storage capability in CDF.

This could work as an internal temporary storage layer for workflows and transformations, where intermediate datasets can be written and read by different workflow steps, but their lifecycle is managed by the workflow execution itself.

Ideally, this temporary storage would be:

Scoped to a workflow or workflow run
Usable by transformation steps
Automatically cleaned up after successful execution
Optionally retained for a limited time afterwards, so end users can still review intermediate datasets for debugging
Separated from RAW and from the final Data Modeling views
Managed by CDF instead of customer-maintained cleanup logic

Expected Benefits

This capability would make workflow-based data modeling pipelines much cleaner and easier to maintain.

The main benefits would be:

Reduced amount of customer-managed code.
Less duplication of transformation logic.
Cleaner separation between raw data, temporary workflow state, and final modeled data.
Reduced risk of stale intermediate data impacting future workflow runs.
Easier debugging and monitoring of workflow execution.
More straightforward workflow design for complex contextualization processes.
Better support for multi-step data preparation before writing to Data Modeling views.

rajkamalsarma
Practitioner ⭐️⭐️⭐️
June 25, 2026

This is a great idea. In fact, we are actively working on a similar concept for workflows to manage intermediate storage for workflow runs, maintain the states of processed items, and provide previews of the I/O data handled by each task. We are currently in the early stages, having just completed a successful proof of concept (POC) using records for the intermediate storage system.


mheinze57
Committed ⭐️⭐️⭐️
July 7, 2026

Palantir already has this capability, as do basic tools like Power BI steps or Power Query. Great suggestion. Palantir also lets you grant access to the various steps along the way, in case you want to persist the intermediate steps.
