Gathering Interest

Support Aggregation of Mixed Numeric and String Time Series in SDK Calls

Summary

Enable aggregation operations (such as step_interpolation) on requests containing both numeric and string time series in the same SDK call.

Currently, aggregation works correctly for numeric time series, but the request fails when a string/alphanumeric time series is included alongside numeric series. This limitation prevents users from retrieving synchronized operational datasets that combine process measurements and categorical/status information in a single query.

Problem Description

When using the Cognite SDK method retrieve_dataframe() with aggregation enabled, requests containing mixed data types fail if at least one time series is of type string.

Example:

Numeric-only aggregation → works correctly
Numeric + string aggregation → fails

The expected behavior for string time series using step_interpolation is to replicate the last known value across the aggregation interval, similar to how industrial historians such as PI System behave.
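The step-interpolation behavior described above can be sketched in plain Python. This is a minimal illustration, not how CDF or the SDK computes aggregates. The function name `step_interpolate`, the `(timestamp_ms, value)` input shape, and the grid aligned to `start` in UTC epoch milliseconds are all assumptions, and timezone-aware alignment is ignored. The sketch assigns each interval the value in effect at its start, meaning the last datapoint at or before the bucket start. That is one reading of "last known value". To fill the first interval correctly, the raw input should also contain the last datapoint before `start`. For raw retrieval, the Python SDK offers an `include_outside_points` flag for this.

```python
from bisect import bisect_right
from typing import Optional


def step_interpolate(
    datapoints: list[tuple[int, str]],
    start: int,
    end: int,
    granularity_ms: int,
) -> list[tuple[int, Optional[str]]]:
    """Hypothetical helper: hold string values constant on a regular grid.

    `datapoints` must be sorted by timestamp (epoch milliseconds).
    Returns (bucket_start, value) pairs for every bucket in [start, end).
    A bucket gets the last datapoint at or before its start timestamp,
    or None if no datapoint precedes it.
    """
    timestamps = [ts for ts, _ in datapoints]
    result = []
    for bucket_start in range(start, end, granularity_ms):
        # Index of the last datapoint with timestamp <= bucket_start (-1 if none).
        idx = bisect_right(timestamps, bucket_start) - 1
        value = datapoints[idx][1] if idx >= 0 else None
        result.append((bucket_start, value))
    return result
```

Take datapoints `(0, "DSTNL")` and `(2h + 1min, "DSTNP")` on a 1h grid spanning 4h. The first three buckets hold DSTNL and the fourth switches to DSTNP. A bucket that starts before the first datapoint gets `None`.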

Example Use Case

Operational dashboards and analytics workflows frequently require:

Numeric process variables (temperature, pressure, flow, etc.)
Text/status variables (campaign, mode, operational state, equipment status)

These datasets must often be retrieved together using the same timestamp alignment and granularity.

Example:

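from cognite.client import CogniteClient

client = CogniteClient()  # assumes a default client configuration (project, credentials) is already set

df = client.time_series.data.retrieve_dataframe(
    external_id=[
        "numeric_ts_1",
        "numeric_ts_2",
        "string_ts_status",
    ],
    start="6h-ago",
    end="now",
    aggregates="step_interpolation",
    granularity="1h",
    timezone="UTC-03:00",
)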

Currently, including "string_ts_status" causes the aggregation request to fail.

Expected Behavior

For string time series:

step_interpolation should return the last known value within the aggregation window.
Mixed-type aggregation requests should succeed without breaking the entire query.
Returned DataFrames should preserve both numeric and textual columns properly aligned by timestamp.

Current Behavior

Aggregation succeeds for numeric time series only.
Aggregation fails when a string/alphanumeric time series is included.
Users must split requests into multiple calls and manually merge results afterward.
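The split-and-merge workaround can be sketched as two small helpers. Both are hypothetical and use only the standard library. `split_by_type` assumes each series' type is already known; the SDK's `TimeSeries` objects carry an `is_string` attribute, but this sketch does not call the SDK. `merge_on_timestamp` assumes both results have already been reduced to `{timestamp_ms: {column: value}}` tables on the same grid. For example, numeric aggregates could come from one call and string values from a raw call that is step-interpolated client-side. The merge is an outer join, so a timestamp present in only one table still yields a row, with `None` in the missing columns.

```python
def split_by_type(series: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Partition {external_id: is_string} into (numeric_ids, string_ids)."""
    numeric = [xid for xid, is_string in series.items() if not is_string]
    strings = [xid for xid, is_string in series.items() if is_string]
    return numeric, strings


def merge_on_timestamp(*tables: dict[int, dict[str, object]]) -> list[dict[str, object]]:
    """Outer-join {timestamp_ms: {column: value}} tables into rows sorted by timestamp."""
    # Collect every column and every timestamp seen in any table.
    columns = sorted({col for table in tables for row in table.values() for col in row})
    timestamps = sorted({ts for table in tables for ts in table})
    rows = []
    for ts in timestamps:
        # Start with all columns empty, then fill in whatever each table has at ts.
        row: dict[str, object] = {"timestamp": ts, **{col: None for col in columns}}
        for table in tables:
            row.update(table.get(ts, {}))
        rows.append(row)
    return rows
```

Applied to the `retrieve_dataframe()` example, this means one aggregate request for the numeric IDs, one raw request for the string ID, and a join. That client-side bookkeeping is what this request asks the SDK/API to absorb.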

Proposed Enhancement

Implement support for:

Aggregation of string time series using step_interpolation
Mixed numeric and string aggregation in the same SDK/API request
Graceful handling of heterogeneous time series types

Daniel Boechat De Marins
Author
Practitioner ⭐️⭐️⭐️
May 29, 2026

We are aware of existing feature requests related to machine-state time series and predefined operational states (e.g. OPEN/CLOSED, ON/OFF/STANDBY). LINK

However, this request addresses a broader and different use case.

Our requirement is not limited to predefined machine states or specialized state-based time series types. Instead, we require support for generic string/alphanumeric time series aggregation together with numeric time series within the same SDK/API request.

Examples include:

Campaign identifiers
Operational modes
Product names
Routing identifiers
Free-text operational tags
External categorical process information

The key requirement is the ability to:

include string and numeric TS in the same retrieve_dataframe() call;
apply step_interpolation consistently;
preserve timestamp alignment across all series;
avoid failures caused by mixed data types.

This capability is especially important for migration and interoperability scenarios with systems such as PI System, where mixed-type aggregation workflows are already supported.

Expected Behavior
The expected behavior is to replicate the latest known string value until a new value is received, similarly to how industrial historians such as PI System handle step interpolation for categorical data.

In the example below, the value DSTNL is preserved across consecutive timestamps until the transition to DSTNP, demonstrating the expected step interpolation behavior for string values.

Mithila Jayalath
Expert ⭐️⭐️⭐️⭐️
June 1, 2026

@Daniel Boechat De Marins shall I convert this into a product idea?

Daniel Boechat De Marins
Author
Practitioner ⭐️⭐️⭐️
June 1, 2026

@Mithila Jayalath Yes, please.

Mithila Jayalath
Expert ⭐️⭐️⭐️⭐️
June 1, 2026

@Daniel Boechat De Marins Thank you for the update. Make sure to upvote the idea.

Everton Colling
Expert ⭐️⭐️⭐️⭐️
June 8, 2026

Hi @Daniel Boechat De Marins,

Thanks for the detailed request! This is a well-framed Product Idea and the use case makes total sense. Retrieving numeric process variables and categorical/status series together, time-aligned at the same granularity, is a real need, and it should be easier than splitting requests and merging client-side.

That being said, string time series currently do not support aggregates at the API level, only raw data retrieval. So while parts of this could be improved directly in the SDK, the core of what you are describing depends on the API adding aggregate support for string time series.

We do have plans to evaluate string time series aggregates, but unfortunately we do not have capacity to work on it in 2026. We will revisit it in 2027 and include these cross-type considerations, making it easier to query different time series types in the same request.

While it does not solve your immediate needs, we do have state time series, a new time series type currently in private preview (planned to go GA later this year). State time series already support aggregations and are meant for categorical series with low cardinality.

We will keep tracking interest on this idea to help inform prioritization, so upvotes and additional use cases here are useful. Thanks again for raising it.