
Proposal to Reduce Query Timeouts in High-Volume Views

  • May 20, 2025


Context

When querying views with a large number of instances (more than 1 million), we frequently encounter query timeouts. This has become a critical bottleneck affecting application performance and user experience.

To mitigate this, we introduced a pre-query caching strategy in the application layer:

  1. Before sending a query to Cognite, we call the /models/instances/aggregate endpoint to determine which instance spaces contain instances of a given view.
  2. The result is stored in a cache layer.
  3. When a query is initiated, we check whether the user included a space filter.
    • If not, we append the relevant spaces from the cache to the query filter.
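The steps above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch rather than our production code: `fetch_spaces` stands in for whatever wraps the /models/instances/aggregate request, and the filter dictionaries assume the DMS JSON filter shape, with an instance's space referenced as `["node", "space"]`.

```python
from typing import Any, Callable, Dict, List, Optional

Filter = Dict[str, Any]

# Assumed DMS property reference for an instance's space.
SPACE_PROPERTY = ["node", "space"]


def has_space_filter(flt: Optional[Filter]) -> bool:
    """Return True if the filter, or any and/or/not sub-filter, references node.space."""
    if not isinstance(flt, dict):
        return False
    for key, value in flt.items():
        if key in ("and", "or"):
            if any(has_space_filter(sub) for sub in value):
                return True
        elif key == "not":
            if has_space_filter(value):
                return True
        elif isinstance(value, dict) and value.get("property") == SPACE_PROPERTY:
            return True
    return False


class ViewSpaceCache:
    """Step 2: caches the instance spaces that contain instances of each view."""

    def __init__(self, fetch_spaces: Callable[[str], List[str]]) -> None:
        # Hypothetical callable wrapping one aggregate request per view.
        self._fetch_spaces = fetch_spaces
        self._spaces: Dict[str, List[str]] = {}

    def spaces_for(self, view_id: str) -> List[str]:
        if view_id not in self._spaces:
            self._spaces[view_id] = self._fetch_spaces(view_id)
        return self._spaces[view_id]


def scope_filter(flt: Optional[Filter], view_id: str, cache: ViewSpaceCache) -> Filter:
    """Step 3: keep a user-supplied space filter as-is, otherwise AND in the cached spaces."""
    if flt is not None and has_space_filter(flt):
        return flt
    space_filter: Filter = {
        "in": {"property": SPACE_PROPERTY, "values": cache.spaces_for(view_id)}
    }
    return space_filter if flt is None else {"and": [flt, space_filter]}
```

Only the first query per view triggers the aggregate request; later queries reuse the cached list until the cache is cleared.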

This approach has significantly reduced timeouts across our applications. However, it introduces new challenges:

  • One aggregation request per view is still needed to fetch the associated spaces.
  • The cache must be invalidated periodically, especially because user capabilities may change.
  • This workaround does not help with timeouts in the CDF UI or InField, where we cannot control query behavior in the same way.
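For the cache-invalidation point, one option is to expire entries after a fixed time-to-live and to key them by a fingerprint of the caller's capabilities, so that a change in access rights never reuses a stale entry. A minimal Python sketch; the TTL value, the capability structure, and `fetch_spaces` are illustrative assumptions, not part of any Cognite API.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List, Tuple


def capability_fingerprint(capabilities: List[Dict[str, Any]]) -> str:
    """Stable hash of a capability list, independent of list order and key order."""
    items = sorted(json.dumps(c, sort_keys=True) for c in capabilities)
    return hashlib.sha256(json.dumps(items).encode("utf-8")).hexdigest()


class ExpiringSpaceCache:
    """Per-(view, capabilities) cache of instance spaces with a time-to-live."""

    def __init__(
        self,
        fetch_spaces: Callable[[str], List[str]],
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Hypothetical wrapper around the aggregate request, run as the current user.
        self._fetch_spaces = fetch_spaces
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}

    def get(self, view_id: str, capabilities: List[Dict[str, Any]]) -> List[str]:
        key = (view_id, capability_fingerprint(capabilities))
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Missing or expired: refetch and restart the TTL for this key.
        spaces = self._fetch_spaces(view_id)
        self._entries[key] = (now, spaces)
        return spaces
```

Keying by capabilities trades some extra aggregate calls for correctness: two users with different access never share an entry.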

 

Proposed Solution 1: Data Model Metadata Configuration

We propose enhancing the data model configuration with metadata that defines the instance spaces its views are scoped to. Queries against those views could then automatically be restricted to the relevant spaces.

 

Example Data Model

 

externalId = "RadixSOL"
space = "RDX-COR-ALL-DMD"
version = "1_0_0"

type User {
  name: String
  birthday: String
  relatedTo: [Asset]
}

type Asset
  @import(
    dataModel: {
      externalId: "CogniteCore"
      version: "v1"
      space: "cdf_core"
    }
  ) {
  name: String
  description: String
}


 

Desired Configuration

We want to specify that views within RDX-COR-ALL-DMD should be queried using:

  • A fixed set of instance spaces, e.g., [ "RDX-USA-ALL-DAT", "RDX-BRA-ALL-DAT" ]
  • Or a prefix pattern, e.g., RDX-*, that dynamically resolves to all matching spaces.

This would provide a declarative, maintainable, and scalable approach to improve query performance across tools and APIs.
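To make the two configuration forms concrete, here is a minimal Python sketch of how a client (or the service itself) might resolve them into a concrete list of spaces. The `SCOPED_SPACES` mapping and the `"PREFIX*"` convention are hypothetical; no such setting exists today.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical configuration: data model space -> explicit list or "PREFIX*" pattern.
SCOPED_SPACES: Dict[str, Union[List[str], str]] = {
    "RDX-COR-ALL-DMD": ["RDX-USA-ALL-DAT", "RDX-BRA-ALL-DAT"],
}


def resolve_scope(scope: Union[List[str], str], existing_spaces: Iterable[str]) -> List[str]:
    """Return the instance spaces a scope refers to."""
    if isinstance(scope, str):
        if not scope.endswith("*"):
            raise ValueError(f"expected a 'PREFIX*' pattern, got {scope!r}")
        prefix = scope[:-1]
        # Pattern form: every existing space that starts with the prefix, sorted.
        return sorted({s for s in existing_spaces if s.startswith(prefix)})
    # Explicit form: keep the configured spaces in order, without duplicates.
    return list(dict.fromkeys(scope))
```

The pattern form picks up new spaces such as a future RDX-ARG-ALL-DAT without a configuration change, at the cost of being re-resolved whenever spaces are created.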


 

Proposed Solution 2: Instance Space Inspection Endpoint

If extending metadata is too complex, we propose an alternative:

Introduce an API endpoint like:

 

GET /models/datamodels/inspectSpaces

 

This endpoint would return a mapping from each view in the data model to the instance spaces it spans, eliminating the need for repeated aggregation requests. Ideally, the result would not be limited to the spaces the user has access to, but would include all spaces related to each view in the data model.

This would:

  • Reduce the overhead on the aggregate endpoint.
  • Simplify client logic.
  • Provide a reusable utility for performance optimization in both UI and backend services.
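Since the proposed endpoint does not exist yet, its response format is open. Assuming a hypothetical shape in which each item pairs a view reference with an `instanceSpaces` list, a client could build its whole view-to-spaces lookup from a single response instead of one aggregate request per view:

```python
import json
from typing import Dict, List, Tuple

ViewKey = Tuple[str, str, str]  # (space, externalId, version)

# Hypothetical response body for GET /models/datamodels/inspectSpaces.
SAMPLE_RESPONSE = """
{"items": [
  {"view": {"space": "RDX-COR-ALL-DMD", "externalId": "User", "version": "1_0_0"},
   "instanceSpaces": ["RDX-USA-ALL-DAT", "RDX-BRA-ALL-DAT"]},
  {"view": {"space": "RDX-COR-ALL-DMD", "externalId": "Asset", "version": "1_0_0"},
   "instanceSpaces": ["RDX-BRA-ALL-DAT"]}
]}
"""


def parse_inspect_spaces(body: str) -> Dict[ViewKey, List[str]]:
    """Turn one (hypothetical) inspectSpaces response into a view -> spaces lookup."""
    lookup: Dict[ViewKey, List[str]] = {}
    for item in json.loads(body)["items"]:
        view = item["view"]
        key = (view["space"], view["externalId"], view["version"])
        lookup[key] = sorted(item["instanceSpaces"])
    return lookup
```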

@Elka Sierra @Sunil Krishnamoorthy

2 replies

AndersM
Seasoned Practitioner
  • May 23, 2025

Hi Lucas,

Thanks for sharing this with us!

I think the idea of constraining graph traversal complexity by harnessing known correlations between types and spaces is a good one, and one we don’t currently make use of.

I’d like to take your idea a little further. Could we push the “data space constraint” all the way down to containers? Containers represent the actual storage of data, and any number of views can map data from a single container, so a constraint would be most effective when attached to the container directly: it would then apply automatically to every view that provides access to that container. We would also be able to enforce the constraint during insertion, i.e., a container would only accept upserts for nodes and edges in its “allowed data spaces”, regardless of which view is used to write the data.
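The container-level idea could look roughly like the following minimal Python sketch. `allowed_instance_spaces` is a hypothetical container setting that does not exist in the DM APIs today. The sketch covers both uses described above: rejecting upserts into disallowed spaces, and deriving the spaces a view's query could be limited to. The query-side function assumes every instance visible through a view has data in at least one of the containers the view maps, so it returns the union of those containers' constraints, or no bound if any mapped container is unconstrained.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Container:
    space: str
    external_id: str
    # Hypothetical constraint; None means the container accepts any space.
    allowed_instance_spaces: Optional[FrozenSet[str]] = None


class SpaceConstraintViolation(ValueError):
    pass


def check_upsert(instance_space: str, containers: Iterable[Container]) -> None:
    """Reject an upsert if any target container disallows the instance's space,
    regardless of which view was used to write the data."""
    for c in containers:
        allowed = c.allowed_instance_spaces
        if allowed is not None and instance_space not in allowed:
            raise SpaceConstraintViolation(
                f"{c.space}:{c.external_id} does not accept instances in {instance_space}"
            )


def query_spaces_for_view(mapped: Iterable[Container]) -> Optional[Set[str]]:
    """Spaces a view's query can be limited to, or None if there is no bound."""
    containers = list(mapped)
    if not containers:
        return None
    spaces: Set[str] = set()
    for c in containers:
        if c.allowed_instance_spaces is None:
            return None  # one unconstrained container removes the bound
        spaces |= c.allowed_instance_spaces
    return spaces
```

If a view is known to require data in all of its containers, the intersection of the constraints would give a tighter bound; the union is the conservative choice.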

Data models are just a grouping of views, or specifically view versions, and a single view (view version) can be referenced from many different data models, so they’re even more removed from the actual data storage than the views. I’m not ruling out having these space filtering concepts on either views or data models, but as you don’t need to reference a data model when retrieving data, just a view version, the DM APIs can’t necessarily tie a space filter to a view at query time. We could simulate this through the GraphQL API due to the illusion our current GraphQL API provides of views existing only in the context of a data model, but this illusion does not hold when accessing the DM graph APIs directly. So it’s a little tricky. =)

Thoughts?

Thanks,

-AndersM



Hi Anders,

I agree. Configuring by container makes a lot of sense and provides clear value, especially since the configuration would be inherited across multiple views. This approach would definitely enhance scalability and simplify maintenance. The ability to enforce specific spaces as constraints is also a great way to ensure consistency and manageability.

When I initially mentioned configuring by data model, I was thinking more from a usability perspective, specifically making it easier to configure via a UI. Under the hood, however, I was actually considering defining the rules based on the space of the view, particularly when the project structure consistently follows view/container patterns within the same space.

For example, in the projects I’ve worked on, containers or views in the space RDX-COR-ALL-DML typically store their instances in a fixed set of spaces, such as:

  • RDX-COR-ALL-DAT

  • RDX-COR-ALL-REF

  • RDX-{SITE}-COR-ALL-DAT

Given that structure, I agree that configuring by container is the better approach. That said, I also think we could potentially reduce configuration overhead by grouping settings based on container spaces.
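Grouping the settings by container space could reuse the naming convention above directly. A minimal Python sketch, assuming a hypothetical rule table keyed by container space, where `{SITE}` stands for any alphanumeric site code:

```python
import re
from typing import Dict, Iterable, List

# Hypothetical rules: container space -> instance-space templates.
RULES: Dict[str, List[str]] = {
    "RDX-COR-ALL-DML": ["RDX-COR-ALL-DAT", "RDX-COR-ALL-REF", "RDX-{SITE}-COR-ALL-DAT"],
}

# Assumption: site codes are alphanumeric, with no hyphens.
SITE_PATTERN = "[A-Za-z0-9]+"


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Escape the literal parts of a template and substitute {SITE} with SITE_PATTERN."""
    literal_parts = template.split("{SITE}")
    return re.compile(SITE_PATTERN.join(re.escape(part) for part in literal_parts))


def allowed_spaces(container_space: str, existing_spaces: Iterable[str]) -> List[str]:
    """Existing spaces that fully match any template configured for the container space."""
    patterns = [template_to_regex(t) for t in RULES.get(container_space, [])]
    return sorted({s for s in existing_spaces if any(p.fullmatch(s) for p in patterns)})
```

One rule per container space then covers every container in it, and a new site space such as RDX-BRA-COR-ALL-DAT is picked up without editing the rules.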