Question

Utilizing index in data model from a transformation

  • July 1, 2025
  • 4 replies
  • 117 views

Hi

We are testing out a new deletion transformation for a view in data modeling. The transformation looks something like this:

select
externalId
from
cdf_data_models(...)
where
project in ("ProjectA","ProjectB")

This transformation always times out and returns a “Graph query time out” error message. However, when we change it to the following, it runs fine:

select
externalId
from
cdf_data_models(...)
where
project = "ProjectA"

union

select
externalId
from
cdf_data_models(...)
where
project = "ProjectB"

For context, the project column here is indexed and the view has around 10 properties and about 1.7 million rows. The transformations all query the same view.

We are therefore wondering how filters in transformations are translated into filters in data modeling, and why these transformations perform so differently. Learning more about this would benefit us greatly when working with transformations and data modeling in the future :)

Thanks in advance!

Sebastian

4 replies

Mithila Jayalath
Seasoned Practitioner

@sebastianheibo.akso To troubleshoot this further, could you please let me know the project's cluster, and the run and config IDs for this transformation?


@Mithila Jayalath What do you mean by the config ID? Is that the external ID?


This is most likely due to filter pushdown.

It looks like in the first case the filter is not properly pushed down to Data Modeling, while in the second case it is.

Filter pushdown means that the transformation does not fetch all the data from Data Modeling and filter it afterwards; instead, it asks Data Modeling for data that is already filtered.

Filter pushdown does not affect correctness: transformations always apply the filter a second time on their own side, so any extraneous rows returned by Data Modeling are filtered out anyway. It can make a big performance difference, though, because when the filter is not pushed down the transformation has to fetch a lot of unneeded data from Data Modeling.
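
As a rough illustration of the point above, here is a toy Python model of pushdown. It is not CDF or transformation code: `ToySource`, `run_query`, and the sample rows are all made up for this sketch, and the real system is certainly more involved. It only shows that the result is identical either way, while the number of rows leaving the source differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    external_id: str
    project: str


class ToySource:
    """Hypothetical stand-in for a remote store; counts the rows it sends out."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_sent = 0

    def scan(self, pushed_filter=None):
        # With a pushed-down filter, the source filters before sending anything.
        selected = [r for r in self._rows if pushed_filter is None or pushed_filter(r)]
        self.rows_sent += len(selected)
        return selected


def run_query(source, predicate, push_down):
    fetched = source.scan(predicate if push_down else None)
    # The client re-applies the predicate in both cases, so the result is the same.
    return sorted(r.external_id for r in fetched if predicate(r))


# 1000 made-up rows spread evenly over ProjectA..ProjectD.
rows = [Row(f"id-{i}", f"Project{'ABCD'[i % 4]}") for i in range(1000)]


def wanted(row):
    return row.project in ("ProjectA", "ProjectB")


pushed = ToySource(rows)
not_pushed = ToySource(rows)
result_pushed = run_query(pushed, wanted, push_down=True)
result_not_pushed = run_query(not_pushed, wanted, push_down=False)

print(result_pushed == result_not_pushed)      # same rows either way
print(pushed.rows_sent, not_pushed.rows_sent)  # rows transferred in each case
```

In this toy setup the pushed-down query transfers only the matching half of the rows, while the other one transfers everything and discards half on the client side. On a 1.7 million row view, that extra transfer is the kind of thing that could plausibly push a query over a timeout.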

My advice is to use the second flow for now, as it is indeed faster. I’ll try to investigate and fix filter pushdown in the first flow, but keep in mind that while performance differs, correctness is not affected.


  • Committed
  • August 6, 2025

@Jacob Eliat-Eliat I would love to learn more about how we can ensure proper filter pushdown when querying data models with indexes. Is there any documentation or set of best practices on this?