WITH archived_wells AS ( SELECT DISTINCT ac.WellId FROM cdf_nodes( 'sp_dm_dap_knowledge_graph', 'Activity', '0.71' ) ac JOIN contextualization.dap_wellview_wellid_space_mapping map ON ac.WellId = map.key WHERE map.Archived = TRUE AND ac.Archived = FALSE ) SELECT s.externalId, s.space, s.WellId, true as Archived FROM cdf_nodes( 'sp_dm_dap_knowledge_graph', 'WellState', '0.71' ) s JOIN archived_wells aw ON s.WellId = aw.WellId WHERE s.Archived = FALSE AND EXISTS ( SELECT 1 FROM archived_wells )
WellState has 20M rows of data. To improve SQL performance, I want to use a CTE to get the IDs of archived wells and then update WellState (archived and WellId are indexed). However, I found that even when using EXISTS to check that the CTE result is empty, cdf_nodes still loads instances. In this case, shouldn’t the result just be empty?
Additionally, when I replaced the CTE with a subquery, cdf_nodes still loads instances when the subquery is empty.
Another question in the same topic: when using the cdf_nodes function in a transformation, are the filters specified in the WHERE clause applied before the full data load from CDF, or are they pushed down and integrated into the Spark optimizer for efficient query execution?
Another thing in this topic, if we try to load WellState with cdf_nodes and then filter them with Archived = false, Cognite data engine will repeatedly drop and reload instances until the session is closed. As long as there is a transformation on WellState with this SQL filter added, this issue will occur. What is causing this?
This is behavior that’s mostly due to the underlying spark so I am not 100% sure I can give you a proper answer, but here are some informations that can probably help you: * Yes, static filters are pushed down (depending a bit of the specifics, but the majority of filters can be pushed down) * But joins are not pushed down
* And neither are dynamic filters For the question about the CTE, unfortunately spark cannot push down dynamic filters with our setup, you’ll probably need another transformation and an intermediate storage to be able to do this.
@Jacob Eliat-Eliat thank you for your response. I’m agree with you that the static filters can be pushed down. 70% of WellState instances are loaded with Archived =FALSE in our sql. But after that, the instances will be dropped and then reloaded, repeating this cycle until the session closes. This is the biggest problem we are facing right now.