Best practice for joining data in models

  • 26 April 2024
  • 3 replies


I am creating a transformation where I am joining data from two different views/containers, where table B has a node reference to table A. I have tried to find documentation for this, but I have not found any so far. 


Through friends and trial and error I have found two options, and neither seems to be performing well. 


Type A {

    Name: String


Type B{

   Name: String

   A_ref: A


What I have found as possible solutions are to go through cdf_data_models and picking the externalId from the node-reference:
from cdf_data models(<spc>, <mod>,<ver>, “A”) as A join cdf_… as B on A.external_id = B.A_ref.externalId

Same as above, but join on a as a nodereference: 

on B.A_ref = node_reference(<spc>, A) 



Neither of these options seems to be documented anywhere, and I can’t find any other ways documented. The performance when reading when using these joins seems slow, even though I have set up a few indexes which should cover the different joins. This is also slower then when reading from RAW for a comparable data set. 


Best answer by Dilini Fernando 11 June 2024, 13:47

View original

3 replies

Userlevel 4
Badge +2

Hi @hakon_be,

I hope the above information was helpful. I’m closing this topic for now. Please don't hesitate to create a new post if you have any further questions. 

Userlevel 4
Badge +2

Hi @hakon_be,

Did the information above help you?

Userlevel 3

Hi @hakon_be 

Sorry for the late reply!

Can you please have a look in the docs “Incremental data loading when using Data Modeling” to see if you can have transformation read only recently updated nodes? 

Also, you can test a similar query to the API directly yourself to see how this performs. Using the Python SDK it would be something like this:


query = Query(     with_={         "B": NodeResultSetExpression(filter=filters.HasData(views=[viewB])),         "A": NodeResultSetExpression(from_="B", through=PropertyId(source=viewB, property="ref_A"), direction="outwards", )     },     select={"A": Select([SourceSelector(viewA, ["field1"])])}     # select={"B": Select([SourceSelector(viewB, ["ref_A"])]), "A": Select([SourceSelector(viewA, ["field1"])])}, )

client.data_modeling.instances.query(query = query)