Solved

Best practice for joining data in models

  • 26 April 2024
  • 3 replies
  • 57 views

Badge

I am creating a transformation where I am joining data from two different views/containers, where table B has a node reference to table A. I have tried to find documentation for this, but I have not found any so far. 

 

Through friends and trial and error I have found two options, and neither seems to be performing well. 

Example:

Type A {

    Name: String

}

Type B{

   Name: String

   A_ref: A

}

What I have found as possible solutions are to go through cdf_data_models and picking the externalId from the node-reference:
from cdf_data models(<spc>, <mod>,<ver>, “A”) as A join cdf_… as B on A.external_id = B.A_ref.externalId

Same as above, but join on a as a nodereference: 

on B.A_ref = node_reference(<spc>, A) 

 

 

Neither of these options seems to be documented anywhere, and I can’t find any other ways documented. The performance when reading when using these joins seems slow, even though I have set up a few indexes which should cover the different joins. This is also slower then when reading from RAW for a comparable data set. 

icon

Best answer by Dilini Fernando 11 June 2024, 13:47

View original

3 replies

Userlevel 4
Badge +2

Hi @hakon_be,

I hope the above information was helpful. I’m closing this topic for now. Please don't hesitate to create a new post if you have any further questions. 

Userlevel 4
Badge +2

Hi @hakon_be,

Did the information above help you?

Userlevel 3
Badge

Hi @hakon_be 

Sorry for the late reply!

Can you please have a look in the docs “Incremental data loading when using Data Modeling” to see if you can have transformation read only recently updated nodes? 

Also, you can test a similar query to the API directly yourself to see how this performs. Using the Python SDK it would be something like this:

 

query = Query(     with_={         "B": NodeResultSetExpression(filter=filters.HasData(views=[viewB])),         "A": NodeResultSetExpression(from_="B", through=PropertyId(source=viewB, property="ref_A"), direction="outwards", )     },     select={"A": Select([SourceSelector(viewA, ["field1"])])}     # select={"B": Select([SourceSelector(viewB, ["ref_A"])]), "A": Select([SourceSelector(viewA, ["field1"])])}, )


client.data_modeling.instances.query(query = query)

 

 

Thanks

Reply