Question

Why does the Pygen-generated SDK lag in performance compared to the query method in the Cognite SDK?

  • March 23, 2025
  • 1 reply
  • 62 views

We generated an SDK with pygen for our data model.
We then requested all data from one view (PropertyType in the example below), which contains ~138k records.

Here is the time taken to fetch the data with each approach:

  1. Cognite SDK - 44 secs
  2. Pygen SDK - 82 secs

Both fetch the same data from the same Cognite project, yet the Pygen SDK takes almost twice as long. Both numbers are from my local sandbox.

 

  1. Cognite SDK Code
    import time

    from cognite.client import CogniteClient
    from cognite.client.data_classes.data_modeling import ViewId
    from cognite.client.data_classes.data_modeling.query import (
        NodeResultSetExpression,
        Query,
        Select,
        SourceSelector,
    )
    from cognite.client.data_classes.filters import HasData

    config = {
        "client_name": "abcd",
        "project": "slb-odf-qa",
        "base_url": "https://westeurope-1.cognitedata.com/",
        "credentials": {
            "client_credentials": {
                "client_id": "e063088ad3b4548d4911bd4a617990aa",
                "client_secret": "",
                "token_url": "https://p4d.csi.cloud.slb-ds.com/v2/token",
                "scopes": ["9237c91ce1ea434fa5a91262a5ea3646"],
            },
        },
    }
    cognite_client = CogniteClient.load(config)

    view_id_1 = ViewId(space="slb-pdm-dm-governed", external_id="PropertyType", version="1_6")

    def _get_timeseries():
        # Pages through all PropertyType nodes via the /query endpoint
        next_cursor = None  # No cursor for the first page
        all_data = []  # Accumulates the nodes from every page
        while True:
            # Construct the query with the current cursor
            query = Query(
                with_={
                    "PropertyType": NodeResultSetExpression(
                        limit=10000,
                        filter=HasData(views=[view_id_1]),
                    )
                },
                select={"PropertyType": Select([SourceSelector(view_id_1, properties=["*"])])},
                cursors={"PropertyType": next_cursor},
            )

            result = cognite_client.data_modeling.instances.query(query)

            if "PropertyType" in result.data:
                all_data.extend(result.data["PropertyType"])
            next_cursor = result.cursors.get("PropertyType")

            # A missing cursor means the last page has been read
            if not next_cursor:
                return all_data

    start_time = time.time()
    data = _get_timeseries()
    end_time = time.time()
    print(len(data))
    print(end_time - start_time)

    ## Output
    # 138205
    # 82.43460583686829

     

  2. Pygen SDK
    import time

    from cognite.client import CogniteClient
    from my_domain.client import MyClient

    config = {
        "client_name": "abcd",
        "project": "slb-odf-qa",
        "base_url": "https://westeurope-1.cognitedata.com/",
        "credentials": {
            "client_credentials": {
                "client_id": "e063088ad3b4548d4911bd4a617990aa",
                "client_secret": "",
                "token_url": "https://p4d.csi.cloud.slb-ds.com/v2/token",
                "scopes": ["9237c91ce1ea434fa5a91262a5ea3646"],
            },
        },
    }
    client = CogniteClient.load(config)

    # Wrap the Cognite client in the pygen-generated client
    pygen_client = MyClient(client)

    def _get_property():
        # limit=None fetches every record; connections are skipped
        return pygen_client.property_type.list(limit=None, retrieve_connections="skip")

    start_time = time.time()
    data = _get_property()
    end_time = time.time()

    print(len(data))
    print(end_time - start_time)

    ## Output
    # 138205
    # 82.43460583686829

 

This looks like an issue in the Pygen SDK. Can it be looked into?

1 reply

Anders  Albert
Seasoned Practitioner
  • March 24, 2025

@Neerajkumar Bhatewara Thanks for testing this. 

Note that when you run `pygen_client.property_type.list(…, retrieve_connections='skip')`, pygen uses the /list endpoint, while you are comparing it to the /query endpoint. The equivalent call in the Python SDK (PySDK) would be `cognite_client.data_modeling.instances.list(sources=[view_id_1])`.
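For a like-for-like comparison, something along these lines might help. It is only a minimal sketch: `time_call` and `compare` are hypothetical helper names, not part of pygen or the Cognite SDK, and the fetchers in the runnable part are stand-ins so the snippet works without a project. The commented usage assumes the `cognite_client`, `pygen_client` and `view_id_1` objects from the question.

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_call(fetch: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fetch() once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fetch()
    return result, time.perf_counter() - start


def compare(fetchers: Dict[str, Callable[[], Any]]) -> Dict[str, Tuple[int, float]]:
    """Time each named fetcher and report (row count, seconds) per name."""
    report = {}
    for name, fetch in fetchers.items():
        rows, seconds = time_call(fetch)
        report[name] = (len(rows), seconds)
    return report


# Stand-in fetchers so the sketch runs without a CDF project.
demo = compare({
    "fake list": lambda: list(range(1000)),
    "fake empty": lambda: [],
})
for name, (count, seconds) in demo.items():
    print(f"{name}: {count} rows in {seconds:.4f}s")

# Against a real project (assumes the objects defined in the question):
# report = compare({
#     "sdk /list": lambda: cognite_client.data_modeling.instances.list(
#         instance_type="node", sources=[view_id_1], limit=None
#     ),
#     "pygen list": lambda: pygen_client.property_type.list(
#         limit=None, retrieve_connections="skip"
#     ),
# })
```

Timing both calls through the same helper, in the same process, keeps the comparison on the /list endpoint on both sides, and comparing the row counts confirms that both calls returned the same amount of data.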

Still, these should have similar performance. I suspect this could have something to do with the pagination on the server side.

Can you try running `pygen_client.property_type.list(limit=None, retrieve_connections='skip', sort_by='external_id')`?