
Hi,

I'm building a Streamlit app that fetches instances from the data model. However, the view from which these instances are retrieved contains over 1.3 lakh (130,000+) entries, causing performance issues due to the large volume. Additionally, since Streamlit running in CDF does not support multithreading, I am unable to parallelize the calls.

Is there a way to retrieve all 1.3+ lakh instances in under 10 seconds?

Thanks,
Tausif Sayyad

Can you share the code for retrieving the instances? How long does it currently take? Is it the retrieval that takes time, or the processing? Note that even though Streamlit in CDF doesn't support multi-threading, it does support async/await, so you can trigger many network requests in parallel.
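The async/await pattern mentioned here can be sketched with plain asyncio. The `fetch_page` coroutine below is a hypothetical stand-in for whatever network request your Streamlit app makes; the point is that `asyncio.gather` overlaps the latency of many independent requests without using threads:

```python
import asyncio

async def fetch_page(page: int) -> list[int]:
    # Hypothetical stand-in for one network request returning up to
    # 1000 instances; real code would await an HTTP call here.
    await asyncio.sleep(0.01)  # simulated network latency
    return list(range(page * 1000, page * 1000 + 1000))

async def fetch_all(pages: int) -> list[int]:
    # Launch all requests concurrently; total wall time is roughly one
    # request's latency, not the sum of all of them.
    chunks = await asyncio.gather(*(fetch_page(p) for p in range(pages)))
    return [item for chunk in chunks for item in chunk]

instances = asyncio.run(fetch_all(5))
print(len(instances))  # 5000
```

This only helps when the requests are independent of each other, so it cannot speed up strictly cursor-based pagination on its own.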


Hi ​@Lars Moastuen,

The following code snippet I tried takes over 40 seconds to fetch all 1.3 lakh instances:

client.data_modeling.instances.list(sources=view_list, limit=None, filter=_filter)

Thanks,
Tausif Sayyad

Hi ​@Tausif Sayyad 

Could you share how you plan to use the 1.3 lakh instances in your solution? Are you displaying them all at once, or is there a specific interaction you're aiming to enable with these entries?

Also, have you considered implementing pagination or another method of incrementally loading the data to manage the large volume?

Looking forward to your thoughts.


@Tausif Sayyad: Thank you. The problem is that the operation of fetching 130,000 instances must be split into 130 requests of 1,000 instances each. These requests cannot be executed in parallel because we rely on cursors for pagination. Parallel retrieval, which certain Cognite API endpoints support, is not available for instances, but you can try to “slice” the result set yourself (i.e. group the work into multiple independent requests by adding extra filters).
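The “slicing” idea can be sketched as building a set of non-overlapping filters, for example prefix filters on the instance externalId, so that each slice becomes an independent request. The property path and the alphabet below are illustrative assumptions, and the dicts use the raw API filter format rather than the SDK filter classes:

```python
import string

def build_slice_filters(property_path: list[str]) -> list[dict]:
    # One prefix filter per leading character of the property, so the
    # slices are disjoint and can be fetched as independent requests
    # (and hence concurrently). Adjust the alphabet to whatever
    # characters your externalIds actually start with.
    return [
        {"prefix": {"property": property_path, "value": ch}}
        for ch in string.ascii_lowercase + string.digits
    ]

# ["node", "externalId"] addresses the instance externalId in the raw
# filter format; this path is an assumption about your data.
slices = build_slice_filters(["node", "externalId"])
print(len(slices))  # 36
```

Each slice filter can then be combined with your existing filter (e.g. via an `and` clause) and fetched in its own request.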

I’d also consider if there are alternative approaches to what you are trying to do - possibly there are API endpoints or SDK functionality that might help you solve the issue at hand. What processing will you do with the items after retrieval? 


That’s exactly what I was about to suggest as well, ​@Lars Moastuen . I completely agree with splitting the requests. As he mentioned initially, he’s probably using a Streamlit UI to display the data, so I assume he doesn’t need to retrieve all instances at once. But let’s wait for ​@Tausif Sayyad  to share more details.


Hi Everyone,

Thank you for your response.
I’m unable to use the filter as it doesn’t support case-insensitive filtering for instances. Therefore, I need to retrieve all records first and then apply the filter in the code to obtain the required instances while ignoring case sensitivity.

Thanks,
Tausif Sayyad


I haven't tested it yet, but you can probably apply case-insensitive filtering directly in your API call using a Regex filter with the (?i) flag. This avoids the need to retrieve all 1.3 lakh instances first. Here's how you can do it efficiently using chunk_size:

 

from cognite.client import CogniteClient
from cognite.client.data_classes import filters
from cognite.client.data_classes.data_modeling import ViewId

client = CogniteClient()

view = ViewId("mySpace", "PersonView", "v1")

# Case-insensitive match on the name property
name_filter = filters.Regex(
    property=("mySpace", "PersonView/v1", "name"),
    pattern="(?i)arnold",  # the (?i) flag makes the match case-insensitive
)

# Use chunk_size to retrieve in batches
for chunk in client.data_modeling.instances(
    chunk_size=500,
    instance_type="node",
    sources=view,
    filter=name_filter,
):
    for node in chunk:
        print(node.external_id, node.properties[view]["name"])

Let me know if it makes sense in your use case.


Hi ​@Andre Alves,

I tried the code snippet you shared, but the `Regex` method on filters is not available in the version (7.75.0) that I am using. Could you please help me understand which version has this `Regex` method?


Additionally, I am using `In` filter since there are multiple values to be filtered. Is it possible to apply the case-insensitivity with `In` filter?
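One partial workaround — my own suggestion, not SDK-confirmed behaviour — is to expand each value into its common casings (lower, UPPER, Title, original) and pass the expanded set to the `In` filter. That catches the typical variants server-side, though it still misses arbitrary mixed case like "aRnOlD":

```python
def expand_casings(values):
    # For each value, include the common casings; an In filter over the
    # expanded set matches typical variants, but not arbitrary mixed
    # case such as "aRnOlD".
    out = set()
    for v in values:
        out.update({v, v.lower(), v.upper(), v.title()})
    return sorted(out)

print(expand_casings(["arnold"]))  # ['ARNOLD', 'Arnold', 'arnold']
```

The result can then be passed as the value list of your existing `In` filter.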

Thanks,
Tausif Sayyad


Hi,

Can someone confirm if `Regex` is available in the latest version or will be available in future versions?

Thanks,
Tausif Sayyad


Hi ​@Tausif Sayyad 

 

Regex/wildcard filtering is not currently supported in the instances query endpoint (docs), and therefore it would also not be added to the Python SDK. This is more of a search-engine type of problem, which we will expand on in the future.

Do note that you can parameterize the queries (SDK doc) to more easily build a series of similar requests.

 

Arild

 

 

