Performance issue for event retrieval

Userlevel 1

We are seeing performance issues with event retrieval as the number of events grows, even when there are fewer than 500.

Can you help me understand how partitioning or pagination can be done? We need to perform an operation on each event after retrieval and send the result as an API response.
I tried the following:

  for unstructre_insight in"Insight", limit=None,partitions=10):

     # do something with unstructre_insight in each partition in parallel to reduce response time

I observed that if there are 64 events, all 64 are retrieved in one execution. How can we get one partition at a time, so that we can process the first partition while the second is still being retrieved and reduce the total time?


Best answer by Dilini Fernando 1 June 2023, 15:04

4 replies

Userlevel 4

Hi @Ankita Mane,

I hope Jason’s reply was helpful. I will close this thread for now. If you have any further questions, please feel free to reach out to us.

Best regards,

Userlevel 4

@Ankita Mane 

Here are several ways you can use the Python SDK to retrieve a large number of events. The SDK handles paging automatically. If you hit the API directly, you will need to follow the cursors yourself.

Hope this helps,

from cognite.client import CogniteClient
import time

client: CogniteClient = ...

# Serial retrieval
start = time.time()
events = client.events.list(data_set_ids=[123], limit=None)
end = time.time()
print(f"Time {(end - start):.2f} seconds")

# Parallel retrieval
start = time.time()
events = client.events.list(data_set_ids=[123], limit=None, partitions=10)
end = time.time()
print(f"Partitioned Time {(end - start):.2f} seconds")

# Serial chunk retrieval - keep a limited set in memory
start = time.time()
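# NOTE: the first argument was lost from the original post; chunk_size=1000 is an assumed value.
for event in client.events(chunk_size=1000, data_set_ids=[123]):
    pass  # Do your work. With chunk_size set, each iteration yields a chunk of events.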
end = time.time()
print(f"Chunk Time {(end - start):.2f} seconds")

# Parallel chunk retrieval
start = time.time()
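# NOTE: the first argument was lost from the original post; chunk_size=1000 is an assumed value.
for event in client.events(chunk_size=1000, data_set_ids=[123], partitions=10):
    pass  # Do your work. Is this thread safe?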
end = time.time()
print(f"Partitioned Chunk Time {(end - start):.2f} seconds")
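
On the question of overlapping retrieval and processing: one option is to hand each chunk to a worker pool as soon as it arrives, while the main thread keeps pulling the next chunk. The following is a minimal sketch, not SDK code. `process_chunks_concurrently` and `handle_chunk` are hypothetical names, and `chunks` is assumed to be any iterable that yields lists of events (for example, a chunked iterator from the SDK).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_chunks_concurrently(
    chunks: Iterable[Sequence[T]],
    handle_chunk: Callable[[Sequence[T]], R],
    max_workers: int = 4,
) -> List[R]:
    """Submit each chunk to a thread pool as soon as it is retrieved.

    The calling thread keeps iterating `chunks` (fetching the next chunk)
    while worker threads process the chunks already submitted.
    Results come back in the order the chunks were retrieved.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Iterating `chunks` here is what drives retrieval of the next chunk.
        futures = [pool.submit(handle_chunk, chunk) for chunk in chunks]
        # .result() re-raises any exception raised inside handle_chunk.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Stand-in for a chunked event iterator: two chunks of 32 items each.
    def fake_chunks():
        for start in range(0, 64, 32):
            yield list(range(start, start + 32))

    print(process_chunks_concurrently(fake_chunks(), len))
```

This is only safe if `handle_chunk` does not modify shared state without a lock. Also note that if retrieval is much faster than processing, all pending chunks are held in memory until a worker picks them up.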


Userlevel 3

Hey Ankita, this way works fine for me:

for unstructre_insight in"Insight", chunk_size=32):


Could you please double-check that you are getting all 64 events at once? 

Userlevel 1

I also tried the following:

  1. for unstructre_insight in"Insight",chunk_size=32):
        It still returned all 64 events.
  2. n=10
     for m in range(n):
         for unstructre_insight in"Insight", limit=None, partition="{m}+1/{n}"):
        (For parallel retrieval: I don't know how to get the cursor, or how to pass m/n as the partition, since it is mentioned that it must be given as a string.)
  3. for unstructre_insight in"Insight",limit=10):
        I get the same set of 10 insights each time and don't know how to get the next set.
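
Getting the "next set" means carrying a cursor from one request to the next, which is what the earlier reply refers to when hitting the API directly. Below is a minimal sketch of that loop. It assumes a hypothetical `fetch_page(cursor, limit)` callable that returns a dict with `items` and `nextCursor` keys; those names are chosen for illustration, so check the API reference for the actual request and response fields. The `partition_labels` helper only illustrates the `m/n` string format from item 2, with `m` assumed to start at 1.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iter_pages(
    fetch_page: Callable[[Optional[str], int], Page],
    limit: int = 10,
) -> Iterator[List[Any]]:
    """Yield one page of items at a time by following the cursor."""
    cursor: Optional[str] = None  # No cursor on the first request.
    while True:
        page = fetch_page(cursor, limit)
        yield page["items"]
        # Pass the returned cursor back in to get the next page.
        cursor = page.get("nextCursor")
        if not cursor:
            break  # No cursor returned: this was the last page.


def partition_labels(n: int) -> List[str]:
    """Build 'm/n' partition strings for m = 1..n (assumed 1-based)."""
    return [f"{m}/{n}" for m in range(1, n + 1)]
```

Each partition would be paged independently with its own cursor, so the two helpers can be combined by running one `iter_pages` loop per partition label.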