Solved

Performance issue for event retrieval


We are seeing performance issues with event retrieval as the number of events grows, even when there are fewer than 500.

Can you help me understand how to do partitioning or pagination? We need to perform an operation on each event after retrieval and send the result as an API response.
I tried the following:

  for unstructre_insight in client.events(type="Insight", limit=None, partitions=10):
      # do something with unstructre_insight in each partition in parallel to reduce response time
      ...

I observed that if there are 64 events, all 64 are retrieved in one execution. How can we get one partition at a time, and process the first partition in parallel while the second is being retrieved, to reduce the total time?

Best answer by Dilini Fernando 1 June 2023, 15:04

4 replies

Hi @Ankita Mane,

I hope Jason’s reply was helpful. I will close this thread for now. If you have any further questions, please feel free to reach out to us.

Best regards,
Dilini

I also tried the following:

  1. for unstructre_insight in client.events(type="Insight", chunk_size=32):
        This still returned all 64 events.
  2. n = 10
     for m in range(n):
         for unstructre_insight in client.events(type="Insight", limit=None, partition=f"{m + 1}/{n}"):
        I don't know how to get the cursor, or how to pass m/n as the partition, since the docs say it must be a string. See the parallel retrieval section of https://docs.cognite.com/dev/concepts/pagination/ and https://docs.cognite.com/api/v1/#tag/Events/operation/listEvents
  3. for unstructre_insight in client.events(type="Insight", limit=10):
        This returns the same set of 10 insights; I don't know how to get the next set.

Hey Ankita, this works fine for me:

for unstructre_insight in client.events(type="Insight", chunk_size=32):
    # With chunk_size set, each iteration yields a list of up to 32 events.
    print(len(unstructre_insight))

Could you please double-check that you are getting all 64 events at once?
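
To make the chunking behaviour concrete, here is a minimal sketch that does not use the SDK at all. The `chunked` helper is hypothetical and only imitates how an iterator with a chunk size hands back lists instead of single items. With 64 stand-in events and a chunk size of 32, the loop body runs twice with 32 items each time, so counting events across all iterations still gives 64.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# 64 stand-in "events" with chunk_size=32: two iterations of 32 items each.
sizes = [len(chunk) for chunk in chunked(range(64), 32)]
print(sizes)
```

If the printed length inside the real loop is 64 rather than 32, the chunk size is probably not being applied, which would be worth reporting along with the SDK version.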

@Ankita Mane 

Below I highlight several ways to use the Python SDK to retrieve a large number of events. The SDK handles paging automatically. If you call the API directly, you will need to navigate the cursors yourself.
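
For anyone calling the API directly, cursor navigation could look roughly like the sketch below. It is not SDK code. `fetch_page` is a hypothetical function you would write yourself to send one list request, and the sketch assumes each response is a dict with an `items` list and a `nextCursor` value that is missing or empty on the last page. `partition_labels` is also hypothetical and only builds the "m/n" strings, with m counted from 1.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns a dict shaped like {"items": [...], "nextCursor": "..." or None}.
FetchPage = Callable[[Optional[str]], dict]


def iter_items(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item, following cursors until no further cursor is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break


def partition_labels(n: int) -> List[str]:
    """Build partition strings in the "m/n" form, with m running from 1 to n."""
    return [f"{m}/{n}" for m in range(1, n + 1)]
```

With partitions, each "m/n" label would get its own chain of cursors, so one `iter_items` loop per label can run in its own thread. Check the pagination documentation linked in the question for the exact request fields before relying on this shape.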

Hope this helps,
Jason

from cognite.client import CogniteClient
import time

client: CogniteClient = ...

# Serial retrieval
start = time.time()
events = client.events.list(data_set_ids=[123], limit=None)
end = time.time()
print(f"Time {(end - start):.2f} seconds")

# Parallel retrieval
start = time.time()
events = client.events.list(data_set_ids=[123], limit=None, partitions=10)
end = time.time()
print(f"Partitioned Time {(end - start):.2f} seconds")

# Serial chunk retrieval - keep a limited set in memory
start = time.time()
for event_chunk in client.events(chunk_size=1000, data_set_ids=[123]):
    pass  # Do your work on each chunk of up to 1000 events.
end = time.time()
print(f"Chunk Time {(end - start):.2f} seconds")

# Parallel chunk retrieval
start = time.time()
for event_chunk in client.events(chunk_size=1000, data_set_ids=[123], partitions=10):
    pass  # Do your work. Is this thread safe?
end = time.time()
print(f"Partitioned Chunk Time {(end - start):.2f} seconds")
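
On the original question of processing one chunk while the next is being retrieved, one possible approach is to hand each chunk to a thread pool as soon as the iterator yields it. The sketch below uses only the standard library. `process_chunks_concurrently` and `handle_chunk` are hypothetical names, and any iterable of chunks can be passed in, including an SDK iterator created with `chunk_size`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_chunks_concurrently(
    chunks: Iterable[Sequence[T]],
    handle_chunk: Callable[[Sequence[T]], R],
    max_workers: int = 4,
) -> List[R]:
    """Submit each chunk to a worker thread as soon as it arrives.

    The calling thread goes straight back to the iterator for the next
    chunk, so retrieval and processing overlap. Results come back in the
    order in which the chunks were retrieved.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The comprehension pulls chunks lazily: each submit returns at once.
        futures = [pool.submit(handle_chunk, chunk) for chunk in chunks]
        return [future.result() for future in futures]
```

Having `handle_chunk` return its result instead of appending to a shared list sidesteps most thread-safety concerns. Threads help mainly when the per-event work is I/O-bound. Submitted chunks also stay in memory until they are processed, so a slow handler on a very large result set may need a bounded queue instead.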

 
