
I understand that updates older than 7 days may be discarded from the endpoint, which is perfectly fine. However, I’m wondering whether it’s possible to determine when a cursor is older than that limit. Based on my understanding of your Subscription API, an expired cursor would simply continue iterating from the oldest available updates on the endpoint, which could result in missing important updates that occurred between the time of my cursor and the current oldest update.

 

For the Subscription API to be fully functional, I believe it's essential to either receive a notification or an exception when using an expired cursor, be able to determine the age of a cursor, or somehow be assured that iterating from a cursor won't lead to lost data. Knowing the age of the cursor would be preferable, as it would allow us to gauge how much margin we have, but it should also be possible to receive a notification if an expired cursor is being used.

 

Is this currently possible, or are there any plans to implement such a feature?

It seems like the cursor is base64-encoded, and that bytes 2-10 are an 8-byte integer, MSB first, counting microseconds since sometime in mid-2021. Is this something we can depend on to compute the age of the cursor? If so, what's the exact time it's counting from?
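For reference, here is a minimal sketch of the decoding described above. It is only a heuristic: the function name `cursor_counter` is made up for illustration, the slice `raw[2:10]` is our reading of "bytes 2-10" as 8 bytes, and nothing here is a documented part of the cursor format.

```python
import base64


def cursor_counter(cursor: str) -> int:
    """Return the 8-byte big-endian integer found at bytes 2..9 of a cursor.

    Assumption: the layout observed on the cursors we examined. This is not
    a documented format and may change at any time.
    """
    # Restore any '=' padding that may have been stripped from the cursor.
    padded = cursor + "=" * (-len(cursor) % 4)
    # urlsafe_b64decode accepts both the standard and the URL-safe alphabet.
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) < 10:
        raise ValueError("cursor is too short for the assumed layout")
    # "MSB first" means big-endian byte order.
    return int.from_bytes(raw[2:10], byteorder="big")
```

The value returned is a raw counter only; converting it to a wall-clock time would need the exact epoch, which we do not know.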


It seems like the cursor is base64-encoded, and that bytes 2-10 are an 8-byte integer, MSB first, counting microseconds since sometime in mid-2021. Is this something we can depend on to compute the age of the cursor? If so, what's the exact time it's counting from?

Weird flex, but ok.


The cursor is made up of internal variables, and, as you say, base64 encoded, but even if some values seem to grow linearly over time, we have no way of guaranteeing that. In general, when we use base64 encoding, the result is not meant to be parsed or trusted to be in any particular format.

It is, however, a great idea to expose the age of a cursor, in particular if the cursor is older than 7 days, and some data may have been lost. I don’t think it is on the immediate roadmap, but it should certainly be considered, in one way or another.

We do have a parameter to initialize the stream, namely initializeCursors: https://api-docs.cognite.com/20230101/tag/Data-point-subscriptions/operation/listSubscriptionData

You could initialize a cursor to 3d-ago and see if the data you get is something you’ve seen already, and that way know whether the cursor is more than 3 days old. It is not particularly user-friendly, though.


(...) the result is not meant to be parsed or trusted to be in any particular format. (...)

That makes sense, but in the absence of any good way of telling the age of a cursor we figured we might as well see if the cursor itself held the information - as that would make sense from an architectural standpoint.

Is it not true that the bytes @henrikhestnes describes count microseconds? What do those represent, if not that? As far as we saw, that seemed to be true for a wide range of cursors we examined. We understand that you cannot guarantee that the behavior will not change, but seeing that we have no other way of telling that our cursor is too old and that we might be losing data, it would be nice to use this for now.

 


Could a Hub admin please convert this question into a product idea, @Anita Hæhre?


Converted into a product idea - thanks for sharing @henrikhestnes!


Great, thanks!

However, we would really appreciate a response to the questions posed by @awenhaug, as we’re still stuck on the initial problem.

“Is it not true that the bytes @henrikhestnes describes count microseconds? What do those represent, if not that? As far as we saw, that seemed to be true for a wide range of cursors we examined. We understand that you cannot guarantee that the behavior will not change, but seeing that we have no other way of telling that our cursor is too old and that we might be losing data, it would be nice to use this for now.”
 


It is not true; it does not represent microseconds. Right now it represents an internal counter in a database that probably happens to increase approximately once every microsecond, but we don’t know if it will keep doing that, especially under high load. Also, we reserve the right to issue cursors in a new format from time to time, without warning.

It is impressive that you’ve managed to reverse-engineer the cursor the way you did, and I can’t prevent you from using it as a clock, even though I cannot recommend it.


Thanks for the update @matiasholte. I assume we can then initialize a cursor at “7d-ago”, and compare the counter for that cursor with our cursor(s) to see if the latter are too old? Since it’s a counter I assume it’s monotonic. Or are there several sources of the counter, making counters not comparable?


When you initialize a cursor at 7d-ago, what you do is:
1. Initialize the cursor.
2. Fetch up to limit updates.
3. Return a cursor that will fetch the data following those updates. (It could refer to anywhere between the last and the next update.)

With that caveat covered, your suggested solution may work for the time being.
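A rough sketch of that comparison, assuming the unofficial layout discussed earlier in the thread (an 8-byte big-endian counter at bytes 2..9 of the decoded cursor) and assuming the counter is monotonic. The names `cursor_counter` and `cursor_may_be_expired` are hypothetical helpers, not part of any API.

```python
import base64


def cursor_counter(cursor: str) -> int:
    """Extract the assumed internal counter (bytes 2..9, big-endian)."""
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) < 10:
        raise ValueError("cursor is too short for the assumed layout")
    return int.from_bytes(raw[2:10], byteorder="big")


def cursor_may_be_expired(stored_cursor: str, reference_cursor: str) -> bool:
    """True if the stored cursor sits behind the reference cursor.

    reference_cursor is the cursor returned after initializing at 7d-ago,
    so it already points past the first batch of updates that was fetched.
    """
    return cursor_counter(stored_cursor) < cursor_counter(reference_cursor)
```

Because the returned reference cursor points somewhere after the fetched batch rather than exactly at 7d-ago, this check can flag a cursor that is slightly newer than 7 days. Fetching with a small limit should narrow that gap.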

Also, as always, don’t trust the cursor format, we change it whenever we see a need for it.


Thanks.

Also, as always, don’t trust the cursor format, we change it whenever we see a need for it.

We’re very well aware of the problems with poking at internal state/implementation details, but in the absence of a proper way to do this we would like to see if this can work as a heuristic.

Also, detecting that you’ve made a change should be fairly trivial: we assume that a new format, if one is introduced, will not decode to anything close to a sane value on our side.
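One way such a sanity check could look, as a sketch only: compare the counter decoded from a stored cursor with the counter from a freshly issued one, and treat the result as unusable if the gap is negative or implausibly large. The name `counter_gap_is_plausible` and the `max_days` threshold are our own assumptions, and the conversion relies on the counter increasing roughly once per microsecond, which is not guaranteed.

```python
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def counter_gap_is_plausible(stored: int, fresh: int, max_days: int = 365) -> bool:
    """Return True if the gap between two decoded counters looks sane.

    stored:   counter decoded from the cursor we saved earlier
    fresh:    counter decoded from a cursor issued just now
    max_days: hypothetical upper bound on how old a saved cursor could be
    """
    gap = fresh - stored
    # A negative or enormous gap suggests the cursor format has changed
    # (or the counter no longer behaves the way we observed).
    return 0 <= gap <= max_days * MICROS_PER_DAY
```

If the check fails, the safe fallback is to stop trusting the heuristic and treat the cursor age as unknown.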

Thanks for your swift response, we hope to see a proper way to check this soon!