Solved

How to overcome metadata key length and max keys limitation



On several occasions we have encountered the metadata key length limitation, most frequently when flattening JSON-formatted strings from our event-stream source systems. When the source system presents a nested structure of as many as four levels, the flattened metadata keys frequently require more than 128 bytes.

Up to now we have “solved” the issue by abbreviating the metadata keys, at the price of a higher maintenance cost for the code and, more importantly, end users getting the impression that we have transformed the data or don’t even understand what it represents. We are now considering a solution where we simply put the entire JSON-formatted string into a single metadata value field and leave it to front-end teams and end users to flatten the structure.
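
For concreteness, here is a minimal sketch of the two approaches we are weighing (the payload and key names are made up):

```python
import json

def flatten(obj: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Recursively flatten a nested dict into dot-separated metadata keys."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = str(value)
    return items

payload = {"plant": {"unit": {"sensor": {"calibration": {"date": "2021-06-01"}}}}}

# Option A: flattened keys -- self-describing, but deep nesting easily pushes
# a key past the 128-byte limit, forcing abbreviations.
metadata_flat = flatten(payload)  # {"plant.unit.sensor.calibration.date": "2021-06-01"}

# Option B: the whole payload as a JSON string in one value -- no key-length
# problem, but front ends and end users must parse it themselves.
metadata_blob = {"source_payload": json.dumps(payload)}
```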

We have a similar issue with the maximum number of metadata keys for time series (16).

Question 1) Could you please suggest other, better options for handling these metadata limitations? 
Question 2) Will templates come to the rescue?  
Question 3) Any plans for increasing the limits on metadata key length or number of metadata keys on Timeseries? 


Best answer by RobersoSN 31 October 2021, 23:46


11 replies


Or Question 4: Has there been any talk of allowing nested JSON-structures as metadata instead of a purely key-value solution?


Q5) And any plans for allowing Cognite applications that display metadata fields, like ADI and Fusion Data Explorer, to detect the presence of a nested structure and display it in a user-friendly way?


We don’t have immediate plans to expand our metadata capabilities (field lengths, nesting, etc.) for time series, but I have added this as input to our product development prioritization process so we can do some discovery across customers on the needs, expectations, and problem sets to solve for.

 

I will also update this thread as we progress.


@Thomas Sjølshagen we’re currently considering using RAW as a lazy-loading backend for these events. Given RAW’s support for full JSON structures, we envision events carrying a row reference, with the source payload stored there.

 

What pitfalls should we be aware of for this solution, especially as it relates to queries of 10k+ rows, for example? Or, are there any access management tools for segregating databases in RAW?
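
Roughly what we have in mind, sketched with the Python SDK (the database, table, and column names are placeholders, and credentials are assumed to be configured already):

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import Event, Row

client = CogniteClient()

db, table, row_key = "source_events", "payloads", "evt-000123"  # placeholder names
payload = {"plant": {"unit": {"sensor": {"calibration": {"date": "2021-06-01"}}}}}

# Store the full JSON payload as a RAW row ...
client.raw.rows.insert(
    db, table, Row(key=row_key, columns={"payload": payload}), ensure_parent=True
)

# ... and keep only a lightweight reference in the event metadata.
client.events.create(
    Event(
        external_id=row_key,
        type="source-event",
        metadata={"raw_db": db, "raw_table": table, "raw_row": row_key},
    )
)

# Front ends would then lazy-load the payload only when it is needed:
row = client.raw.rows.retrieve(db, table, row_key)
```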


Hi @RobersoSN, really sorry about the (super) tardy response!

The time series documentation is incorrect with regard to the number of metadata keys (i.e. the limit isn’t supposed to be 16). The actual limit is 256 key/value pairs, and I just got a PR to update the docs for time series and sequences to: 

Limits: Maximum length of key is 128 bytes, up to 256 key-value pairs, with a total size of at most 10,000 bytes across all keys and values.
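
If it helps, a minimal pre-ingestion check along those lines might look like this (a sketch only, assuming the limits are counted in UTF-8 bytes; the function name is just for illustration):

```python
MAX_KEY_BYTES = 128       # per-key limit quoted above
MAX_PAIRS = 256           # key-value pairs per time series
MAX_TOTAL_BYTES = 10_000  # across all keys and values

def check_timeseries_metadata(metadata: dict) -> list:
    """Return a list of limit violations; an empty list means the dict fits within the limits."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the {MAX_PAIRS}-pair limit")
    total = 0
    for key, value in metadata.items():
        key_bytes = len(key.encode("utf-8"))
        total += key_bytes + len(str(value).encode("utf-8"))
        if key_bytes > MAX_KEY_BYTES:
            problems.append(f"key '{key}' is {key_bytes} bytes (limit {MAX_KEY_BYTES})")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems
```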

For events, we believe we may be able to increase the key length to 256 bytes without adverse effects, if that helps a little? It obviously doesn’t address the value-related workarounds you’re referring to!

 

What pitfalls should we be aware of for this solution?

 

We support up to 128,000 bytes on the value side, but anything above 8K (8,192 bytes) will not be sortable or filterable. In those cases, you will still get all of the data in the value field returned. 
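
In other words, if you do put the whole payload into one value, a quick size check tells you whether it will still be filterable (the names below are illustrative):

```python
import json

MAX_VALUE_BYTES = 128_000   # upper limit for a single metadata value
FILTER_LIMIT_BYTES = 8_192  # above this the value is stored, but not sortable/filterable

payload = {"deeply": {"nested": "source data"}}  # placeholder
value = json.dumps(payload)                      # the whole source JSON in one value
size = len(value.encode("utf-8"))

if size > MAX_VALUE_BYTES:
    raise ValueError(f"value is {size} bytes, above the {MAX_VALUE_BYTES}-byte limit")
if size > FILTER_LIMIT_BYTES:
    print(f"note: {size} bytes -- returned in full, but cannot be filtered or sorted on")
```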

 

Or, are there any access management tools for segregating databases in RAW?

 

There are not, and we currently do not have plans for access management in RAW. 


Thanks, Thomas. That is a useful clarification regarding the metadata keys/values. And 256 bytes for keys might be a band-aid, for now at least.

In the meantime, we are very eagerly awaiting the deployment of schema services to the SN cluster.

 

Thanks for the heads-up regarding filtering and access management in RAW.


@Thomas Sjølshagen thanks for your very helpful reply. It will certainly give us a lot more flexibility if you increase the maximum metadata key length to 256 bytes on events. Any timeline for that change? 

One more question: what is the standard encoding of metadata? Is it UTF-32 for both keys and data fields? And is it the same across all relevant resource types (assets, events, time series, …)? 

The way I understand it, with UTF-32 a 256-byte limit on event keys gives 256/4 = 64 characters, while with UTF-8 and UTF-16 we would of course get more (and it is a bit more complex to calculate the maximum number of characters, since different characters are encoded with different lengths). 
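
To make the question concrete, this is the kind of difference I mean (the key is made up, and I use the BOM-less encodings to count only the character bytes):

```python
key = "utstyr.måling.temperatur"       # 24 characters, one of them non-ASCII

print(len(key.encode("utf-8")))        # 25 bytes: 'å' needs 2 bytes, the rest 1 each
print(len(key.encode("utf-16-le")))    # 48 bytes: 2 bytes per character here
print(len(key.encode("utf-32-le")))    # 96 bytes: always 4 bytes per character
```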

 


@Thomas Sjølshagen not sure whether my questions were unclear or the conversation just slowly died? Can we expect to see an increased maximum metadata key length, and how do I know how many bytes a character needs? 


@Andreas Kimsås Sorry, this slipped my mind!

We’re looking at whether we have the capacity to include expanding the metadata key length to 256 bytes in one of the upcoming releases, but we won’t know which one until the team commits in an upcoming planning session. 

As for the key charset, a quick scan of our sources indicates we use UTF-8.


@Thomas Sjølshagen or others, any update on the maximum metadata key length? According to the current API documentation it is still 128 bytes. Must we consider workarounds, or is there still hope for a change within a couple of months? 


Hi @Andreas Kimsås,

With the work going into Flexible Data Modeling, we've not been able to prioritize extending the metadata key length. 

So it's not in the plans for the next couple of months, sorry.

 

 
