I've used some workarounds before, using max, min and round, but also a hack with the on_error method. For instance:

on_error(sqrt(200 - TS{id:...}) * 0 + (value if TS <= 200), (value if TS > 200))

Here sqrt(200 - TS{id:...}) throws an error if TS > 200, due to the square root of a negative number. When there is no error, the sqrt term is multiplied by 0, so the expression evaluates to the first value; when there is an error, on_error falls back to the second value. You can also use ln, which fails on <= 0, not just < 0. The syntax is not very nice, but it should get the work done. If the input is a string time series, it is actually easier, using the map() function.
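For illustration, here is a minimal sketch of running such an expression through the Python SDK's synthetic data points query. The time series id, threshold and fallback values are hypothetical, client configuration is assumed to be set up elsewhere, and the exact expression syntax should be checked against the synthetic time series documentation:

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials/config are provided via environment

# Hypothetical: return 1.0 while the input stays at or below 200, and -1.0 when it exceeds 200.
# sqrt(200 - TS) errors on negative arguments, and on_error() then returns the fallback value.
expression = "on_error(sqrt(200 - TS{id:123}) * 0 + 1.0, -1.0)"

dps = client.time_series.data.synthetic.query(
    expressions=expression,
    start="30d-ago",
    end="now",
)

# Print the first few resulting data points.
for ts, value in zip(dps.timestamp[:5], dps.value[:5]):
    print(ts, value)
```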
Hi Oussama! We have considered a feature where the request will actively wait in the backend for new data. You send in the request, and if there is data available, we return it immediately. Otherwise, we keep the request waiting while we continuously check whether new data has come in. Once new data arrives, we return it right away, keeping the latency low (< 1 second). After a while (2-30 seconds? Not decided yet, could be configurable), we return an empty response, and you can query again. If we build such a feature, would you be interested in being an early tester? Best regards, Matias
This could be a good use case for data point subscriptions, in that they can give you the latest data across all 500 time series in a single query. It will likely have much higher performance than the regular API. There are some caveats, however:

- Subscriptions are in beta, mainly because we don't know how they work at scale. However, we believe 500 time series is a small scale in this regard (we have a limit of 10 000 time series in a single subscription).
- Subscriptions will give you the latest updates (deletes, upserts), not necessarily the data points with the highest timestamps, if updates are ingested out of order.
- Subscription data is deleted after 7 days, after which you need to use the regular API.
- The Python SDK support is in alpha.

Are the 500 time series constant? In that case, you can specify them by externalId. Otherwise you can specify a search filter, and time series will be added to the subscription automatically soon after they are created or updated to match the filter.
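As a rough sketch of what the fixed-list case could look like with the alpha Python SDK support: the class and method names below are my best recollection of the alpha API and may differ in your SDK version, and the subscription id and time series ids are hypothetical:

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import DataPointSubscriptionCreate

client = CogniteClient()  # assumes credentials are configured elsewhere

# Create a subscription over a fixed set of time series (the "constant 500" case).
client.time_series.subscriptions.create(
    DataPointSubscriptionCreate(
        external_id="latest-values-sub",          # hypothetical subscription id
        partition_count=1,
        time_series_ids=["ts-001", "ts-002"],     # hypothetical time series external ids
        name="Latest values subscription",
    )
)

# Poll the subscription for recent changes (upserts/deletes) since the last call.
for batch in client.time_series.subscriptions.iterate_data(external_id="latest-values-sub"):
    for update in batch.updates:
        print(update)
    if not batch.has_next:
        break
```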
405 is usually the code you get when you use the wrong HTTP method. Almost all our time series endpoints use the POST method; only "list all time series" uses GET. I'm not too familiar with the code you posted, though, so I'm not sure this will solve your problem.
See also the subscriptions group on Cognite Hub: https://hub.cognite.com/groups/time-series-subscriptions-early-adopter-247
Hi Anders! The /data/latest endpoint can query 100 time series at a time, which may help somewhat. If you still hit the quota and receive 429, the best practice is to add some delay between requests. We also have brand new functionality that may help your use case, namely subscriptions. With subscriptions, you can ask for recent changes to up to 10 000 time series at a time. We also have filter subscriptions, where you get updates to all time series that match a search filter. The functionality is still in beta, hence the conservative limit of 10k time series, and as far as I know, only the Python SDK supports it at the moment. If you have any questions, feel free to ask!
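A minimal sketch of batching the latest-value lookups 100 at a time and backing off on 429 with the Python SDK; the external ids are hypothetical, and note that the SDK also has its own retry logic:

```python
import time
from cognite.client import CogniteClient
from cognite.client.exceptions import CogniteAPIError

client = CogniteClient()  # assumes credentials are configured elsewhere

external_ids = [f"sensor-{i}" for i in range(500)]  # hypothetical external ids

latest = []
for i in range(0, len(external_ids), 100):  # the latest endpoint accepts up to 100 items per request
    chunk = external_ids[i : i + 100]
    while True:
        try:
            latest.extend(client.time_series.data.retrieve_latest(external_id=chunk))
            break
        except CogniteAPIError as err:
            if err.code == 429:   # rate limited: wait a bit and retry the same chunk
                time.sleep(2)
            else:
                raise

for dps in latest:
    if dps.timestamp:  # empty if the time series has no data points
        print(dps.external_id, dps.timestamp[0], dps.value[0])
```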
Hi! I'm not familiar with the C# SDK; which method are you using? As for retrieving the latest value of a time series, I would use the /timeseries/data/latest endpoint: https://api-docs.cognite.com/20230101/tag/Time-series/operation/getLatest That endpoint takes a single parameter, before, and returns the latest data point with a timestamp earlier than that value. There is no start parameter; we search back to the beginning of time if necessary. So in your case, I would expect that using "now" as the timestamp (or leaving it blank) works. I'm assuming you're not ingesting data points into the future; if you do, you will have to specify an absolute timestamp in ms since epoch. If you're limited by a quota, we return a 429 response code, but you would need a very heavy script to trigger that. None of this functionality has changed lately, but as I said, I'm not familiar with the C# SDK, and I don't know how it translates into API calls. There might be some bugs in the SDK that I'm not aware of.
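For reference, and tying in with the point about HTTP methods above: the underlying API call is a POST, regardless of SDK. A rough sketch with Python's requests library, where the project name, base URL and bearer token are placeholders:

```python
import requests

project = "my-project"                      # placeholder
base_url = "https://api.cognitedata.com"    # placeholder: use your cluster's base URL
token = "..."                               # placeholder OAuth bearer token

# POST (not GET) to the latest-datapoint endpoint; 'before' can also be ms since epoch.
resp = requests.post(
    f"{base_url}/api/v1/projects/{project}/timeseries/data/latest",
    headers={"Authorization": f"Bearer {token}"},
    json={"items": [{"externalId": "my-ts", "before": "now"}]},
)
resp.raise_for_status()
print(resp.json()["items"][0]["datapoints"])
```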
Unfortunately, we currently do not have a way of ignoring time series without data points, whether that is before the first point or after the last point. I know some users have made a workaround by adding a "dummy" data point in 1971/1900, and another data point in 2050/2099, which then provides interpolated values for the missing dates: https://developer.cognite.com/dev/concepts/resource_types/synthetic_timeseries.html#interpolation There are some differences between step and non-step series in how interpolation works; for non-step series, you may want to add another dummy data point right before the first real data point.
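A minimal sketch of that workaround with the Python SDK, using a hypothetical external id and dummy value; pick dummy timestamps and values that make sense for your own data:

```python
from datetime import datetime, timezone
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured elsewhere

def ms(year: int) -> int:
    """Milliseconds since epoch for Jan 1 of the given year, UTC."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)

# Dummy points far before the first real data point and far after the last one,
# so interpolation in synthetic time series always has something to work with.
client.time_series.data.insert(
    [(ms(1971), 0.0), (ms(2050), 0.0)],  # hypothetical dummy value 0.0
    external_id="my-ts",
)
```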
Could this be caused by daylight saving time? After all, the timestamps match perfectly: there were two occurrences of October 30th 02:00 last year. If you were to inspect the response, I would expect one row to have a timestamp of 1667084400000 and another to have a timestamp of 1667088000000. As for the missing data, that can happen when there are gaps in the raw data: https://docs.cognite.com/dev/concepts/aggregation/#missing-data
As described in the documentation, we skip periods with no data. Since the data points arrive approximately every 10 minutes, there should be an aggregate value every 10 minutes (unless the granularity is coarser). There are 1440 10-minute intervals in 10 days. If you drop the aggregates/granularity parameters, you will receive the raw data points, undistorted. These also contain the exact timestamps, not rounded like the aggregates. Furthermore, by asking for raw data points, you don't risk taking the average of two different values, which could easily give a value other than 1 or -1. If you need aggregates to get data points at regular intervals, you could also consider stepInterpolation, which is the value of the last data point before (or at the start of) the aggregation interval. Then you will always receive an actual value, but you may miss rapid changes within the interval.
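A sketch of both options with the Python SDK, using a hypothetical external id; exact parameter spellings may vary slightly between SDK versions:

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured elsewhere

# Raw data points: exact timestamps and values, nothing averaged or skipped.
raw = client.time_series.data.retrieve(
    external_id="my-status-ts",  # hypothetical
    start="10d-ago",
    end="now",
)

# stepInterpolation aggregate: one value per 10-minute interval, namely the value
# of the last data point at or before the start of each interval.
stepped = client.time_series.data.retrieve(
    external_id="my-status-ts",
    start="10d-ago",
    end="now",
    aggregates=["stepInterpolation"],
    granularity="10m",
)

print(len(raw), len(stepped))
```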
Agree that the documentation should be better. As for the 422 status code (and the other 4xx codes, for that matter), we abort the request before we create any time series in the database. The response will tell you which externalIds are duplicated, so that you can remove them from the request and retry the rest. You could retrieve time series by id, which will tell you which time series exist, either through the error message or, with ignoreUnknownIds=true, through the ids in the response. However, it is faster to just try to insert. To update, you will have to use the update time series endpoint, which has its own special syntax. There is no endpoint to upsert a time series.
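A sketch of the try-insert-then-retry pattern with the Python SDK; the external ids are hypothetical, and the attributes on CogniteDuplicatedError are as I recall them, so double-check against your SDK version:

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import TimeSeries
from cognite.client.exceptions import CogniteDuplicatedError

client = CogniteClient()  # assumes credentials are configured elsewhere

to_create = [TimeSeries(external_id=f"ts-{i}", name=f"Time series {i}") for i in range(10)]

try:
    client.time_series.create(to_create)
except CogniteDuplicatedError as err:
    # err.duplicated lists the identifiers that already exist, e.g. [{"externalId": "ts-3"}, ...]
    existing = {d["externalId"] for d in err.duplicated}
    remaining = [ts for ts in to_create if ts.external_id not in existing]
    if remaining:
        client.time_series.create(remaining)
```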
We have done this through Cognite Functions on quite a few time series. We have discussed a feature where we create a new time series on demand using the same approach, and then calculate the synthetic time series data point by data point. Do you have any idea how long it would take to create a new time series this way? We have quite a lot of data points: usually one new data point every 10 seconds, over 5-10 years. But using a granularity of up to 1 hour could also work very well.

Creating a new time series is fast, < 1 second. Populating it with data through Cognite Functions is slower, on the order of magnitude of 10M data points/minute. So 1-2 minutes for the 0.1 Hz time series over 5-10 years. Then of course you can create a synthetic time series based on the un-aggregated data points, and fetch them 10 000 points at a time. If you have < 10 inputs, I estimate that it should take < 1 second per request. With more inputs, it may be slower. And then you can manually upload these data points to a new time series.
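A rough sketch of that workflow with the Python SDK: create a target time series, page through the synthetic result 10 000 points at a time, and write each page back. The expression, external ids and time range are hypothetical, and paging by timestamp is just one way to do it:

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import TimeSeries

client = CogniteClient()  # assumes credentials are configured elsewhere

# 1. Create the target time series (fast, typically < 1 second).
client.time_series.create(TimeSeries(external_id="sum-flow", name="Sum of power flows"))

expression = "TS{externalId:'flow-a'} + TS{externalId:'flow-b'}"  # hypothetical inputs
start, end = "1825d-ago", "now"  # roughly 5 years

# 2. Page through the synthetic result 10 000 points at a time, and
# 3. write each page back into the new, persistent time series.
while True:
    page = client.time_series.data.synthetic.query(
        expressions=expression, start=start, end=end, limit=10_000
    )
    if len(page) == 0:
        break
    client.time_series.data.insert(
        list(zip(page.timestamp, page.value)), external_id="sum-flow"
    )
    start = page.timestamp[-1] + 1  # continue just after the last returned point
```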
Thanks for helpful and prompt replies to our questions! Representing long-term synthetic time series with min/max at a granularity of at least 1 hour is an important issue for our users, the power system analysts. They typically want to sum the power flow over several components over several years, and see at least a good approximation of the maximum and minimum flow. The max and min values should represent the max/min at a granularity of at least one hour, to make them comparable to the capacities of the limiting component in the grid.

In synthetic time series, we have currently focused mostly on instant-based functions. So we have implemented min/max over different inputs at any given point in time. That doesn't work well with millions of input data points, though. For the sum of power flow, we have the average aggregate, which can be multiplied by time to get the integral. Max/min aggregates are a relatively low-hanging fruit, which we have plans to implement. Something that is more challenging is max/min of a combined expression, such as max(TS1 + TS2).
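As a small illustration of the average-times-time trick for integrals, a sketch with the Python SDK; the external id is hypothetical, and the input is assumed to be a power measurement in MW:

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured elsewhere

# Hourly average power (MW) for a hypothetical flow time series.
dps = client.time_series.data.retrieve(
    external_id="flow-a",
    start="365d-ago",
    end="now",
    aggregates=["average"],
    granularity="1h",
)

# Integral: each hourly average (MW) over a 1-hour interval is 1 MWh per MW;
# summing the hourly averages gives the total energy in MWh.
total_mwh = sum(dps.average)
print(total_mwh)
```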
Would it be possible to let normal and synthetic time series share the same backend, but give synthetic time series a time-to-live field, e.g. defaulting to 1h? In that case, the synthetic time series would be automatically deleted after 1 hour.

Currently all synthetic time series use the normal backend and are calculated on the fly. We have considered persistent synthetic time series, where we store the synthetic data points back into a normal time series. We still have plans for it, but it is not top priority.

Then we would have the min and max generated for us (maybe not fast enough, but possible to fix?), and we could use the synthetic time series just like the normal ones.

We want to add min and max as aggregates in synthetic time series. Then you'd be able to do things like max(TS1) + max(TS2), where max() denotes a max aggregate. This will be fast, and it's not a big feature. What is more tricky is max(TS1 + TS2), in which case you'd need persistent synthetic time series or use un-aggregated data points.
Yeah, something like that. But the time-to-live field must be provided by the user.

Synthetic time series are also more complicated. In normal time series, it's easier to reason about gaps: a gap means missing data, and you can fill it in by drawing a line between the start and end of the gap. For synthetic time series, it's not easy to see which of the input time series are missing, or how to fill in the gap. Thus, we need the user to be much more explicit about how missing data should be treated, and about what constitutes missing data.
Hi! Good question! We agree, and it's on our todo list. For now, though, you need to identify the holes, either prior to the query or by manual inspection afterwards (if all inputs are missing, the result will always be a straight line). The proposed idea is to let the user set a limit on how far we should interpolate: "Never interpolate gaps > 1 day". I will add you to the task, so that we can include you in the discussions when we make the final design. Best, Matias, Time series team
Thank you for a great question! Unfortunately, in synthetic time series, we currently only support the average and interpolation aggregates. With average, you only get the bottom graph, which does not capture the variation in the input data. With interpolation, you would get random fluctuations around the average. In your case, the best solution would likely be to have two graphs, one where you use the max aggregate and one where you use the min aggregate. I have added it to the todo list for synthetic time series!
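For example, with the regular (non-synthetic) data points API and the Python SDK, you could fetch min and max aggregates for the same range and draw them as two curves or a band; the external id and granularity are hypothetical:

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured elsewhere

dps = client.time_series.data.retrieve(
    external_id="my-ts",   # hypothetical
    start="90d-ago",
    end="now",
    aggregates=["min", "max"],
    granularity="1h",
)

# dps.min and dps.max are aligned with dps.timestamp: one envelope curve each,
# capturing the variation that a plain average would hide.
for t, lo, hi in zip(dps.timestamp, dps.min, dps.max):
    print(t, lo, hi)
```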