About the Data Science Toolbox Functions


Why are algorithms and data-driven models only available to the few domain experts who are also fluent in advanced software coding? We certainly don’t believe this should be the case.

We are making them available to the rest of the world — particularly to non-coding domain experts.

For many years, industrial data scientists have been building smart algorithms to solve complex industrial problems. These algorithms are now available in an intuitive, no-code, drag-and-drop interface. Liberate your data and empower domain experts with the tools to drive impact every day.

Cognite Charts includes several data science toolboxes that give subject matter experts (SMEs) out-of-the-box algorithms to process and manipulate data, conduct root cause analysis (RCA), and develop solutions without having to code.

The toolboxes cover basic operations, statistical methods, data transformation, and advanced models. They work out-of-the-box with Cognite Charts, and we will continuously add new algorithms, features, and functionality.

You can view the detailed documentation for our Industrial Data Science Library (InDSL) at https://indsl.docs.cognite.com/. You can find and use all of the functions and algorithms in InDSL via the calculation builder in Charts (charts.cogniteapp.com).

Below, we’ve included a brief description of the different types of toolboxes available in Charts.

Operators

The Operators toolbox contains all the standard arithmetic and algebraic operations that you can use with time series data (addition, subtraction, multiplication, division) and more advanced calculations such as differentiation, integration, time series mapping, and more.
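To make differentiation and integration of a time series concrete, here is a minimal sketch in plain Python. It is not the InDSL implementation; the function names, the sample data, and the units are assumptions made up for illustration.

```python
def integrate_trapezoid(times, values):
    """Cumulative trapezoidal integral of values over times (hypothetical helper)."""
    total = 0.0
    result = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (values[i] + values[i - 1]) * dt
        result.append(total)
    return result


def differentiate(times, values):
    """Backward finite difference; the first point has no predecessor, so it is None."""
    result = [None]
    for i in range(1, len(times)):
        result.append((values[i] - values[i - 1]) / (times[i] - times[i - 1]))
    return result


# Assumed example: irregularly sampled flow rate (m3/s) at times in seconds.
times = [0, 10, 30, 60]
flow = [2.0, 2.0, 4.0, 4.0]

volume = integrate_trapezoid(times, flow)   # [0.0, 20.0, 80.0, 200.0]
rate_of_change = differentiate(times, flow)
```

Both helpers accept non-uniform timestamps, which matters because industrial sensor data is rarely sampled on a regular grid.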

Filter

Filters are algorithms that remove parts of a time series to capture the underlying signal. For example, low-pass filters remove the high-frequency noise from a time series. You can also use filters together with event detectors to remove undesired phenomena. For instance, you can use an anomaly detector to map the time series to a binary array indicating the presence or absence of an anomaly, and then apply a boolean mask to the raw time series to remove all detected anomalies.
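The detector-plus-mask idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the median-based detector, the threshold, and the function names are hypothetical stand-ins, not the detectors shipped in the toolbox.

```python
import statistics


def flag_spikes(values, threshold):
    """Hypothetical detector: 1 where a value deviates from the median by more than threshold."""
    median = statistics.median(values)
    return [1 if abs(v - median) > threshold else 0 for v in values]


def apply_mask(values, flags):
    """Replace flagged points with None so they drop out of later calculations."""
    return [None if flag else v for v, flag in zip(values, flags)]


# Assumed example: a sensor reading with one obvious spike.
raw = [10.1, 10.3, 55.0, 10.2, 9.9]
flags = flag_spikes(raw, threshold=5.0)   # [0, 0, 1, 0, 0]
cleaned = apply_mask(raw, flags)          # [10.1, 10.3, None, 10.2, 9.9]
```

Keeping the detector and the mask as separate steps means any detector that returns a binary array can be combined with the same masking step.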

Detect

You can map a time series to a set of discrete variables that indicate the presence or absence of an event. For instance, you can distinguish steady from transient operation by detecting large step changes in the sensor reading (potentially due to valve changes, start-ups, etc.). Another example is anomalies, where the sensor reading changes significantly and unexpectedly before returning to normal behavior (e.g., a spike in value). The Detect module contains algorithms that perform this mapping from continuous time series to discrete variables based on the behavior of the time series.
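As a rough illustration of mapping a continuous signal to a discrete steady/transient flag, the sketch below marks a point as steady when the spread of the readings in a trailing window stays within a tolerance. The rule, the window size, the tolerance, and the function name are assumptions for this example; the actual Detect algorithms may work differently.

```python
def steady_state_flags(values, window, tolerance):
    """Return 1 (steady) or 0 (transient) for each point.

    A point is steady when max - min over the trailing window
    (shorter at the start of the series) is within tolerance.
    """
    flags = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        flags.append(1 if max(recent) - min(recent) <= tolerance else 0)
    return flags


# Assumed example: a step change from about 5 to about 8 (e.g., a valve change).
reading = [5.0, 5.1, 5.0, 8.0, 8.1, 8.0]
steady = steady_state_flags(reading, window=3, tolerance=0.5)
# steady == [1, 1, 1, 0, 0, 1]
```

The signal is flagged as transient while the step is still inside the window and returns to steady once the window only contains readings at the new level.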

Resample

Industrial data is in most cases non-uniformly sampled, so it has to be pre-processed before it can be used in a model together with other time series. Resampling is a typical pre-processing step. The Resample toolbox provides classical resampling methods (e.g., interpolation) and advanced machine learning algorithms to down- or up-sample your data.
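For readers who want to see what classical resampling by interpolation looks like, here is a minimal plain-Python sketch that puts an irregular series on a uniform grid using linear interpolation. The function name and the sample data are hypothetical, and this is not the toolbox implementation.

```python
def resample_linear(times, values, step):
    """Linearly interpolate (times, values) onto a uniform grid with spacing step.

    Assumes times is sorted, strictly increasing, and has at least two points.
    """
    grid, resampled = [], []
    t = times[0]
    j = 0  # index of the left neighbour of t
    while t <= times[-1]:
        while times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        resampled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        grid.append(t)
        t += step
    return grid, resampled


# Assumed example: irregular timestamps in seconds, resampled to every 10 s.
times = [0, 7, 10, 25, 30]
values = [1.0, 8.0, 5.0, 20.0, 15.0]
grid, uniform = resample_linear(times, values, step=10)
# grid == [0, 10, 20, 30], uniform == [1.0, 5.0, 15.0, 15.0]
```

Once two series share the same uniform grid, they can be combined point by point in later calculations.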

Smooth

Smoothers modify a time series to bring out the main underlying trend and remove fine-scale phenomena. You can do this in several different ways. Examples of smoothers in this toolbox are filters that remove higher-frequency phenomena from the raw data (e.g., Butterworth or Chebyshev low-pass filters), regression-based smoothers that estimate the coefficients of a parametric function to predict the underlying signal, and moving averages that apply a rolling operation over a user-defined window.
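The simplest of these, a moving average over a user-defined window, can be sketched as follows. This is a plain-Python illustration with a hypothetical function name; it uses a trailing window that is shorter at the start of the series, which is one of several possible conventions.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over the samples available so far."""
    smoothed = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Assumed example: a noisy reading with one outlying value.
noisy = [1.0, 2.0, 9.0, 4.0, 5.0]
smooth = moving_average(noisy, window=3)
# smooth == [1.0, 1.5, 4.0, 5.0, 6.0]
```

A larger window removes more fine-scale variation but also delays and flattens real changes in the signal, so the window size is a trade-off.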

Statistics

The Statistics toolbox offers various algorithms to describe, analyze, and model industrial time series data. This toolbox is ideal for describing your data, conducting root cause analysis, and doing exploratory work. The algorithms range from descriptive statistics to linear and nonlinear regression analysis to ML methods (e.g., classification/clustering).

Data quality

Accurate data is a fundamental part of any industrial model. This toolbox contains a collection of advanced algorithms to evaluate, monitor, and improve the data quality of time series. There are multiple dimensions of time series data quality: accuracy, timeliness, completeness, validity, consistency, and uniqueness. The algorithms in this toolbox cover all of these dimensions while focusing heavily on accuracy: if the data is not correct, the other dimensions are of little importance. Examples of functions found in this toolbox are data gap detection and filling, outlier detection and removal, and sensor drift.
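Data gap detection, one of the examples mentioned above, can be illustrated with a very small plain-Python sketch. The threshold rule and the function name are assumptions for illustration only; the toolbox functions may use more elaborate criteria.

```python
def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive samples are more than max_gap apart."""
    return [
        (earlier, later)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > max_gap
    ]


# Assumed example: a sensor that normally reports every 60 s but drops out once.
timestamps = [0, 60, 120, 600, 660]
gaps = find_gaps(timestamps, max_gap=180)
# gaps == [(120, 600)]
```

The detected intervals can then be filled (for example by interpolation) or excluded, depending on how long the gap is and what the data is used for.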

Regression

The Regression toolbox focuses on using classical methods (linear and nonlinear models) and machine learning regression algorithms to describe the relationship between industrial data and physical parameters. It enables you to conduct semiautomatic mapping of parameters to historical data and to forecast the data’s behavior several steps into the future.

Oil and gas

This module contains algorithms particularly relevant to the oil and gas industry. You will find methods to estimate parameters such as the Productivity Index (gas flow rate divided by the difference between the reservoir and bottom hole pressures), pressure drop, single-phase flow rate, hydrostatic head, and many others.
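The Productivity Index definition given above translates directly into code. The sketch below is a plain-Python illustration, not the toolbox function; the function name, the argument names, and the example numbers and units are assumptions.

```python
def simple_productivity_index(gas_flow_rate, reservoir_pressure, bottomhole_pressure):
    """Gas flow rate divided by the drawdown (reservoir minus bottom hole pressure).

    Hypothetical helper: units are whatever the inputs use, e.g. (m3/day) / bar.
    """
    drawdown = reservoir_pressure - bottomhole_pressure
    if drawdown == 0:
        raise ValueError("Reservoir and bottom hole pressures are equal; PI is undefined.")
    return gas_flow_rate / drawdown


# Assumed example values.
pi = simple_productivity_index(
    gas_flow_rate=500.0,
    reservoir_pressure=250.0,
    bottomhole_pressure=200.0,
)
# pi == 10.0
```

In Charts the same calculation is applied point by point to time series rather than to single numbers.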

Forecast

The Forecast toolbox offers a variety of machine learning algorithms to forecast the behavior of industrial time series, with a particular focus on forecasting based on the correlation between a time series and physical parameters. Forecasting involves learning from historical data to make a prediction several time steps into the future. For industrial time series analysis, this typically involves pre-processing the data, training a parametric time series model, and then predicting a user-defined number of steps into the future.
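The train-then-predict workflow can be sketched with the simplest possible parametric model, a straight line fitted by least squares. This stands in for the machine learning models in the toolbox and is only an illustration; the function names and the data are hypothetical, and the series is assumed to be uniformly sampled already.

```python
def fit_linear_trend(values):
    """Least-squares fit of value = intercept + slope * step_index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(values, steps):
    """Extrapolate the fitted line a user-defined number of steps past the history."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Assumed example: a steadily increasing reading, forecast two steps ahead.
history = [1.0, 3.0, 5.0, 7.0]
prediction = forecast(history, steps=2)
# prediction == [9.0, 11.0]
```

A straight line only captures a constant trend; the point of the sketch is the shape of the workflow (fit on history, then predict a chosen number of steps), not the model itself.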


10 replies

Userlevel 2

Hi, I get a 404 on the InDSL link. 

Userlevel 5

Hi @Jonas Digernes, thanks for letting us know. Could you try the link again? It should be fixed now.

Userlevel 2

I still get 404 on this link https://docs.cognite.com/dslib/

Userlevel 5

I’m sorry, the link to the documentation has been updated to: https://indsl.docs.cognite.com/

The link in the post above is also now up-to-date. 

Userlevel 3

Hi @Jonas Digernes, what are your thoughts on InDSL? Is there anything you’d like to see us change or improve?

Badge +4

InDSL has many great functions! Are there any plans to make InDSL callable through the APIs/SDKs as well, e.g. Python, Cognite Functions, or the Grafana/Power BI connectors?

Userlevel 3

Hello, Hector. 

Very interesting question. We are always exploring ways of exposing InDSL capabilities to as many users as possible.

For users comfortable with Python, we currently expose this only as a library.

In 2023 we are working on allowing no-code users to set up InDSL calculations and persist the results in Cognite Data Fusion, where you can then call them as time series through the API.

What would be the most valuable way for us to make this available to you?

All the best

Knut Vidvei  

Badge +4

Hi @Knut Vidvei ,

Do you have an example of using the InDSL library in Python?

I’d like to test that but couldn’t find a reference.

Thanks,

Hector

Edit:

Found an example if anyone is interested:

https://indsl.docs.cognite.com/auto_examples/data_quality/plot_completeness.html#sphx-glr-auto-examples-data-quality-plot-completeness-py

Badge +4

Is the Charts unit conversion available as part of InDSL or through an API for time series?

Userlevel 3

Hi, Hector. 

The unit conversion shown in your screenshot is only available through the user interface.

However, we are working on expanding the unit conversion functionality in CDF and exposing it through the API or SDK this year.
