Cognite Data Fusion October Roadmap Update: Data Onboarding

As fall moves closer to winter, it is time again for a short update on the Cognite Data Fusion roadmap, and a look at some of the exciting work going on in different parts of the product. To avoid overloading you with information, we are splitting this update into several sections, which lets us share more content in bite-sized chunks that are easier to digest, and to produce. We’ll be posting a set of updates in the coming weeks, so please stay tuned for news on other parts of the product. As always, we’d love to hear your feedback, questions and ideas right here on Cognite Hub.

Data Onboarding covers the job of extracting data from source systems, loading it into Cognite Data Fusion, and processing it to build the industrial knowledge graph. It also includes contextualization: the technologies that enable Cognite Data Fusion to find connections and relationships between data even when the connecting data elements are not obvious, and thus to build a knowledge graph that is richer than the data it is created from.

 

A rich connector marketplace, and ease of setting up connectors

The catalogue of systems that Cognite Data Fusion supports with packaged extractors is ever growing, and our goal is to hit 100+ by the end of the year. The list of upcoming systems is too long to describe here, but you can look forward to support for new systems such as AspenTech’s Enterprise Historian IP.21, support for a wider range of SAP versions and protocols, and significant improvements to existing capabilities for systems like relational databases and Documentum. One fun and surprisingly useful piece of connectivity coming up shortly is the ability to connect directly to Excel and Google Sheets spreadsheets and use them as tabular sources of data.

In addition to building a richer catalogue of connectors, we are also working on making it easier to host and run extraction jobs directly in Cognite Data Fusion whenever the upstream system is reachable from the cloud. Hosted extractors provide a simple “click and configure” experience that makes it easy to set up new extraction pipelines directly from the Cognite Data Fusion UI, while also providing scalable performance and making the extractors easier to monitor and operate over time.

 

Data Workflow Orchestration - streamline and turbocharge your data pipelines

Towards the end of the year, we will ship the first release of a new way of transforming and working with data being onboarded to Cognite Data Fusion. We are moving beyond the current capabilities of the transformation pipeline and introducing the concept of a data workflow: a set of data onboarding tasks (such as Spark Transformations, Cognite Functions, and many other task types) that are performed on the data, either in parallel or in sequence. Tasks can be triggered by schedules, by external events or by previous tasks. This lets data workflows move from running purely on a schedule to being triggered by updates, and then executing a set of tasks efficiently and without delay. This will dramatically increase the capabilities, predictability and performance of the data pipelines in Cognite Data Fusion. Not satisfied with improvements under the hood, we also aim to ship a rich user experience for working with data workflows as part of the product. Look forward to the first iterations of this towards the end of the year, continuing into 2024.

Conceptual designs for data workflow editor
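
To make the idea of tasks running "in parallel or in sequence" concrete, here is a minimal sketch in plain Python. It is not the Cognite Data Fusion workflow API, which had not shipped at the time of writing. The task names and the `execution_stages` helper are hypothetical, chosen only for illustration. The sketch uses the standard library's `graphlib` module to group tasks into stages: tasks in the same stage do not depend on each other and could run in parallel, while the stages themselves run one after another.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task name maps to the set of tasks it depends on.
# These names are illustrative assumptions, not real Cognite Data Fusion tasks.
workflow = {
    "extract_assets": set(),
    "extract_timeseries": set(),
    "transform_assets": {"extract_assets"},
    "transform_timeseries": {"extract_timeseries"},
    "contextualize": {"transform_assets", "transform_timeseries"},
}


def execution_stages(tasks):
    """Group tasks into stages.

    Tasks within one stage have no dependencies on each other and could run
    in parallel; the stages themselves must run in sequence.
    """
    sorter = TopologicalSorter(tasks)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


for number, stage in enumerate(execution_stages(workflow), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

In this sketch the two extraction tasks form the first stage, the two transformations the second, and the contextualization task runs last, once both of its upstream tasks have finished. A real orchestrator would start each task as soon as its own dependencies complete rather than waiting for a whole stage.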

 

Contextualizing engineering diagrams

We are targeting major improvements to Cognite Data Fusion’s ability to extract context and meaning from engineering diagrams, more specifically Piping and Instrumentation Diagrams (commonly referred to as P&IDs). These diagrams are central to understanding the system context of an industrial asset. We already support extracting tags from P&IDs, allowing them to be linked to assets (and thus to time series data, 3D models and more), but we are now looking to take the next step and extract more knowledge from the symbols in the diagram. Our goal is to build a tool that allows you to identify symbols, connections and flows of fluids (or gas) in a system. It will be possible to train this tool on the particular standards and symbolic conventions used in your company, and it will allow for human-assisted contextualization of P&IDs, where you train the system to gradually do more and more of the work. This will unlock new levels of richness in the industrial knowledge graph, and enable new use cases with unique data that will only be available in Cognite Data Fusion. Look forward to the first updates and a richer contextualization tool for P&IDs early in 2024, with subsequent improvements coming later in the year.

 

 

Example engineering diagram showing rich data contextualization

 

Onboarding data from documents using Generative AI

Lastly, no roadmap update is complete without mentioning at least one initiative fueled by the rapid evolution of generative AI. This time around, we’d like to highlight some work we are doing on leveraging generative AI to extract data from documents and onboard that data into Cognite Data Fusion. This application has many use cases. A simple one is creating a digital representation of PDFs containing equipment data sheets, where the values specified (for things like maximum and minimum operating temperatures, pressures, nominal power consumption, etc.) can then be used as actual data (complete with lineage, units of measure, etc.) inside Cognite Data Fusion in things like monitoring jobs, analysis, troubleshooting and data science. In other words, onboarding actual data from a PDF file will feel just as simple as onboarding data from a table in a database. Look forward to seeing the first product releases with support for leveraging generative AI to onboard data from documents in early 2024.
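
As a rough illustration of what "actual data, complete with lineage and units of measure" could look like once extracted from a data sheet, here is a small Python sketch. Everything in it is an assumption made for illustration: the `ExtractedValue` class, its field names, the file name and the numbers are hypothetical and do not describe the Cognite Data Fusion data model or any real equipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedValue:
    """One value read from a data sheet, with its unit and lineage.

    Hypothetical structure for illustration only.
    """
    name: str
    value: float
    unit: str
    source_document: str  # lineage: which file the value came from
    source_page: int      # lineage: where in the file it was found


def within_limits(reading, minimum, maximum):
    """Check a sensor reading against extracted operating limits."""
    if minimum.unit != maximum.unit:
        raise ValueError("limits must use the same unit of measure")
    return minimum.value <= reading <= maximum.value


# Illustrative values, not taken from any real data sheet.
min_temp = ExtractedValue("min_operating_temperature", -20.0, "degC",
                          "pump_datasheet.pdf", 3)
max_temp = ExtractedValue("max_operating_temperature", 80.0, "degC",
                          "pump_datasheet.pdf", 3)

print(within_limits(65.0, min_temp, max_temp))
print(within_limits(95.0, min_temp, max_temp))
```

Keeping the source document and page next to each value is what would let a monitoring job or an analyst trace an operating limit back to the data sheet it came from.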

 

If you want to follow our development and shaping of these and many other features of Cognite Data Fusion, use Cognite Hub to engage with us and stay up to date with the latest developments.

 

Cognite Roadmaps are forward-looking and subject to change.

Very excited about the progress of the Cognite Roadmap, particularly related to Generative AI features.
Onboarding industrial documents and all the metadata they contain has already shown great traction and application. Work with several customers has verified that extracting data from documents not only accelerates time to value for GenAI-based use cases, but also greatly improves the efficiency of both populating and expanding industrial knowledge graphs.
More to come on this in the weeks and months ahead - stay tuned!