Developer and User Community
Join the discussions about delivering apps and data-driven decisions powered by Cognite Data Fusion. Click the + CREATE TOPIC button in the menu bar to start the conversation.
- 148 Topics
- 354 Replies
Hello all! As this is my first time writing, I will do a quick introduction. My name is Patrick Mishima, and I am one of the Cognite Product Specialists. My main specialization is data integration and governance. I've worked with data since 2011 in different roles, projects, and job titles, all related to data management. I also have experience with various ETL and BI tools, mainly from Microsoft and SAP, plus a few others. I lead the Product Specialists team, and we'll soon have more articles to share our knowledge and experience with you. Today and in my following articles, I will focus on the different options that Microsoft Azure and Cognite offer for data integration between our platforms and tools. But first, let's talk about a few essential and fundamental points when working with cloud providers. When using a cloud provider, most of the time we think about saving on infrastructure costs. That is indeed true, but it's not the whole truth. When you decide to move to a cloud provider, you…
This article is intended for customers who are planning to migrate, or are in the process of migrating, to Cognite Data Fusion's (CDF) new authentication system, which is based on OpenID Connect (OIDC). It covers in detail all aspects of this migration and will hopefully make the process as simple as possible. What is OIDC? OpenID Connect is an industry-standard protocol supported by many popular identity providers (IdPs), such as Azure Active Directory (Azure AD), Microsoft's cloud-based identity and access management service. For more information about OIDC, please visit the official website. What are the benefits of using OIDC over CDF's legacy authentication? The greatest benefit is that the user's identity now has a single source of truth: the identity provider (IdP). This simplifies identity management greatly and makes it more secure by leveraging the IdP's core strength, namely identity management.
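As an illustration of what OIDC-based authentication looks like in practice, here is a minimal sketch of connecting the Cognite Python SDK to an OIDC-enabled project with the client-credentials flow. The tenant ID, client ID, secret, cluster, and project are placeholders, and the classes assume a recent version of the cognite-sdk package.

```python
from cognite.client import CogniteClient, ClientConfig
from cognite.client.credentials import OAuthClientCredentials

# Placeholders -- substitute values from your Azure AD app registration.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"
CLUSTER = "api"            # e.g. "api" or "westeurope-1"
PROJECT = "<cdf-project>"

credentials = OAuthClientCredentials(
    token_url=f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    scopes=[f"https://{CLUSTER}.cognitedata.com/.default"],
)

client = CogniteClient(
    ClientConfig(
        client_name="oidc-migration-example",
        project=PROJECT,
        base_url=f"https://{CLUSTER}.cognitedata.com",
        credentials=credentials,
    )
)

# Verify the token works by asking CDF who we are.
print(client.iam.token.inspect())
```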
Aksel's ts is the original data; the ts average at 04:00 = 0.125 can only be explained by interpolated values. The example above shows that 1h aggregates of type average computed at time-point 03:00 take into consideration the interpolated values in the time span 03:00 to 04:00. Is it always like this as long as I choose granularity 1h? I could not find the answer to my question here: Aggregation | Cognite documentation
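For context, this is a minimal sketch of the kind of aggregate query being discussed, using the Cognite Python SDK; the external ID and time range are hypothetical.

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are already configured

# Hypothetical time series -- 1h average aggregates over one day.
dps = client.time_series.data.retrieve(
    external_id="aksel-ts",  # placeholder external ID
    aggregates=["average"],
    granularity="1h",
    start="2d-ago",
    end="1d-ago",
)
for timestamp, value in zip(dps.timestamp, dps.average):
    print(timestamp, value)
```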
On several occasions we have encountered limitations in metadata key length, most frequently when flattening JSON-formatted strings from our event-stream data source systems. When the source system presents a nested structure of as many as four levels, we frequently encounter metadata keys that require more than 128 bytes. Up to now we have “solved” the issue by abbreviating the metadata keys, at the price of a higher maintenance cost for the code and, more importantly, end users getting the perception that we have transformed the data or don’t even understand what it represents. We now consider moving towards a solution where we simply put the entire JSON-formatted string into one single metadata value field, and leave it to front-end teams and end users to flatten the structure. We have a similar issue with the maximum number of metadata keys for timeseries (16). Question 1) Could you please suggest other, better options for handling these metadata limitations? Question 2) Will templates come to the resc…
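To make the trade-off concrete, here is a small sketch of the two approaches described above, with a hypothetical nested payload; the field names are illustrative only.

```python
import json

# Hypothetical nested payload from an event-stream source system.
payload = {"plant": {"area": {"unit": {"sensor": {"status": "OK"}}}}}

# Approach 1: flatten to dotted keys -- risks exceeding the 128-byte key limit.
def flatten(obj, prefix=""):
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = str(value)
    return out

flat_metadata = flatten(payload)  # {'plant.area.unit.sensor.status': 'OK'}

# Approach 2: store the whole document under one key; consumers parse it.
blob_metadata = {"payload": json.dumps(payload)}
```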
The Cognite Live Product Tour 2023 is just over a week away! This event is our annual showcase of Cognite Data Fusion functionalities, both those that will be released in our upcoming December 6th release and a look ahead at what is to come throughout the new year. We’ll be driving a discussion with you, our community, about Industrial DataOps and the value of empowering Subject Matter Experts in the industrial organization. We hope you’re looking forward to the event as much as we are! Below is an outline of some of the exciting topics we’ll be discussing. Do you have questions you’d like us to answer during our Cognite Product Tour 2023? We want to hear from you! Please post here on Cognite Hub. INDUSTRIAL DATAOPS AND THE SUBJECT MATTER EXPERT: Industrial DataOps is about breaking down silos and optimizing the broad availability and usability of industrial data. Learn about why we are focusing on enabling the Subject Matter Expert throughout 2023 and the two main challenges that…
This post is a hands-on introduction to the features supported in the Transformations Python SDK. Contents:
- Prerequisites
- Use Case 1: Triggering Transformations
  - Step 1: Create RAW tables
  - Step 2: Upload data to RAW using the Postgres Gateway
  - Step 3: Create new SQL Transformations
  - Step 4: Trigger the transformation from Azure Data Factory
- Use Case 2: Orchestrating Transformations
  - Step 1: Create RAW tables
  - Step 2: Create new SQL Transformations
  - Step 3: Orchestrate Transformations in sequence
Prerequisites. Knowledge: basic knowledge of Azure Functions and Azure Data Factory; basic knowledge of Cognite Data Fusion RAW and SQL Transformations; prior experience with Python, Postgres, and SQL. Required datasets: download and unzip the attached hub.zip file; you should find the structure below. Use Case 1: asset-hierarchy.csv. Use Case 2: OID-Asset-hirerachy.csv, OID-Timeseries.csv, OID-Datapoints.csv. Use Case 1: Triggering Transformations. Data is extracted from source systems and…
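As a taste of what the Transformations SDK calls look like, here is a minimal sketch of triggering an existing transformation and waiting for it to finish; the external ID is a placeholder, and the method assumes a recent cognite-sdk release where transformations are part of the main client.

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are already configured

# Trigger a transformation by external ID (placeholder) and block until done.
job = client.transformations.run(
    transformation_external_id="load-asset-hierarchy",
    wait=True,
)
print(job.status)
```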
Hi everyone, we in C4IR Ocean are starting a series where we challenge the community to model a certain problem in CDF. The aim of this series is to facilitate discussion and invite community members to share interesting solutions and techniques. The first challenge focuses on OpenLineage. OpenLineage is an open standard for metadata and lineage collection designed to instrument jobs as they are running. It defines a generic model of run, job, and dataset entities identified using consistent naming strategies. The core lineage model is extensible by defining specific facets to enrich those entities. In C4IR Ocean, we are dealing with data from multiple providers. Some datasets are open and publicly available, others are closed. It is therefore important to keep track of where data is coming from, and what transformations have been applied to the data after it is read from the provider. We are seeing a good landscape of data lineage solutions, both open and closed source. At the sam…
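For readers new to the standard, here is a rough sketch of a minimal OpenLineage run event built as a plain Python dictionary; the namespace, job, and dataset names are invented for illustration, and many optional fields and facets are omitted.

```python
import json
import uuid
from datetime import datetime, timezone

# A minimal OpenLineage run event: one job reading one dataset, writing another.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "c4ir-ocean", "name": "ingest_provider_x"},  # illustrative names
    "inputs": [{"namespace": "provider-x", "name": "raw/observations"}],
    "outputs": [{"namespace": "cdf", "name": "cleaned/observations"}],
    "producer": "https://example.com/my-pipeline",  # placeholder URI
}
print(json.dumps(event, indent=2))
```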
Hey! In our tooling for the power analyst we make computations using the Synthetic Timeseries API. Results from these are used in subsequent analysis, where we have identified a problem. When a synthetic timeseries is computed with a specified aggregate and resolution, that specification is returned irrespective of there being data on the originating timeseries. In our case, we compute aggregates over long stretches of time, which results in situations like the image below. In the figure, the opaque line is computed by addition of the two other signals. Spanning roughly a year, the period in the middle has a linear rise in the synthetic timeseries while the originals are empty. These values are of course meaningless and should be omitted in the successive steps of the analysis. Have you considered implementing a density filter or the like for these types of situations? Or do you believe this is best solved client side by identifying the holes prior to a set of syntheti…
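One client-side workaround, sketched below under placeholder external IDs and time range: query the count aggregate of each originating timeseries at the same granularity, and treat synthetic values in buckets where the originals are empty as interpolation-only.

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are already configured

START, END, GRANULARITY = "365d-ago", "now", "1d"
SOURCES = ["ts-a", "ts-b"]  # placeholder external IDs of the originating series

# The synthetic sum of the two source series.
synth = client.time_series.data.synthetic.query(
    expressions="ts{externalId='ts-a'} + ts{externalId='ts-b'}",
    start=START,
    end=END,
)

# Count aggregates show where the originals actually have datapoints.
counts = client.time_series.data.retrieve(
    external_id=SOURCES, aggregates=["count"], granularity=GRANULARITY,
    start=START, end=END,
)

# Buckets where every source has at least one datapoint.
populated = set.intersection(
    *[{t for t, c in zip(dps.timestamp, dps.count) if c > 0} for dps in counts]
)
# Synthetic points falling outside these buckets are interpolation-only
# and can be dropped before further analysis.
```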
Hi everyone, we in C4IR Ocean are running a series where we challenge the community to model a certain problem in CDF. The aim of this series is to facilitate discussion and invite community members to share interesting solutions and techniques. In this second entry of the series we will focus on the SpatioTemporal Asset Catalog (STAC). The STAC specification provides a common language to describe a range of geospatial information, so it can more easily be indexed and discovered. A 'spatiotemporal asset' is any file that represents information about the earth captured in a certain space and time. The goal is for all providers of spatiotemporal assets (imagery, SAR, point clouds, data cubes, full motion video, etc.) to expose their data as SpatioTemporal Asset Catalogs, so that new code doesn't need to be written whenever a new data set or API is released (source). In C4IR Ocean, we are primarily dealing with geospatial data. The data originates from mult…
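To ground the discussion, here is a rough sketch of a minimal STAC item as a Python dictionary; the ID, geometry, and asset URL are invented for illustration, and many optional fields from the specification are omitted.

```python
import json

# A minimal STAC item: one spatiotemporal asset with geometry, time, and a link.
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "ocean-scene-0001",  # illustrative ID
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[4.0, 60.0], [5.0, 60.0], [5.0, 61.0], [4.0, 61.0], [4.0, 60.0]]],
    },
    "bbox": [4.0, 60.0, 5.0, 61.0],
    "properties": {"datetime": "2021-06-01T12:00:00Z"},
    "assets": {
        "imagery": {
            "href": "https://example.com/scenes/0001.tif",  # placeholder URL
            "type": "image/tiff; application=geotiff",
        }
    },
    "links": [],
}
print(json.dumps(stac_item, indent=2))
```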
It's been a while, but as promised, here is the follow-up to my previous article. If you haven't read it, you can find it here. In the first part, I shared the basic concepts of data integration. Today I will continue on the same theme, but the focus will be on latency and frequency. They are related concepts, but not necessarily the same. Why latency and frequency? They are crucial to defining your data pipeline, and they influence which Azure resources you use to cover your needs. What are data latency and frequency? Data latency is how fast or slow data can be retrieved or stored. Low latency means that the data is available in real time or close to real time, and it is vital in use cases where you need to respond quickly to information. Examples are alerts for critical events on machinery and equipment, and online games based on real-time experiences. This leads us to the data frequency concept. Data frequency is how many times in a specific period the data should be…
OPC UA is an open communication standard and one of the most widely used in industry. OPC UA can, for example, be used to send data from a machine and its sensors to a computer (or to a cloud service such as Cognite Data Fusion). In this article, we'll see how to extract data from an OPC UA data source into Cognite Data Fusion (CDF). For this purpose, we have an off-the-shelf extractor that we simply need to configure and deploy. This extractor is maintained by Cognite. You can find it in CDF under Integrate > Extract data (where you can also find our other extractors, such as DB and PI, and where you can also create custom extractors). One of the most common and easiest ways to deploy such an extractor is via a Docker image. The fact that it is platform agnostic makes it pretty easy to deploy: you just need Docker installed to make it run. Also, everything you need is packaged in the image; you do not need to install or build anything. The Docker images are available on Docker Hub (htt…
Make sure not to miss the Cognite Application Developer Session at 4 PM CEST / 10 AM EST on May 10th! From simple multi-data-source dashboards to cutting-edge hybrid AI solutions, join us for this one-hour session and learn how Cognite Data Fusion makes industrial application development easier. Watch the recording:
This guide describes how to run the Cognite DB Extractor in a separate Docker container to fetch data from Microsoft SQL Server into Cognite Data Fusion. Prerequisites:
- A running instance of MS SQL Server and valid credentials to SELECT rows from the table in question
- A Docker host that can access the MS SQL Server
- A running CDF project with an Azure AD service principal with capabilities to write to CDF
1 - Prepare the Docker image. Cognite provides the database extractor as a Docker image published to Docker Hub, requiring just the addition of an ODBC driver. Since we are connecting to MS SQL, we will install the drivers provided by Microsoft for Debian 11, using the Dockerfile below. Note! Go to https://hub.docker.com/r/cognite/db-extractor-base/tags to see the latest version of the Docker image.
FROM cognite/db-extractor-base:2.5.0-beta5
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    curl https://packages.microsoft.com/config/debian/11/prod.list > /…
Hey, to what extent does CDF handle possible race conditions, such as the update of an object from two different systems? The example we are currently considering is whether disjoint sets of metadata on objects (events) can be updated without regard to timing. /Robert
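For concreteness, here is a sketch of the pattern in question using the Python SDK: two writers each applying a partial metadata update with disjoint keys, rather than overwriting the whole metadata object. The event external ID and keys are placeholders.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import EventUpdate

client = CogniteClient()  # assumes credentials are already configured

# Writer A adds or overwrites only its own keys...
client.events.update(
    EventUpdate(external_id="evt-123").metadata.add({"systemA.status": "ok"})
)

# ...while writer B touches a disjoint key set in a separate request.
client.events.update(
    EventUpdate(external_id="evt-123").metadata.add({"systemB.reviewed": "true"})
)
```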
Hi, when parsing a large production model, some high-level concepts we want to filter on are structured as assets, e.g. a fiscal region for power in the Statnett case. Our model is now hitting the limitations of subtree queries, with more than 100k assets per region for some regions. What is your thinking around how to handle such cases in a model? To spark the conversation, we've considered moving high-level concept parts of the tree to labels, or doing more of the DB type of operations locally. However, the former demands a pipeline for moving concepts suited to a tree into a label "just because". The latter requires quite a lot of iron present on the local instance processing the query. An instance of the SDK querying we do can be seen in the power-SDK on GitHub.
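As a sketch of the label alternative mentioned above, here is how such a filter might look in the Python SDK; the label external ID is hypothetical.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import LabelFilter

client = CogniteClient()  # assumes credentials are already configured

# Instead of retrieving a huge region subtree, tag its members with a label
# and filter on it directly.
assets = client.assets.list(
    labels=LabelFilter(contains_any=["fiscal-region-no1"]),  # hypothetical label
    limit=None,
)
```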
Currently, when fetching datapoints from multiple timeseries, the Python Cognite SDK splits the work into one request per timeseries, while the underlying CDF API supports up to 100 timeseries per request. This slows down fetching multiple timeseries, which is a problem when you display multiple timeseries (or aggregates of timeseries) in a frontend application.
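To illustrate the batching the API itself allows, here is a rough sketch of a single request against the datapoints endpoint covering several timeseries at once; the cluster, project, token, and external IDs are placeholders, and error handling is omitted.

```python
import requests

CLUSTER = "api"            # placeholder cluster
PROJECT = "<cdf-project>"  # placeholder project
TOKEN = "<bearer-token>"   # placeholder OAuth token

# One request, many timeseries (the API accepts up to 100 items per call).
body = {
    "items": [
        {"externalId": f"ts-{i}", "start": "7d-ago", "end": "now"}
        for i in range(10)  # placeholder external IDs
    ]
}
resp = requests.post(
    f"https://{CLUSTER}.cognitedata.com/api/v1/projects/{PROJECT}/timeseries/data/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=body,
)
print(len(resp.json()["items"]))
```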
Why
When starting out in a Cognite Data Fusion (CDF) project, it's natural to begin by creating data governance elements like groups, datasets, and RAW databases from the CDF user interface. But as the solution begins to scale, you'll quickly realize that it is demanding to maintain a detailed configuration covering multiple solutions, sources, roles, and other dimensions. For scaling, precise control is needed for access management and data governance, and to enforce the guidelines and rules across the solution. The problem is not trivial, and a good way to solve it is by replacing the manual approach with a configuration-driven system, where the configuration language supports higher-level concepts for data lineage and access control. With configuration files as the foundation, you can set up an automated DevOps process and review and approve any changes to the structure before they are deployed. This approach also dramatically simplifies sharing the same configuration across multiple envi…
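As a small sketch of the configuration-driven idea: assuming a YAML file (requires PyYAML) that lists datasets and groups, the snippet below reads it and creates the corresponding CDF resources through the Python SDK. The file layout and capability payloads are invented for illustration and assume an SDK version that accepts capabilities as plain dictionaries.

```python
import yaml
from cognite.client import CogniteClient
from cognite.client.data_classes import DataSet, Group

client = CogniteClient()  # assumes credentials are already configured

# Hypothetical config file, e.g.:
# datasets:
#   - {external_id: src-sap, name: "SAP source data"}
# groups:
#   - name: sap-readers
#     capabilities:
#       - {datasetsAcl: {actions: [READ], scope: {all: {}}}}
with open("governance.yaml") as fh:
    config = yaml.safe_load(fh)

for ds in config.get("datasets", []):
    client.data_sets.create(DataSet(external_id=ds["external_id"], name=ds["name"]))

for grp in config.get("groups", []):
    client.iam.groups.create(Group(name=grp["name"], capabilities=grp["capabilities"]))
```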
Hey! In our tooling for power analysts we compute synthetic timeseries for them, to evaluate scenarios where flow exceeds a threshold value. In the current implementation of Synthetic Timeseries, only average and interpolation aggregates are allowed. This leads to scenarios where zoomed-out (and down-sampled) views of data computed through the Synthetic Timeseries API display non-informative values. Consider the example below, where the top image is the un-aggregated addition of two timeseries, and the bottom is with the use of Synthetic Timeseries.
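For reference, here is a minimal sketch of the kind of synthetic query involved, using the Python SDK; the external IDs and the threshold constant are placeholders.

```python
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are already configured

# Flow above a threshold, computed server-side as a synthetic timeseries.
dps = client.time_series.data.synthetic.query(
    expressions="ts{externalId='flow-a'} + ts{externalId='flow-b'} - 100.0",  # placeholders
    start="30d-ago",
    end="now",
)
exceedances = [(t, v) for t, v in zip(dps.timestamp, dps.value) if v > 0]
```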
Hi, I have retrieved timeseries from an asset in Power BI. That asset is linked through a relationship column to a group of assets. However, the asset and the group it belongs to are connected through four steps:
wanted group <- belongsTo - subgroup <- belongsTo - type of group <- connectsTo - group of a few timeseries assets <- belongsTo - timeseries asset
I'm wondering how to link the relationships in Power BI, and how to orient to the right place.
Hi. I’m currently working on an application where we need to check whether a node (PLC/PC) has updated any timeseries within the last x hours. The way we do it now is to get all recorded values for the node's timeseries in a time range and check if any timeseries has any values. If there is a value on one timeseries, we consider the node alive. This can be time consuming, since each node can have several hundreds or thousands of timeseries. So my question is: is there a way to get the latest recorded value for a collection of timeseries within a time range? Or just a recorded value in a time range for a collection of timeseries. In Cognite, we have an asset hierarchy, which preferably would look like this:
Rig
- Node 1
  - TimeSerie1
  - TimeSerie2
  - ...
- Node 2
  - TimeSerie1
  - TimeSerie2
  - ...
- Node ...
  - TimeSerie1
  - TimeSerie2
  - …
But the hierarchy could also be completely flat, where all the timeseries are connected to the Rig. Our externalIds for the timeseries are rigNumber.NodeNumber.SignalNumber. So all timeseries wit…
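One approach that maps to this, sketched with the Python SDK: retrieve_latest accepts a list of timeseries, so a single call per node returns the latest datapoint of each series, which can then be compared against the freshness window. The external IDs and the window are placeholders.

```python
from datetime import datetime, timedelta, timezone
from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are already configured

WINDOW = timedelta(hours=4)                       # placeholder freshness window
node_ts = [f"42.1.{sig}" for sig in range(1, 6)]  # placeholder rigNumber.NodeNumber.SignalNumber IDs

# One request for the latest datapoint of every timeseries on the node.
latest = client.time_series.data.retrieve_latest(external_id=node_ts)

cutoff = datetime.now(timezone.utc) - WINDOW
alive = any(
    dps.timestamp
    and datetime.fromtimestamp(dps.timestamp[0] / 1000, timezone.utc) >= cutoff
    for dps in latest
)
print("node alive:", alive)
```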