Cognite Data Fusion (CDF) provides a web-based user interface for managing your data and configurations. Here's how you can manage projects, transformations, and datasets:
1. Projects
- What are Projects and Organizations? In CDF, a project is a logical isolation of data. All your data (assets, time series, files, data models, instances, data sets, etc.) resides within a specific project. An organization can have multiple projects. Together, organizations and projects logically separate data storage and compute within a cluster (the cloud environment where CDF runs, on Azure, AWS, or GCP).
- How to Manage Projects:
- Projects are typically set up during your Cognite Data Fusion subscription and initial configuration. You can request additional projects from Cognite support.
- The Cognite Data Fusion UI is primarily used to work within a project, rather than manage projects themselves. You can use the UI to switch between existing projects.
- Project-level settings, like which cloud provider is used, are also configured during the setup phase.
- Project names must be globally unique and cannot be changed. A project also cannot be moved between cloud providers.
- Access Management: User access to projects is managed through your organization's Identity Provider (IdP) and CDF groups. When users sign in, they see a list of the projects they can access.
For detailed information on how users are authenticated and authorized to access projects, refer to the official Cognite documentation on Access Management.
To log in to your CDF project:
Navigate to fusion.cognite.com and enter your organization name.
Select a project in the organization.
Once you are logged in to your project, you can switch between projects at any time using the left navigation menu.
2. Transformations
- What are Transformations? CDF Transformations allow you to process and transform data within CDF using SQL. You can use them to clean, structure, and enrich your data.
They can be created through the UI, SDK, API, or CDF Toolkit.
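To make the idea of cleaning and structuring data with SQL concrete, here is a minimal sketch of the kind of query a transformation might contain. It runs against an in-memory SQLite database only so that the example is self-contained; CDF runs transformations on its own SQL engine, not SQLite. The table `raw_pumps`, its columns, and the target column names `externalId`, `name`, and `description` are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical staging rows, standing in for data landed by an extractor.
SOURCE_ROWS = [
    ("P-101", "  Feed pump ", "Unit A"),
    ("P-102", "Booster pump", None),
    (None, "Row without a key", "Unit B"),
]

# Clean and reshape the source columns into the columns the target expects.
# The names after AS are assumed target properties, not a fixed CDF schema.
TRANSFORM_SQL = """
SELECT
    'pump:' || tag            AS externalId,
    TRIM(label)               AS name,
    COALESCE(area, 'unknown') AS description
FROM raw_pumps
WHERE tag IS NOT NULL
ORDER BY tag
"""


def preview_transformation():
    """Run the query on the sample rows, similar in spirit to a preview."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE raw_pumps (tag TEXT, label TEXT, area TEXT)")
        conn.executemany("INSERT INTO raw_pumps VALUES (?, ?, ?)", SOURCE_ROWS)
        cursor = conn.execute(TRANSFORM_SQL)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


if __name__ == "__main__":
    for row in preview_transformation():
        print(row)
```

The query drops rows without a key, trims whitespace, and fills in a default for missing values, which are typical clean-up steps before writing to a target.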
- How to Manage Transformations:
- Creating Transformations:
- In the CDF UI, navigate to "Data Management" > "Integrate" > "Transformations".
- Click "Create transformation" and provide a name, external ID, and optional data set.
- Select the target data model, for example the Cognite Core Data Model.
- Fill in the rest of the required information (target space, target type, action, and data model version), then click "Create".
- This opens the transformation, where you can work in either the mapping editor or the SQL editor. You can switch between them at the top of the screen.
- Write the code for the transformation and click "Preview" to see the results.
- If the preview looks good, you can run the transformation.
Once you are satisfied with the transformation code, we recommend storing it in your CDF Toolkit configuration in GitHub so that it is under source control.
You can then use the CDF toolkit to deploy your transformation code to this project or any other project (for example development/production).
- Running Transformations:
- Select a transformation and click "Run".
- You can choose to run it with client credentials or as the current user.
- Scheduling Transformations:
- You can schedule transformations to run automatically at regular intervals.
- Configuration options for schedules are available in the transformation settings.
- Monitoring Transformations:
- The UI provides tools to monitor the status of transformation runs, view logs, and troubleshoot issues.
- Key aspects of managing Transformations:
- Defining the target data model.
- Mapping source data to the target schema.
- Writing SQL queries for complex transformations.
- Setting up schedules for recurring execution.
- Managing credentials and authentication.
For more comprehensive information, see the Cognite documentation on Transform Data.
3. Datasets
- What are Datasets? Data sets let you document and track data lineage, ensure data integrity, and allow 3rd parties to write their insights securely back to a Cognite Data Fusion (CDF) project.
- Data sets group and track data by its source. For example, a data set can contain all work orders originating from SAP. Typically, an organization will have one data set for each of its data ingestion pipelines in CDF.
- A data set consists of metadata about the data set, and the data objects that belong to the data set. Data objects, for example events, files, and time series, are added to a data set through the dataSetId field of the data object. Each data object can belong to only one data set.
They can be created through the UI, SDK, API, or CDF Toolkit.
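The relationship between data objects and data sets can be sketched in a few lines of plain Python. This is an illustrative model only, not the CDF SDK: the `DataSet` class, the `group_by_data_set` helper, and the sample IDs are hypothetical. The only detail taken from the text above is that each data object carries a single `dataSetId`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Metadata about a data set (field names are illustrative)."""
    id: int
    name: str
    write_protected: bool = False


def group_by_data_set(objects):
    """Group data objects by their single dataSetId.

    Objects without a dataSetId are collected under the key None.
    """
    grouped = defaultdict(list)
    for obj in objects:
        grouped[obj.get("dataSetId")].append(obj["externalId"])
    return dict(grouped)


# Hypothetical data set for work orders originating from SAP.
sap_work_orders = DataSet(id=101, name="SAP work orders", write_protected=True)

# Each object references at most one data set through dataSetId.
data_objects = [
    {"externalId": "event:wo-1", "dataSetId": sap_work_orders.id},
    {"externalId": "event:wo-2", "dataSetId": sap_work_orders.id},
    {"externalId": "ts:pressure-1", "dataSetId": 202},
    {"externalId": "file:manual.pdf"},  # not assigned to any data set
]

if __name__ == "__main__":
    print(group_by_data_set(data_objects))
```

Because membership is a single field on the object rather than a list held by the data set, an object cannot end up in two data sets at once.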
- How to Manage Datasets:
- Creating Datasets:
- In the CDF UI, go to "Data Management" > "Data Catalog".
- Click "Create a data set" and provide the necessary information (name, description, external id, labels, write-protection).
- Editing Datasets:
- Select a dataset from the Data Catalog and click "Edit".
- You can modify dataset metadata (name, description, external id), set labels, define the governance status, and write-protect the dataset.
You can also document data extraction, document data transformations and add documentation.
- Archiving Datasets:
- You can archive a dataset through the UI. You can also view archived data sets and restore them. It is not possible to delete a data set.
- Exploring Datasets:
- In the Data Catalog, you can view a list of your datasets and their details.
- The UI allows you to explore the data within a dataset, including the different resource types (assets, time series, etc.) and their lineage.
- Data Governance:
- You can set the governance status of a dataset (governed or ungoverned) to indicate its quality and reliability.
- Governed data sets have a designated owner and follow the governance processes for data in your organization.
- Ungoverned data sets indicate to users that the data may not be reliable. If you want to use data from an ungoverned data set, we recommend contacting the owner or creator of the data set to learn more about the data.
- Access Control:
- To manage access to datasets, you need to set the appropriate capabilities for CDF groups. This ensures that only authorized users and applications can access or modify the data.
- Key aspects of managing data sets:
- Documenting data sources and lineage.
- Setting governance status.
- Write-protecting critical data.
- Controlling access to data.
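As a rough illustration of the access-control point above, the sketch below builds a list of capabilities that restrict a group to the data inside one data set. It only constructs plain dictionaries and does not call CDF. The ACL names (`timeSeriesAcl`, `eventsAcl`), the `datasetScope` key, the helper name, and the data set ID are written as assumptions and should be checked against the CDF groups API reference before use.

```python
def dataset_scoped_capabilities(data_set_id, resource_acls, actions=("READ",)):
    """Build one capability per resource type, scoped to a single data set.

    The dictionary layout is an assumed approximation of a CDF group
    capability, not a verified API payload.
    """
    return [
        {
            acl: {
                "actions": list(actions),
                # A fresh scope dict per capability avoids shared references.
                "scope": {"datasetScope": {"ids": [data_set_id]}},
            }
        }
        for acl in resource_acls
    ]


if __name__ == "__main__":
    # Hypothetical data set ID and resource types.
    capabilities = dataset_scoped_capabilities(
        data_set_id=101,
        resource_acls=["timeSeriesAcl", "eventsAcl"],
        actions=("READ", "WRITE"),
    )
    for capability in capabilities:
        print(capability)
```

Scoping each capability to a data set, rather than granting it for the whole project, is what limits the group to the data tracked by that data set.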
For more details, refer to the Cognite documentation on Data Sets.