Tutorial

Embark on your data science journey with Jupyter Notebooks in Fusion

Related products: Jupyter Notebooks

Jupyter is a tool loved by data scientists, analysts, and engineers for its interactive computing environment and robust data processing capabilities. It lets you write and execute code dynamically, visualize results as you go, and handle large datasets efficiently, which makes it well suited to in-depth data analysis and exploration.

With Jupyter Notebooks in Fusion, you can now write and run notebooks without leaving your browser. Notebooks can be stored in Cognite Data Fusion (CDF), allowing easy access and sharing. All code is executed in a “sandbox” within your browser, reducing the chance of leaking information. In this sandbox, notebooks do not have access to your local files or other information stored on your computer. Notebook code runs, and all communication with CDF takes place, under the current user's credentials, so Jupyter cannot be used to gain access to restricted data stored in CDF.

 

Paired with our new GenAI code copilot, we have made it easier than ever to get productive with CDF. The copilot will help you write and explain code. Soon, the copilot will also help you fix code errors.

If you are new to Jupyter, we recommend first reading the official JupyterLab introduction guide before reading the rest of this guide.

Join the Beta: Your Feedback Matters

Jupyter Notebooks in Fusion is in public beta. As with any beta program, some level of instability and unforeseen issues may be expected. We strongly encourage users to actively participate in this beta phase by providing feedback, which is invaluable to us. For product interaction and reporting bugs, please visit Cognite Hub. If you have ideas for improvements you would like to share, please create a new product idea on Cognite Hub. Your contributions are not only welcome but essential in making JupyterLab in Fusion a more robust and user-friendly tool.

Using Jupyter Notebooks in Fusion

This chapter guides you through the process of using Jupyter Notebooks within the Cognite Fusion environment, detailing how to access, store, and manage your notebooks. You will also learn how to authenticate with CDF to access information stored there and how the new copilot can help you accelerate the process of writing code.

Jupyter Notebooks is available under the Explore menu in Cognite Fusion. Once started, previous Jupyter Notebooks users will recognize the familiar user interface. Notebooks and files are accessible in the file browser. 

Storage

The first time you open Jupyter Notebooks, you will notice a few folders already available:

  • examples/ - This folder contains some interesting examples of things you can do with JupyterLab in Fusion as an inspiration.
  • quickstart/ - A quickstart guide for getting started with data modeling and population of data.
  • CDF ☁/ - A special folder for loading and storing notebooks to Cognite Data Fusion.


 

There are two ways of storing notebooks and data in Jupyter Notebooks in Fusion: locally or in Cognite Data Fusion (CDF). By default, all files and notebooks are stored in browser storage. Since this storage can be cleared (e.g. as a result of clearing the browser cache), it is recommended to use it only for small notebooks and temporary files you can afford to lose. Locally stored notebooks are accessible across projects; that is, notebooks you created while working in CDF project A are also available when you are working in project B.

To store persistent notebooks and share work with your colleagues, use the CDF ☁/ folder. Inside this folder, you will find each data set you have read-write access to. If you already have a data set you want to store notebooks in - great! If not, see the section below on preparing data sets for sharing notebooks. To save a notebook to Cognite Data Fusion, simply place it inside the respective folder in the file explorer in Jupyter. Note that only notebook files with the .ipynb extension are saved to CDF - no other files are synchronized.

Authentication helpers

When using Jupyter Notebooks in Fusion, it’s recommended to use the default authentication helper provided. To use this helper, add the following code to your notebook:

from cognite.client import CogniteClient

client = CogniteClient()

This authenticates using the current user credentials, meaning that any code that is run in Jupyter Notebooks in Fusion when using the authentication helper has the same access permissions as the user.
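
As a minimal sketch of what the authenticated client can then be used for, the helper below lists the names of root assets. It assumes the Cognite Python SDK's `client.assets.list(root=True, limit=None)` call; the function name `root_asset_names` is hypothetical and not part of the SDK. Because the client runs under your own credentials, only assets you can already read are returned.

```python
def root_asset_names(client):
    """Return the names of all root assets visible to the current user.

    `client` is expected to be the object created by the authentication
    helper, i.e. `client = CogniteClient()`.
    """
    # Assumption: assets.list accepts root=True to return only top-level
    # assets, and limit=None to fetch all of them instead of one page.
    assets = client.assets.list(root=True, limit=None)
    return [asset.name for asset in assets]
```

In a notebook cell, this could be called as `print(root_asset_names(client))`.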

GenAI code assistant

Jupyter Notebooks in Fusion comes with a useful coding assistant. This assistant will help you write code based on natural language input and explain existing code. Later it will also be able to help you fix code errors.


The code copilot is accessible from the cell editor in Jupyter.

Best practices for succeeding with the code copilot include:

  • Be explicit. If you want the copilot to work on data you retrieved in previous cells, state the names of those variables. You might also want to hint at the types of those variables, especially if they are not obvious from the code in prior cells.
  • Use Cognite Data Fusion lingo to help the copilot understand which APIs are relevant. For example, if you are retrieving work order data, describe how that data is retrieved rather than just instructing the copilot to retrieve work orders.
  • Instruct the copilot to do one single task per cell. For performing additional tasks, add additional code cells. Manually combine code from multiple cells after verifying that the code works.
  • Write instructions in English. Other languages might work, but expect best accuracy when using English.
  • Always read through the generated code before running it to make sure it doesn’t perform harmful actions. This matters most for data integrity when you have elevated access rights to CDF.
  • Expect to have to change parts of the generated code. The copilot will make mistakes, but it can help point you in the right direction.

Some examples of prompts you might want to try out are:

  • Initialize a Cognite client
  • Retrieve all root assets
  • Search for documents by "Operating procedure" and print content of the 5 first documents
  • Fetch time series value for the last year for time series with external ID "XYZ". Plot the time series and print min, max and mean values.
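
To give a sense of what to expect, the last prompt above might lead to code along the following lines. This is a hand-written sketch, not actual copilot output. It assumes the Cognite Python SDK's `client.time_series.data.retrieve` call with relative time strings, and `summarize_last_year` is a hypothetical helper name. Plotting is left out to keep the example short.

```python
def summarize_last_year(client, external_id):
    """Fetch one year of datapoints and return their min, max and mean."""
    # Assumption: the SDK accepts relative time strings such as
    # "365d-ago" and "now" for the start and end of the range.
    datapoints = client.time_series.data.retrieve(
        external_id=external_id, start="365d-ago", end="now"
    )
    # Treat a missing time series and an empty range the same way.
    if datapoints is None or len(datapoints.value) == 0:
        raise ValueError(f"No datapoints found for {external_id!r}")
    values = [float(v) for v in datapoints.value]
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
```

Called as `summarize_last_year(client, "XYZ")`, it returns a dictionary with the three statistics, which you can print or pass on to a plotting library.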

Note that this is an early version of the code assistant. Expect accuracy to improve as we develop our coding assistant over the coming months.

GenAI data insights

We also introduce a new Python package, cognite-ai, that will help you use GenAI not only as a coding assistant, but also to simplify data insights in your code. This package is based on PandasAI and allows you to “talk to your data frames” from CDF. An example notebook is available in the examples/ folder.

Note that this Python package can also be used outside Jupyter Notebooks in Fusion. The package is in an experimental state.

Security considerations

When you are using the default authentication helper, all code executed in Jupyter Notebooks in Fusion runs under the current user's credentials, meaning Jupyter Notebooks will not provide access to any data the user doesn’t already have access to.

Notebooks are stored in CDF and/or browser storage. Locally cached versions of notebooks are kept in browser storage. It’s advisable to clear notebook cell outputs if they contain sensitive information.
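
Clearing outputs can also be done programmatically. The sketch below assumes the standard version 4 notebook file layout, in which each code cell carries an `outputs` list and an `execution_count` field; `clear_cell_outputs` is a hypothetical helper name, not part of Jupyter.

```python
import copy


def clear_cell_outputs(notebook):
    """Return a copy of a parsed .ipynb document with code outputs removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's object untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed text, tables, images
            cell["execution_count"] = None  # reset the In [n] counter
    return cleaned
```

To use it, load the notebook file with `json.load`, pass the result through `clear_cell_outputs`, and write it back with `json.dump`.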

Never store secrets in notebooks, even for private notebooks. 

Limitations and known issues

Known issues:

  • Due to restrictions in the underlying technology, you can only have one browser tab or window with Jupyter Notebooks open at a time.
  • The link generated by “Copy sharable link” in the file context menu does not work.
  • Packages with native libraries are not supported. When you see an error message like this:
    Can't find a pure Python 3 wheel for ‘<package>’. See: https://pyodide.org/en/stable/usage/faq.html#micropip-can-t-find-a-pure-python-wheel. You can use `micropip.install(..., keep_going=True)` to get a list of all packages with missing wheels.
    it typically means that the package requires a native library and cannot be used with Pyodide. See the Pyodide documentation for more information.

  • It’s not possible to interrupt execution of notebook cells.
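
Whether a given package is affected by the native-library limitation listed above can be checked from a notebook cell. The sketch below assumes the notebook kernel is Pyodide, where the `micropip` module referred to in the error message is available. `try_install` is a hypothetical helper name, and its `install` parameter exists only so the function can be exercised outside Pyodide.

```python
async def try_install(package, install=None):
    """Try to install a package; return (True, "") or (False, error message)."""
    if install is None:
        # micropip ships with Pyodide and is not importable in regular Python.
        import micropip
        install = micropip.install
    try:
        # keep_going=True asks micropip to report all missing wheels at once.
        await install(package, keep_going=True)
    except Exception as exc:  # broad on purpose: the error type is not assumed
        return False, str(exc)
    return True, ""
```

In a notebook cell, `ok, message = await try_install("<package>")` then reports failure with the installer's message instead of stopping the cell with a traceback.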

Advanced: preparing datasets for sharing notebooks

To store notebooks in Cognite Data Fusion (CDF), you must have access to read and write files in a data set. To share notebooks within your organization, all collaborators must have the file:read and file:write ACLs. Access to managing data sets is usually restricted, and you might need a CDF administrator to help you prepare data sets and manage access.

 
