Filtering, Sorting and Searching RAW tables

Related products: Transformations and RAW

The RAW layer (staging area) would benefit from the ability to explore the data directly in CDF.

Currently, the only way to analyze a RAW table is to download it as a CSV file and filter, sort or search it in Excel.

In addition, you can currently filter or search only at the column level, not at the row level.

Some suggestions:

  • Ability to sort and filter rows (similar to Excel).
  • Search across the entire RAW table for specific keywords.
  • Possibly even an SQL editor for running queries against RAW tables.
  • Instead of offering generative AI capabilities in Jupyter Notebook, why not let users type questions directly on the RAW table screen?
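As a rough illustration of the row-level operations suggested above (sort, filter and keyword search), here is a minimal Python sketch. It assumes each RAW row has already been fetched and is modeled as a plain dict with a `key` and a `columns` dict; the helper names (`filter_rows`, `sort_rows`, `search_rows`) and the sample data are hypothetical and not part of any Cognite API.

```python
from typing import Any, Callable

# Assumption: a RAW row is modeled as {"key": ..., "columns": {...}}.
Row = dict[str, Any]


def filter_rows(rows: list[Row], predicate: Callable[[dict[str, Any]], bool]) -> list[Row]:
    """Row-level filter: keep rows whose columns satisfy the predicate."""
    return [r for r in rows if predicate(r["columns"])]


def sort_rows(rows: list[Row], column: str, descending: bool = False) -> list[Row]:
    """Sort rows by one column (values assumed comparable); rows missing it go last."""
    present = [r for r in rows if column in r["columns"]]
    missing = [r for r in rows if column not in r["columns"]]
    present.sort(key=lambda r: r["columns"][column], reverse=descending)
    return present + missing


def search_rows(rows: list[Row], keyword: str) -> list[Row]:
    """Case-insensitive keyword search across the row key and every column value."""
    needle = keyword.lower()
    return [
        r
        for r in rows
        if needle in str(r["key"]).lower()
        or any(needle in str(v).lower() for v in r["columns"].values())
    ]


# Hypothetical sample rows, for illustration only.
rows = [
    {"key": "pump-01", "columns": {"site": "Oslo", "pressure": 4.2}},
    {"key": "pump-02", "columns": {"site": "Bergen", "pressure": 3.1}},
    {"key": "valve-07", "columns": {"site": "Oslo"}},
]

print([r["key"] for r in sort_rows(rows, "pressure")])
print([r["key"] for r in filter_rows(rows, lambda c: c.get("site") == "Oslo")])
print([r["key"] for r in search_rows(rows, "bergen")])
```

Rows that lack the sort column are placed last rather than raising an error, which loosely mirrors how a spreadsheet sort treats blank cells.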

This is essential because RAW is typically used for sanity testing and for reconciling the RAW layer with the data (resource types) in CDF. The client relied heavily on RAW table analysis to verify that the transformation logic is correct and that counts match between the source and CDF.
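The count check described above can be sketched in a few lines of Python. This is a minimal, hypothetical example: it assumes the RAW rows and the corresponding CDF items have both been exported as flat dicts sharing a grouping field (here `site`), and it reports only the groups whose counts differ. The function names and sample data are illustrative, not an existing Cognite API.

```python
from collections import Counter
from typing import Any, Iterable


def count_by(items: Iterable[dict[str, Any]], field: str) -> Counter:
    """Count items per value of `field`; items without the field count under None."""
    return Counter(item.get(field) for item in items)


def reconcile_counts(
    raw_rows: Iterable[dict[str, Any]],
    cdf_items: Iterable[dict[str, Any]],
    field: str,
) -> dict[Any, tuple[int, int]]:
    """Return {group: (raw_count, cdf_count)} for every group whose counts differ."""
    raw_counts = count_by(raw_rows, field)
    cdf_counts = count_by(cdf_items, field)
    mismatches = {}
    # Counter returns 0 for absent keys, so groups present on only one side are caught.
    for group in raw_counts.keys() | cdf_counts.keys():
        if raw_counts[group] != cdf_counts[group]:
            mismatches[group] = (raw_counts[group], cdf_counts[group])
    return mismatches


# Hypothetical sample data: one "Oslo" row did not make it into CDF.
raw_rows = [{"site": "Oslo"}, {"site": "Oslo"}, {"site": "Bergen"}]
cdf_items = [{"site": "Oslo"}, {"site": "Bergen"}]

print(reconcile_counts(raw_rows, cdf_items, "site"))
```

An empty result means the per-group counts agree; a non-empty one points directly at the groups where the transformation logic may be dropping or duplicating records.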

Hi and thank you for reaching out.

Those are all good suggestions. Although we have no fixes for those particular issues, note that we are adding a feature that lets you generate a Jupyter Notebook based on a RAW table, so you can work with the data in code to do analysis. This feature will be available in the March release (due 5th of March).

Also, note that we do provide ways of working with RAW data in SQL. In the Transform data UI (available in Cognite Data Fusion under Data Management → Transform data), you can query the data using SQL.

Hope this helps!


@Lars Moastuen 

Thank you for your prompt reply.

I am aware of transformations and Jupyter notebooks.

The issue is that transformations are not meant for exploring or analyzing RAW data, and the user might not have permission to add, modify or duplicate transformations. Additionally:

  • Transformations only return a maximum of 1000 rows.
  • The user might not be proficient in SQL.
  • You do not want to give access to transformations to someone who is simply trying to analyze one of the RAW tables. You run the risk of them applying bad transformations that corrupt the data in CDF.
  • You end up with many “Test” or “dummy” transformations because someone was just running queries on RAW tables.

I think all this functionality belongs in RAW Explorer, since that is what the name suggests: the ability to explore data in the RAW layer.