If you have a sandbox that you would like to populate with data to get started with Cognite Data Fusion, you can use Open Industrial Data. We have built the Cognite replicator for that purpose.

Disclaimer: this component is community content and is not officially supported by Cognite. Bugs and changes will be fixed on a best-effort basis. Feel free to open issues and pull requests; we will review them as soon as we can.

How do I use the Cognite replicator? Detailed instructions are on GitHub, covering how to run it as a Python package, a Docker container, or a standalone script, and how to authenticate via user impersonation or client credentials.

How do I authenticate to my source and destination projects, and what permissions do I need? There are two ways of authenticating to your source and your destination: via user impersonation, where the script is run on behalf of your user, so you should have the required permissions (read/write on the resources you want to write to/from + gro
When a config is not correct for an extractor, the errors are not always helpful, even in the full logs. Also, catching the formatting errors directly in the UI would be nice.
Enable monitoring of Cognite Functions, like extraction pipelines and transformations: being able to alert on failure by sending an email.
One should be able to delete RAW rows by key, either from Transformations or via the Python SDK.
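In the meantime, a workaround is to select the keys client-side and delete them in one call. A minimal sketch of the delete-by-key pattern, using an in-memory dict as a stand-in for a RAW table (in practice the rows would be read and deleted through the SDK's RAW API; the keys and columns here are invented):

```python
# In-memory stand-in for a RAW table: key -> row columns.
raw_table = {
    "pump-001": {"status": "ok"},
    "pump-002": {"status": "stale"},
    "valve-001": {"status": "stale"},
}

def delete_rows_by_key(table: dict, keys: list) -> int:
    """Delete the given keys from the table; return how many were removed."""
    removed = 0
    for key in keys:
        if table.pop(key, None) is not None:
            removed += 1
    return removed

# Select rows to drop by inspecting their content, then delete by key.
stale_keys = [k for k, row in raw_table.items() if row["status"] == "stale"]
deleted = delete_rows_by_key(raw_table, stale_keys)
```

The same two-step shape (list keys matching a predicate, then delete that key list) is what the requested Transformations/SDK feature would make a single operation.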
Like for extraction pipelines, there should be a documentation field for each Function in Fusion. This would make it easier for our customers to use our product and build solutions, and it follows the good principle of keeping the documentation of a solution as close as possible to the solution itself.
I want to search for an exact external ID or name in the CDF Data Explorer. My project has many CDF resources with similar names/prefixes, so the search is often difficult and returns no results. As a data engineer, I usually have the exact tag of the resource I want, and I do not necessarily want to explore the data. Without this feature, I need to use the SDK/API/Transformations. Alternatively, being able to search by suffix rather than only by prefix would help.
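Until then, a client-side workaround is to list resources via the SDK and filter locally; the exact-match and suffix-match logic itself is trivial (the external IDs below are made up):

```python
# Invented external IDs with the common "similar prefix" problem.
external_ids = [
    "23-PT-92532",
    "23-PT-92533",
    "45-PT-92532",
]

def exact_match(ids, query):
    """Return only IDs equal to the query (what exact search would do)."""
    return [x for x in ids if x == query]

def suffix_match(ids, suffix):
    """Return IDs ending with the suffix (what suffix search would do)."""
    return [x for x in ids if x.endswith(suffix)]

exact = exact_match(external_ids, "23-PT-92532")
by_suffix = suffix_match(external_ids, "-92532")
```

This works, but only after pulling the candidate resources down first, which is exactly the round trip the requested search feature would remove.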
We used to have the precise number of resources on the tabs, but now if the count is > 1000 it just displays 1k+. The precise counts were very useful when testing and monitoring extraction or ingestion of data.
When writing a transformation, the logic is built up gradually. Sometimes you join different tables and have several subqueries; each subquery needs correct logic, so a data engineer often spends time verifying it. Other times you want to confirm the behaviour of data in CDF, so you use transformations to profile some specific data.

Since we do not have views/temporary tables, you often need to create a (temporary) database in CDF, get permission to write to it (hard to do in production), create a table, run your transformation so it writes to this table, and then write another transformation to query your temporary table. If your original query was wrong, or you want to test something else, you clear the table and start over. It would be great to build functionality that helps with this iterative debugging process.
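One workaround, absent temporary tables, is to prototype the join/subquery logic against sample data locally before pasting it into a Transformation. A sketch using SQLite as a stand-in (note that Transformations run Spark SQL, so dialects differ; the table and column names here are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Invented sample data mimicking two RAW tables.
cur.execute("CREATE TABLE assets (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE readings (asset_id INTEGER, value REAL)")
cur.executemany("INSERT INTO assets VALUES (?, ?)", [(1, "pump"), (2, "valve")])
cur.executemany("INSERT INTO readings VALUES (?, ?)",
                [(1, 10.0), (1, 30.0), (2, 5.0)])

# The subquery under test: average reading per asset, joined to names.
query = """
SELECT a.name, avg_r.avg_value
FROM assets a
JOIN (SELECT asset_id, AVG(value) AS avg_value
      FROM readings GROUP BY asset_id) avg_r
  ON avg_r.asset_id = a.id
ORDER BY a.name
"""
rows = cur.execute(query).fetchall()
```

Once each subquery returns the expected rows on the sample data, the query can be moved into the Transformation with much less trial-and-error against temporary CDF tables.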
If the transformation fails or has a bug and did not process some rows correctly, I often need to do a backfill. I would like an easier option to run the transformation on the full data, instead of changing the variable name or manually updating the transformation. Being able to see the value of the is_new flag, and to reset it, would be great.
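For context, the incremental behaviour can be thought of as a watermark over a last-updated timestamp, and "resetting" it forces a full backfill. A pure-Python sketch of that semantics (this illustrates the concept only; it is not the actual implementation of is_new, and the rows and timestamps are invented):

```python
rows = [
    {"key": "a", "last_updated_time": 100},
    {"key": "b", "last_updated_time": 200},
    {"key": "c", "last_updated_time": 300},
]

def select_rows(rows, watermark):
    """is_new-style filter: keep only rows changed after the watermark."""
    return [r for r in rows if r["last_updated_time"] > watermark]

watermark = 150                               # state left by the previous run
incremental = select_rows(rows, watermark)    # only the rows changed since then

watermark = 0                                 # "reset": reprocess everything
backfill = select_rows(rows, watermark)       # full backfill over all rows
```

Exposing and resetting that stored watermark, as the post requests, would make backfills a one-click operation instead of an edit to the transformation itself.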
When extractors are deployed on a customer's infrastructure, we do not have direct access to the logs. If the customer is in a different timezone, that can lead to long waiting times. In CDF we have some information about the extractor's failures, but that is often not enough. Having the full logs available (maybe only for the last 3 days) would be a big help.
In the same way that we have write protection on a dataset, I think it would be useful to have read protection, i.e. only read access to some resources in specific datasets for specific groups. At the moment, if I want to restrict access to a specific dataset, I need to add all the others to the groups I want to scope access for. Similarly, we can give access to specific resources, but we cannot withhold access to others unless we explicitly add all the others to the access management. Having at least this capability through datasets would help, and being able to scope access on a root asset would also be helpful.
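For reference, granting (as opposed to withholding) dataset-scoped read access can already be expressed per capability in a group definition. A hedged sketch of one such capability, written as a Python dict (the field names follow the CDF capabilities format as I understand it, and the dataset ID is invented):

```python
# Hypothetical group capability: READ on assets, scoped to one dataset.
# Field names are my understanding of the CDF capabilities format;
# the dataset id is made up.
capability = {
    "assetsAcl": {
        "actions": ["READ"],
        "scope": {"datasetScope": {"ids": [1234567890]}},
    }
}
```

The gap the post describes is the inverse: there is no way to say "everything except this dataset" without enumerating every other dataset in such scopes.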
It would be a nice feature to be able to save the output of calculations in Charts as time series / synthetic time series. This is probably on the roadmap, but our team would like to know more about it.
We would like the possibility to upload custom models in Charts for use as a node. cc @roman.chesnokov
Transformations are used quite extensively by data engineers. I feel more confident about my Python code because I can unit test it: if I make changes to my scripts, I have more guarantees that they will not cause changes in the data. There is no way to easily see the results of a code change in Transformations without having some other script/Transformation to investigate the changed data. This can result in many data bugs, especially if you change code that was written by someone else. It would be helpful to explore functionality that supports the data quality of the transformations we write every day.
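To illustrate the contrast: transformation-style mapping logic written as a plain Python function can be pinned down with a unit test, so any refactor that changes the output is caught immediately (the function, fields, and sample values below are all invented):

```python
def to_asset(row: dict) -> dict:
    """Invented mapping logic: turn a RAW row into an asset payload."""
    return {
        "externalId": row["tag"].strip().upper(),
        "name": row.get("description", row["tag"]),
    }

def test_to_asset():
    # If someone later changes to_asset, this test flags the data change.
    out = to_asset({"tag": " 23-pt-92532 ", "description": "Pressure transmitter"})
    assert out == {"externalId": "23-PT-92532", "name": "Pressure transmitter"}

test_to_asset()
```

An equivalent mechanism for SQL Transformations (run against fixture rows, compare expected output) is essentially what the post is asking for.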
If you ever wondered what each of the different professions at Cognite does, Generative AI has the answer :)
When creating a new project, only the admin group is created. It would be nice to also automatically create some additional groups, e.g. read-write-all and read-only. Also, the ability to duplicate a group would be useful.
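Duplicating a group essentially means copying its capabilities into a new, creatable group definition under a fresh name. A minimal sketch of that logic on plain dicts (server-assigned fields such as id are stripped; the group shape and names are invented for illustration):

```python
import copy

def duplicate_group(group: dict, new_name: str) -> dict:
    """Return a creatable copy of a group: same capabilities, new name,
    server-assigned fields stripped."""
    dup = copy.deepcopy(group)
    for server_field in ("id", "isDeleted", "deletedTime"):
        dup.pop(server_field, None)
    dup["name"] = new_name
    return dup

existing = {
    "id": 42,
    "name": "read-write-all",
    "capabilities": [
        {"assetsAcl": {"actions": ["READ", "WRITE"], "scope": {"all": {}}}}
    ],
}
clone = duplicate_group(existing, "read-write-all-copy")
```

The clone can then be submitted as a new group; a "duplicate" button in Fusion would do exactly this copy-strip-rename step for you.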