Solved

Need dump yaml for instances deployment by CDF Toolkit



I want to upload instances using CDF Toolkit. For a data model, for example, I can get a dump YAML from an existing data model using the toolkit, like the following:

cdf dump datamodel

I want to get the dump YAML of instances from my existing space and data model, which I can later deploy with the toolkit to a different space or environment.

 

I checked the doc here: https://docs.cognite.com/cdf/deploy/cdf_toolkit/references/resource_library#nodes

But creating the YAML files manually would be too hard. I have also tried to get a dump YAML using the Python SDK, but that YAML looks incomplete for deployment.

Best answer by Anders Albert

Currently, Toolkit does not support dumping of instances, but I will note it as a feature request.

The workaround would be to use the PySDK to create the YAML files. Then you can do something like this:

from pathlib import Path

retrieved = client.data_modeling.instances.retrieve(nodes=my_node_ids, edges=my_edge_ids)
Path("my_nodes.Node.yaml").write_text(retrieved.nodes.as_write().dump_yaml(), encoding="utf-8")
Path("my_edges.Edge.yaml").write_text(retrieved.edges.as_write().dump_yaml(), encoding="utf-8")

Notice the `.as_write()` method: it converts the nodes from the response/read format to the request/write format that Toolkit needs.


11 replies

  • Author
  • Committed
  • 17 replies
  • January 30, 2025

@Anders Albert - We were able to deploy the data model using the approach you suggested last time. Would you be able to help in this case as well? Also, please let me know the ideal approach if I want to deploy time series using the toolkit as well.


Anders Albert
  • Seasoned Practitioner
  • 108 replies
  • Answer
  • January 31, 2025

Currently, Toolkit does not support dumping of instances, but I will note it as a feature request.

The workaround would be to use the PySDK to create the YAML files. Then you can do something like this:

from pathlib import Path

retrieved = client.data_modeling.instances.retrieve(nodes=my_node_ids, edges=my_edge_ids)
Path("my_nodes.Node.yaml").write_text(retrieved.nodes.as_write().dump_yaml(), encoding="utf-8")
Path("my_edges.Edge.yaml").write_text(retrieved.edges.as_write().dump_yaml(), encoding="utf-8")

Notice the `.as_write()` method: it converts the nodes from the response/read format to the request/write format that Toolkit needs.
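For deployment, such `.Node.yaml`/`.Edge.yaml` files would typically be placed in a Toolkit module's `data_models` folder, as covered by the resource library documentation linked above.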


  • Author
  • Committed
  • 17 replies
  • February 3, 2025
Anders Albert wrote:

Currently, Toolkit does not support dumping of instances, but I will note it as a feature request.

The workaround would be to use the PySDK to create the YAML files. Then you can do something like this:

from pathlib import Path

retrieved = client.data_modeling.instances.retrieve(nodes=my_node_ids, edges=my_edge_ids)
Path("my_nodes.Node.yaml").write_text(retrieved.nodes.as_write().dump_yaml(), encoding="utf-8")
Path("my_edges.Edge.yaml").write_text(retrieved.edges.as_write().dump_yaml(), encoding="utf-8")

Notice the `.as_write()` method: it converts the nodes from the response/read format to the request/write format that Toolkit needs.

Thanks Anders, this really helps. Also, we have lots of instances, so what would be the ideal approach to deploy them if we want to use the toolkit in our CI/CD pipeline, in terms of effectiveness and performance?


Anders Albert
  • Seasoned Practitioner
  • 108 replies
  • February 3, 2025

Typically, you would populate a data model either

  1. Through an extractor, or
  2. From RAW, using transformations.

It is possible to use Toolkit, but it will not necessarily be performant. Are you thinking of checking the YAML with the nodes into version control and using Toolkit to match the version-controlled YAML with your CDF project? Note that Toolkit is intended for governing resources, not ingesting data, although it is possible.
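For illustration, a minimal sketch of the RAW route using the Python SDK; the database, table, and column names here are hypothetical, and the client is assumed to be an already configured CogniteClient:

import pandas as pd
from cognite.client import CogniteClient

client = CogniteClient()  # assumes authentication is configured, e.g. via environment variables

# Hypothetical staging table; a transformation would then map these rows onto the view's properties.
df = pd.DataFrame([
    {"key": "pump-001", "name": "Pump 001", "capacity": 42.0},
    {"key": "pump-002", "name": "Pump 002", "capacity": 17.5},
]).set_index("key")

# The dataframe index is used as the RAW row keys; ensure_parent=True creates the db/table if missing.
client.raw.rows.insert_dataframe("my_staging_db", "my_pumps", df, ensure_parent=True)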


  • Author
  • Committed
  • 17 replies
  • February 3, 2025
Anders Albert wrote:

Typically, you would populate a data model either

  1. Through an extractor, or
  2. From RAW, using transformations.

It is possible to use Toolkit, but it will not necessarily be performant. Are you thinking of checking the YAML with the nodes into version control and using Toolkit to match the version-controlled YAML with your CDF project? Note that Toolkit is intended for governing resources, not ingesting data, although it is possible.

@Anders Albert

Yes, we are trying to use version control, where we need all these YAML files.

We have views that contain more than 1000 instances, but using the SDK's client.data_modeling.instances.search I can get at most 1000 instances, and I don't get a continuation token to fetch further instances. I am using the search method because it lets me provide a ViewId for a specific view and get all its instances.

We have more than 138K instances. We deploy once, and our users further populate data based on these instances. Since the recommended approach to deploy a data model is to use Toolkit, we are exploring possibilities to deploy our instances using the toolkit as well.

So please help us get more than 1000 instances using a ViewId.

Also, we would love to know your recommendation for our use case.


  • Author
  • Committed
  • 17 replies
  • February 11, 2025

@Anders Albert - Could you please check this thread and give your inputs? Also, please confirm whether deploying instances via the toolkit is a right and suggested approach. We would like more clarity on whether there are any limitations with instance deployment, or any restrictions on attributes or properties.


Anders Albert
  • Seasoned Practitioner
  • 108 replies
  • February 11, 2025

We have views that contain more than 1000 instances, but using the SDK's client.data_modeling.instances.search I can get at most 1000 instances, and I don't get a continuation token to fetch further instances. I am using the search method because it lets me provide a ViewId for a specific view and get all its instances.

Search is for fast lookup and is limited to 1000 instances. client.data_modeling.instances.list will do the pagination for you, and thus, if you pass limit=-1, it will find all instances for a given view.
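For illustration, a minimal sketch of listing all nodes for a view and dumping them in the write format; the space, external ID, and version in the ViewId are hypothetical:

from pathlib import Path

from cognite.client import CogniteClient
from cognite.client.data_classes.data_modeling import ViewId

client = CogniteClient()  # assumes authentication is configured

view_id = ViewId(space="my_space", external_id="MyView", version="v1")  # hypothetical identifier

# limit=-1 makes the SDK paginate until every instance in the view is retrieved.
nodes = client.data_modeling.instances.list(instance_type="node", sources=view_id, limit=-1)
Path("my_nodes.Node.yaml").write_text(nodes.as_write().dump_yaml(), encoding="utf-8")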

I would not use Toolkit for the population, as you will be forced to store the instances as YAML, which will be very verbose. If you are comfortable with Python, I would consider a custom script: store the data in CSV or Parquet, have the script load the CSV/Parquet, and call client.data_modeling.instances.apply(). An alternative could be to use Pygen. That will give you client-side validation in addition, but requires that you generate an SDK for it. A final option would be to upload the data into RAW and write a transformation for each type of view. That I would avoid if you can.
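For illustration, a minimal sketch of such a custom script, assuming a CSV with an externalId column plus one column per view property; the file name, space, and view identifier are hypothetical:

import pandas as pd
from cognite.client import CogniteClient
from cognite.client.data_classes.data_modeling import NodeApply, NodeOrEdgeData, ViewId

client = CogniteClient()  # assumes authentication is configured

view_id = ViewId(space="my_space", external_id="MyView", version="v1")  # hypothetical identifier
df = pd.read_csv("instances.csv")  # hypothetical file with one row per node

# Build one NodeApply per row, mapping the remaining CSV columns to view properties.
nodes = [
    NodeApply(
        space="my_space",
        external_id=str(row["externalId"]),
        sources=[NodeOrEdgeData(source=view_id, properties=row.drop("externalId").to_dict())],
    )
    for _, row in df.iterrows()
]

# apply() upserts the nodes (create or update).
client.data_modeling.instances.apply(nodes=nodes)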


Anders Albert
  • Seasoned Practitioner
  • 108 replies
  • February 11, 2025

Note: I am logging this as a feature request for Toolkit.


Ayush Daruka
  • Seasoned
  • 25 replies
  • February 11, 2025
Anders Albert wrote:

We have views that contain more than 1000 instances, but using the SDK's client.data_modeling.instances.search I can get at most 1000 instances, and I don't get a continuation token to fetch further instances. I am using the search method because it lets me provide a ViewId for a specific view and get all its instances.

Search is for fast lookup and is limited to 1000 instances. client.data_modeling.instances.list will do the pagination for you, and thus, if you pass limit=-1, it will find all instances for a given view.

I would not use Toolkit for the population, as you will be forced to store the instances as YAML, which will be very verbose. If you are comfortable with Python, I would consider a custom script: store the data in CSV or Parquet, have the script load the CSV/Parquet, and call client.data_modeling.instances.apply(). An alternative could be to use Pygen. That will give you client-side validation in addition, but requires that you generate an SDK for it. A final option would be to upload the data into RAW and write a transformation for each type of view. That I would avoid if you can.

 

@Anders Albert We currently populate instances through a CSV file using a custom script, as you mentioned.
We were just looking for a better approach, if any, to populate instances through the toolkit, since we are moving to the toolkit for deployment going forward.
I agree that using YAML would not make sense, since it would be very verbose and large for huge numbers of instances. Regarding the final option, is there any specific reason to avoid transformations?


Anders Albert
  • Seasoned Practitioner
  • 108 replies
  • February 13, 2025

@Khilesh Sahu We have released Toolkit `v0.4.7`, which has alpha support for populating nodes through a view from a CSV or Parquet file.

You enable it in `cdf.toml` with the following:

[alpha_flags]
populate=true

The command is `cdf populate view`. 


  • Author
  • Committed
  • 17 replies
  • February 13, 2025
Anders Albert wrote:

@Khilesh Sahu We have released Toolkit `v0.4.7`, which has alpha support for populating nodes through a view from a CSV or Parquet file.

You enable it in `cdf.toml` with the following:

[alpha_flags]
populate=true

The command is `cdf populate view`.

Is there any documentation available for it where I can see details like the required steps before `cdf populate view` and where to place the CSV files?

We need details like the expected population rate and the maximum number of instances we can populate.

Please share any available documentation and the expected release date, so that we can analyze it before deploying to prod.



