General feedback (v0.1.1)

  • 5 March 2024
  • 9 replies
  • 90 views

Great work on the tool; it will let us shed a few pounds of CI/CD scripts.

Listing some feedback here (as advised by Jan Inge).

Will you supply patterns or guidance for structuring custom modules? Do you have any thoughts on the level at which a project should group its artifacts into a module?

Is there a plan to add support for the development lifecycle, e.g. downloading a full solution as configuration (from an existing project), or download buttons on resources other than transformations (e.g. data sets)?

It would perhaps be better to let the RAW config support multiple tables (one file per DB); one file per table seems a bit bloated. A “Download CLI” option at the DB level would also be great.

Is it possible to have the auth verify command also check transformation credentials?

Not directly related to the tool, but there seems to be a mismatch between transformation credentials configuration in the UI and via the tool. We cannot configure the advanced scenario in the UI, and we cannot verify the auth via the tool. This means that we cannot verify the transformation credentials before we have done a deployment.

The transformation schedule active flag isn't respected on deployment.

It is an option to put the query in the transformation yaml, but if you do, the build step gives a warning about a missing SQL file. Also, should the file with the query perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? As it is, some editors show that these files contain errors.

 


9 replies


Great to hear you enjoy the tool and have taken the time to write up your feedback.

Will you supply patterns or guidance for structuring custom modules? Do you have any thoughts on the level at which a project should group its artifacts into a module?

Yes, this documentation is on the roadmap. For now, we recommend gathering the resources that make up one data pipeline into a module: for example, the groups, transformations, data sets, and data models related to ingesting data from a source into CDF.
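To illustrate, a module grouped around one ingestion pipeline might be laid out roughly like this (the module name is hypothetical, and the exact directory names should be checked against the toolkit docs):

```text
modules/
  my_source_pipeline/    # hypothetical module name
    auth/                # groups controlling access to the pipeline's data
    raw/                 # RAW table configs for the staged data
    data_sets/           # data sets scoping the pipeline's resources
    transformations/     # transformations, their .sql queries and schedules
    data_models/         # data models the pipeline populates
```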

Is there a plan to add support for the development lifecycle, e.g. downloading a full solution as configuration (from an existing project), or download buttons on resources other than transformations (e.g. data sets)?

Yes, we have plans to support the lifecycle. Exactly how that will be implemented, and what the user interface for it will look like, is not yet decided.

It would perhaps be better to let the RAW config support multiple tables (one file per DB); one file per table seems a bit bloated. A “Download CLI” option at the DB level would also be great.

This should already be supported. All resources can be given either as a list or as single objects in their config files. If that is not the case, it is a bug.
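For what it's worth, a list-style RAW config might then look like this (the file name is just an example, and the keys are an assumption on my side, based on the dbName/tableName fields in the RAW API):

```yaml
# raw/my_db.yaml - hypothetical: one file listing several tables in the same DB
- dbName: my_db
  tableName: assets
- dbName: my_db
  tableName: work_orders
```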

Is it possible to have the auth verify command also check transformation credentials?

Good suggestion, added to the backlog, which means it will be considered for a future release. 

Not directly related to the tool, but there seems to be a mismatch between transformation credentials configuration in the UI and via the tool. We cannot configure the advanced scenario in the UI, and we cannot verify the auth via the tool. This means that we cannot verify the transformation credentials before we have done a deployment.

Good suggestion, added to the backlog, which means it will be considered for a future release. 

The transformation schedule active flag isn't respected on deployment.

If that is true, it is a bug. But I do not understand which parameter you are pointing to; the toolkit follows the API spec. Do you mean isPaused? https://api-docs.cognite.com/20230101/tag/Transformations/operation/createTransformations

It is an option to put the query in the transformation yaml, but if you do, the build step gives a warning about a missing SQL file. Also, should the file with the query perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? As it is, some editors show that these files contain errors.

Good suggestion, added to the backlog, which means it will be considered for a future release. 

Thanks for the update.

When it comes to multiple tables for RAW, this is from the docs:
"RAW configurations can be found in the module's raw/ directory. You need a yaml file per table you want to load."

I cannot see a way to add multiple tables according to the docs. The API also treats DBs and tables as separate actions, so that may be the reason.

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

Also, is it possible to have a shared config (for all environments)? I cannot see it described in the docs. This would be used for values that are shared across environments, e.g. tenant ID/URL.

Lastly, is it possible to have an “init new project” option with just the bare minimum of needed artifacts? That way we would not need to clean up / remove things afterwards.

Note that when it comes to transformation schedules, the documentation at https://developer.cognite.com/sdks/toolkit/references/configs#transformations-dir-transformations seems to be wrong. It points to an indented schedule config inside the transformation yaml, but the examples that follow the latest version put the schedules in a separate .schedule.yaml file. The separate file works; the documented approach does not.
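For reference, the separate-file variant that works for us looks roughly like this (the file name is just our convention, and the field names follow the Transformation Schedules API):

```yaml
# my_transformation.schedule.yaml - schedule kept in its own file
externalId: my_transformation   # must match the transformation's externalId
interval: "0 * * * *"           # cron expression: run hourly
isPaused: false                 # false means the schedule actively runs
```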


Thanks, Marius, for all the feedback. I have logged it, and we will improve the tool and docs based on it.

And sorry for not answering your questions from the last post:
 

I cannot see a way to add multiple tables according to the docs. The API also has DBs and tables as separate actions, so that may be the reason.

There is now a task to update the docs.

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

Will look into this.

Also, is it possible to have a shared config (for all environments)? I cannot see it described in the docs. This would be used for values that are shared across environments, e.g. tenant ID/URL.

You could in an older alpha version, so the docs are likely outdated. We removed it because it easily leads to increased complexity compared to keeping configs separate for each environment.

Lastly, is it possible to have an “init new project” option with just the bare minimum of needed artifacts? That way we would not need to clean up / remove things afterwards.

We are currently discussing this. It might become a new feature in the future.


I have looked into the issue of isPaused:

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

I cannot recreate it. I have deployed, flipped the flag, redeployed, and gotten the desired result in Fusion.

Could it be that you changed it in the source files and did not run cdf-tk build? A mistake I have made many times, and one you will be warned about in future versions of the toolkit.

Another mistake I made when looking into it was interpreting isPaused: true to mean the transformation schedule should run, which is wrong; to have the schedule run, set isPaused: false.

If you see my last comment, I have some updates on this (since the quoted issue). Not sure if you use a separate .schedule file or not?


Sorry, I missed that. You are correct, the docs are outdated. I will add a task to follow up on that.


@Marius, just circling back to this:

 

Also, the file with the query should perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? This causes some editors to show that these files contains errors.

 

From what I can gather, both sparql and rq belong to the data modeling domain, not Apache Spark-type SQL. Are you sure that's what you meant?

No, and I see the sentence was badly worded. It was just a wish for a dedicated extension for the Spark SQL dialect. I see now that the community has landed on just .sql, despite the problems it causes in editors and IDEs. In short, all our transformation queries are full of red squiggles in VS Code.
