General feedback (v0.1.1)

  • 5 March 2024
  • 9 replies
  • 90 views

Great work on the tool; it will let us shed a few pounds of CI/CD scripts.

Listing some feedback here (as advised by Jan Inge).

Will you supply patterns or guidance for structuring custom modules? Do you have any thoughts on the level at which a project should group its artifacts into a module?

Is there a plan to add support for the development lifecycle, e.g. downloading a full solution as configuration (from an existing project), or download buttons on resources other than transformations (e.g. data sets)?

It would perhaps be better to let the RAW config support multiple tables (one file per DB); one file per table seems a bit bloated. A “Download CLI” option at the DB level would also be great.

Is it possible to have the auth verify command also check transformation credentials?

Not directly related to the tool, but there seems to be a mismatch between transformation credentials configuration in the UI and via the tool. We cannot configure the advanced scenario in the UI, and we cannot verify the auth via the tool. This means that we cannot verify the transformation credentials before we have done a deployment.

The transformation schedule active flag isn't respected on deployment.

It is an option to put the query in the transformation yaml, but if you do, the build step gives a warning about a missing SQL file. Also, should the file with the query perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? As it is, some editors show that these files contain errors.

 


9 replies


Great to hear you enjoy the tool and have taken the time to write up your feedback.

Will you supply patterns or guidance for structuring custom modules? Do you have any thoughts on the level at which a project should group its artifacts into a module?

Yes, this documentation is on the roadmap. For now, we recommend gathering the resources that make up one data pipeline into a module: for example, the groups, transformations, data sets, and data models related to ingesting data from a source into CDF.
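To illustrate, a module grouped around one ingestion pipeline might be laid out roughly like this (the module name is hypothetical, and the exact directory names should be checked against the toolkit docs):

```text
modules/
  my_source_pipeline/    # hypothetical module name
    auth/                # groups controlling access to the pipeline's data
    raw/                 # RAW table configs for the staged data
    data_sets/           # data sets scoping the pipeline's resources
    transformations/     # transformations, their .sql queries and schedules
    data_models/         # data models the pipeline populates
```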

Is there a plan to add support for the development lifecycle, e.g. downloading a full solution as configuration (from an existing project), or download buttons on resources other than transformations (e.g. data sets)?

Yes, we have plans to support the lifecycle. Exactly how that will be implemented, and what the user interface for it will look like, is not yet decided.

It would perhaps be better to let the RAW config support multiple tables (one file per DB); one file per table seems a bit bloated. A “Download CLI” option at the DB level would also be great.

This should already be supported. All resources can be given either as a list or as single objects in their config files. If that is not the case, it is a bug.
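For what it's worth, a list-style RAW config might then look like this (the file name is just an example, and the keys are an assumption on my side, based on the dbName/tableName fields in the RAW API):

```yaml
# raw/my_db.yaml - hypothetical: one file listing several tables in the same DB
- dbName: my_db
  tableName: assets
- dbName: my_db
  tableName: work_orders
```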

Is it possible to have the auth verify command also check transformation credentials?

Good suggestion, added to the backlog, which means it will be considered for a future release. 

Not directly related to the tool, but there seems to be a mismatch between transformation credentials configuration in the UI and via the tool. We cannot configure the advanced scenario in the UI, and we cannot verify the auth via the tool. This means that we cannot verify the transformation credentials before we have done a deployment.

Good suggestion, added to the backlog, which means it will be considered for a future release. 

The transformation schedule active flag isn't respected on deployment.

If that is true, it is a bug. But I do not understand which parameter you are pointing to; the toolkit follows the API spec. Do you mean isPaused? https://api-docs.cognite.com/20230101/tag/Transformations/operation/createTransformations

It is an option to put the query in the transformation yaml, but if you do, the build step gives a warning about a missing SQL file. Also, should the file with the query perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? As it is, some editors show that these files contain errors.

Good suggestion, added to the backlog, which means it will be considered for a future release. 

Thanks for the update.

When it comes to multiple tables for RAW, this is from the docs:
"RAW configurations can be found in the module's raw/ directory. You need a yaml file per table you want to load."

I cannot see a way to add multiple tables according to the docs. The API also treats DBs and tables as separate actions, so that may be the reason.

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

Also, is it possible to have a shared config (for all environments)? I cannot see it described in the docs. This would be used for values that are shared across environments, e.g. tenant ID/URL.

Lastly, is it possible to have an “init new project” option with just the bare minimum of needed artifacts? That way we would not need to clean up / remove things afterwards.

Note that when it comes to transformation schedules, the documentation at https://developer.cognite.com/sdks/toolkit/references/configs#transformations-dir-transformations seems to be wrong. It points to an indented schedule config inside the transformation yaml, but the examples that follow the latest version put the schedules in a separate .schedule.yaml file. The separate file works; the documented approach does not.
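For reference, the separate-file variant that works for us looks roughly like this (the file name is just our convention, and the field names follow the Transformation Schedules API):

```yaml
# my_transformation.schedule.yaml - schedule kept in its own file
externalId: my_transformation   # must match the transformation's externalId
interval: "0 * * * *"           # cron expression: run hourly
isPaused: false                 # false means the schedule actively runs
```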


Thanks, Marius, for all the feedback. I have logged it, and we will improve the tool and docs based on it.

And sorry for not answering your questions from the last post:
 

I cannot see a way to add multiple tables according to the docs. The API also has DBs and tables as separate actions, so that may be the reason.

There is now a task to update the docs.

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

Will look into this.

Also, is it possible to have a shared config (for all environments)? I cannot see it described in the docs. This would be used for values that are shared across environments, e.g. tenant ID/URL.

You could in an older alpha version, so the docs are likely outdated. We removed it because it easily leads to increased complexity compared to keeping configs separate for each environment.

Lastly, is it possible to have an “init new project” option with just the bare minimum of needed artifacts? That way we would not need to clean up / remove things afterwards.

We are currently discussing this. It might become a new feature in the future.


I have looked into the issue of isPaused:

As for the active flag for transformation schedules, it is correct that I mean the isPaused attribute. If we change it and redeploy, it doesn’t seem to affect the target state.

I cannot recreate it. I have deployed, flipped the flag, redeployed, and gotten the desired result in Fusion.

Could it be that you changed it in the source files and did not run cdf-tk build? A mistake I have made many times, and one you will be warned about in future versions of the toolkit.

Another mistake I made when looking into it was interpreting isPaused: true to mean the transformation schedule should run, which is wrong; to have the schedule run, set isPaused: false.

If you see my last comment, I have some updates on this (since the quoted issue). Not sure if you use a separate .schedule file or not?


Sorry, I missed that. You are correct, the docs are outdated. I will add a task to follow up on that.


@Marius, just circling back to this:

 

Also, the file with the query should perhaps have the 'sparql' or 'rq' extension, since it isn't ANSI SQL? This causes some editors to show that these files contains errors.

 

From what I can gather, both sparql and rq belong to the data modeling domain, not Apache Spark-type SQL. Are you sure that's what you meant?

No, and I see the sentence was badly worded. It was just a wish for a dedicated extension for the Spark SQL dialect. I see now that the community has landed on just .sql, despite the problems it causes in editors and IDEs. In short, all our transformation queries are full of red squiggles in VS Code.
