Cognite Replicator

  • October 4, 2022
  • 7 replies
  • 218 views

Pierre Pernot
Seasoned Practitioner

A common setup that we have at Cognite is three different environments: development, test, and production, with one dedicated Cognite Data Fusion (CDF) project for each. This setup allows us to develop new features without interfering with production data, and it makes the deployment of new features to production much safer, since they are tested beforehand.

Sometimes you still want the data you used during your tests to be available in your production environment. For example, if you extracted data from sensors and saved it as time series in CDF, you may want to keep this historical data when you move to production. The same goes for files, assets, data sets, functions, etc. Another example is replicating data from a full production CDF project to one of your customers' projects, which contains only a subset of the source project.

At Cognite, we have a Python package (available at https://pypi.org/project/cognite-replicator/) whose purpose is to copy selected resources from one project to another. It is quite easy to use: you only have to fill in a config file with some authentication information, the resources you want to replicate, and a few additional options, and then run it from a CI/CD pipeline. The time it takes depends on the data you replicate, but it is generally less than a minute. This package makes the lead time between development and production much shorter, and allows you to continuously add new features to your production environment, at scale.
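To give an idea of what this looks like, here is a minimal sketch using the package's Python modules directly (the project names, API-key environment variables, and client construction below are placeholders to adapt to your own setup; the README has the authoritative config file schema):

```python
import os

from cognite.client import CogniteClient
from cognite.replicator import assets, events, time_series

# Placeholder credentials and project names: adapt these to your setup.
client_src = CogniteClient(
    api_key=os.environ["SRC_API_KEY"],
    project="my-dev-project",
    client_name="replicator-example",
)
client_dst = CogniteClient(
    api_key=os.environ["DST_API_KEY"],
    project="my-prod-project",
    client_name="replicator-example",
)

# Copy the selected resource types from the source to the destination project.
assets.replicate(client_src, client_dst)
events.replicate(client_src, client_dst)
time_series.replicate(client_src, client_dst)
```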

This package is still being updated based on customer requests: if you would like to see a new feature in it, please let us know!

@Gaetan Helness, you also worked with the replicator package: any tips or best practices you would like to share?

7 replies

Ben Brandt
Seasoned
  • October 4, 2022

The Cognite Replicator has worked great for us for refreshing our Dev and QA environments from Prod. The only gap we found was that the annotations linking assets to 3D models or images were missing, but I have not verified whether this is still an issue.

Any bugs we find or suggestions we have can be submitted as an issue on the GitHub repo, like this one:
Add Support For Mapping of Annotations · Issue #174 · cognitedata/cognite-replicator (github.com)


Pierre Pernot
Seasoned Practitioner
  • Author
  • October 5, 2022

Thank you @Ben Brandt for that clarification!


APSHANKAR Sagar
Committed

Where do you run the Cognite Replicator? From the description in the link, I have the impression that it runs on some third computer or EC2 instance, separate from CDF.

Is there anyone who is using it from inside CDF itself, perhaps as a CDF Function?


  • Active
  • May 30, 2024

I tried to run the cognite-replicator script through a CDF Function, but it neither prints any error nor gives any output, so I cannot tell what exactly is happening at the end of the CDF Function.

It would be good to run it within a Cognite Function so that we can monitor it and keep track of runs.

Can anyone help here?
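For reference, here is roughly the shape of what I tried (a simplified sketch; the source project name and the way I pass credentials are placeholders, not my exact code):

```python
import os

from cognite.client import CogniteClient
from cognite.replicator import assets, time_series


def handle(client, data):
    # `client` is authenticated against the project the Function runs in;
    # I use it as the destination. The source client below is a placeholder.
    client_src = CogniteClient(
        api_key=os.environ["SRC_API_KEY"],
        project="my-dev-project",
        client_name="replicator-function",
    )

    print("Starting replication...")  # stdout should show up in the Function logs
    assets.replicate(client_src, client)
    time_series.replicate(client_src, client)
    print("Replication finished.")

    # Returning a value makes the result visible on the function call.
    return {"status": "done"}
```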


Sendil

Please let me know whether this Replicator can be used to replicate production CDF data to other data storage, such as an S3 bucket or Databricks. Will this replicator scale to copy that data in near real time, and does it provide fault tolerance?


APSHANKAR Sagar
Committed

Please let me know whether this Replicator can be used to replicate production CDF data to other data storage, such as an S3 bucket or Databricks. Will this replicator scale to copy that data in near real time, and does it provide fault tolerance?

Hello Sendil,

AFAIK, no, this isn't for that. It simply lets you copy CDF data objects from one project to another.


Sendil

OK, thanks for the response, Apshankar. I felt the same when I looked at it: this code is not scalable for production replication. We need an approach to replicate the data from the CDF data model to the client's S3 bucket.