Gathering Interest

Auto-Reconciliation for Cognite DB Extractor

Related products: Extractors
  • Øystein Aspøy
  • Markus Pettersen
  • RyanGuillory
  • bmcclure
  • JScheiderer
  • Chris Jackson
  • jjschumaker
  • John Cassidy
  • BeauSlaughter
  • rrosebeck
  • brianjwagoner

 

We request the introduction of an auto-reconciliation feature for the Cognite DB Extractor. Currently, we face significant issues with data synchronization between our source systems and Cognite. When physical deletes occur in the source tables, these deletions are not reflected in Cognite, leading to outdated or incorrect data residing in our Cognite environment. This discrepancy is resulting in data integrity issues and misinformed decision-making.

To address this, we propose a feature that automatically detects and reconciles physical deletes from the source system, ensuring that these changes are promptly reflected in Cognite. This enhancement should operate with minimal latency, ideally processing updates within minutes, to ensure that our data remains accurate and reliable. Implementing this feature will streamline our data management processes and maintain consistency across systems.

 

4 replies

Jørgen Lund
Seasoned Practitioner
  • Product Manager
  • 113 replies
  • August 15, 2024

Hi @Harsha!

Thank you for suggesting this product idea. We will follow the traction this idea gets in the community. You can expect updates on this post if we decide to include this in our future roadmap, or if we require more information.



Hi @Harsha and thanks for this idea.

When you say auto-reconciliation, are you thinking of a two-way sync or simply that _all_ changes in the source system table(s) be reflected in the extracted CDF data?


  • Author
  • Committed
  • 9 replies
  • August 15, 2024

@Thomas Sjølshagen, we are looking for source system changes, especially physical deletes, to be deleted or marked for deletion in CDF RAW.
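
One way to picture this request is a key-set comparison: read the primary keys currently in the source table, compare them with the row keys already extracted, and flag whatever exists only on the destination side. The sketch below is a plain-Python illustration under stated assumptions, not DB Extractor functionality. `find_deleted_keys`, `mark_for_deletion`, and the `_deleted_at` column are hypothetical names, and the destination is modelled as a dict rather than an actual RAW table.

```python
from datetime import datetime, timezone


def find_deleted_keys(source_keys, destination_keys):
    """Return keys present in the destination but no longer in the source."""
    return set(destination_keys) - set(source_keys)


def mark_for_deletion(destination_rows, deleted_keys, now=None):
    """Soft-delete: stamp rows with a hypothetical `_deleted_at` column."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    for key in deleted_keys:
        destination_rows[key]["_deleted_at"] = stamp
    return destination_rows


# Destination modelled as {row_key: columns}; a stand-in for extracted rows.
destination = {
    "A1": {"name": "pump"},
    "A2": {"name": "valve"},
    "A3": {"name": "motor"},
}
source_keys = ["A1", "A3"]  # "A2" was physically deleted in the source

deleted = find_deleted_keys(source_keys, destination.keys())
mark_for_deletion(destination, deleted)
print(sorted(deleted))
```

Marking rather than removing keeps the extracted data recoverable if the row turns out to be only temporarily absent from the source.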



Understood.

 

As an “in the middle” entity, the extractor has a hard time deciding whether data has been removed or is simply “missing”. We can see in CDF that we copied the data there before the current run, but we cannot state with 100% certainty that the reason it is absent from the source system right now is that someone deleted it.

As a result, a general-purpose “always delete data not found in the source system” behavior is a potential problem vector for us, and one we’re unlikely to “automate the execution of”. By automate, I mean that the extractor probably shouldn’t indiscriminately delete data from the CDF destination location just because we’re not seeing it in the source. We will need to ask someone how they (you) want us to behave, so that the behavior is intentional and its consequences are understood when the functionality potentially becomes available and is enabled for a given run.
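
To make the “intentional behavior” point concrete, here is a minimal sketch of what an explicit opt-in could look like. It is purely illustrative and does not describe any existing or planned extractor configuration. `MissingRowPolicy`, `ReconcileConfig`, `max_missing_fraction`, and `plan_reconciliation` are hypothetical names. The default is to do nothing, and a safety threshold refuses to act when an unexpectedly large share of rows is missing, since that more likely indicates a failed or partial source read than real deletes.

```python
from dataclasses import dataclass
from enum import Enum


class MissingRowPolicy(Enum):
    IGNORE = "ignore"  # leave destination rows untouched (default)
    MARK = "mark"      # flag rows as deleted, keep the data
    DELETE = "delete"  # remove rows from the destination


@dataclass
class ReconcileConfig:
    policy: MissingRowPolicy = MissingRowPolicy.IGNORE
    max_missing_fraction: float = 0.5  # hypothetical safety threshold


def plan_reconciliation(source_keys, destination_keys, config):
    """Return (action, keys): what to do with rows missing from the source."""
    destination_keys = set(destination_keys)
    missing = destination_keys - set(source_keys)
    if config.policy is MissingRowPolicy.IGNORE or not missing:
        return MissingRowPolicy.IGNORE, set()
    # `missing` is non-empty here, so `destination_keys` is non-empty too.
    fraction = len(missing) / len(destination_keys)
    if fraction > config.max_missing_fraction:
        raise RuntimeError(
            f"{fraction:.0%} of destination rows are missing from the source; "
            "refusing to reconcile automatically"
        )
    return config.policy, missing


config = ReconcileConfig(policy=MissingRowPolicy.MARK)
action, keys = plan_reconciliation(
    ["a", "b", "c", "d"], ["a", "b", "c", "d", "e"], config
)
print(action.value, sorted(keys))
```

With the default `ReconcileConfig()`, the same call returns the `IGNORE` action and an empty key set, so nothing is touched unless someone has deliberately chosen a policy.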

I have added this idea to our longer-term backlog of potential features, as something we want to solve better in general across extractors. It is not on the near-term roadmap right now, since we’ve got our hands full working to improve monitoring and metrics availability for the integration pipeline in Q1 and Q2, among other things.

I will update the roadmap and include this idea in 2025, as we progress into Q1 2025 planning activities. I am actively seeking ways to bring a feature like this “in” from a timeline perspective, but right now it’s “only” in the backlog, unfortunately.



