DB Extractor and deleted rows

Question

We use the DB Extractor to push data from an MSSQL database to RAW, but have run into a problem.

Over time the RAW table grows much larger than the database table, because rows that are deleted in the database are not deleted in RAW.

How can I set up the config YAML for the DB Extractor to also handle rows that are deleted in the database?

Thomas Sjolshagen · Answer

Hi @Pedersen Jon-Robert,

We have intentionally avoided having the extractor delete data automatically, since the extractor cannot reliably determine _why_ a given piece of data is unavailable to us while we're reading the source. From my perspective, this type of sync will always require some form of human intervention/decision.

You could, possibly as an occasional workaround (not something I would suggest you do as a scripted, regular thing), use the CDF Toolkit to recreate the RAW table and force a reload from source data. Then use a transformation to clean up the missing/removed data.

We're considering the option of, for instance, using a special field/column as a "data is missing" field to indicate that the extractor was unable to find specific row(s)/column(s)/cell(s) of data whenever the target in CDF has data that we cannot locate in the source. That field could then be used in a delete transformation to locate data to remove from the running CDF instance.

BUT, this is only at the "brain worm" state as of now.
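
To make the "data is missing" idea concrete, here is a minimal Python sketch of the comparison it implies: find the row keys that exist in RAW but no longer exist in the source, and mark them with a flag instead of deleting them. This is not an extractor feature. The function names and the `data_is_missing` column name are hypothetical, and fetching the keys from MSSQL and from RAW is left out on purpose.

```python
from typing import Dict, Iterable, List


def find_stale_keys(source_keys: Iterable[str], raw_keys: Iterable[str]) -> List[str]:
    """Return the RAW row keys that are no longer present in the source, sorted."""
    return sorted(set(raw_keys) - set(source_keys))


def flag_missing_rows(
    raw_rows: Dict[str, dict],
    source_keys: Iterable[str],
    flag_column: str = "data_is_missing",  # hypothetical column name
) -> Dict[str, dict]:
    """Return a copy of raw_rows where flag_column is True for rows absent from the source."""
    present = set(source_keys)
    return {
        key: {**columns, flag_column: key not in present}
        for key, columns in raw_rows.items()
    }


if __name__ == "__main__":
    source = ["1", "2"]                       # keys currently in the database table
    raw = {"1": {"name": "a"}, "3": {"name": "c"}}  # rows currently in RAW
    print(find_stale_keys(source, raw.keys()))
    print(flag_missing_rows(raw, source))
```

Flagging rather than deleting keeps the decision with a person: the flagged rows can be reviewed first, and only then removed by a delete transformation.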
