Question

DB Extractor and deleted rows

  • May 14, 2025
  • 2 replies
  • 54 views


We use the DB Extractor to push data from an MSSQL database to RAW, but have run into a problem.

Over time the RAW table grows much larger than the database table, because rows that are deleted in the database are not deleted in RAW.

How can I set up the config YAML for the DB Extractor so that it also handles rows that are deleted in the database?

2 replies

Mithila Jayalath
Seasoned Practitioner

@Thomas Sjølshagen will you be able to help out here?



Hi ​@Pedersen Jon-Robert,

We have intentionally avoided having the extractor delete data automatically, since the extractor cannot reliably determine _why_ a given piece of data is unavailable to us while we’re reading the source. From my perspective, this type of sync will always require some form of human intervention/decision.

As an occasional workaround (not something I would suggest you script and run regularly), you could use the CDF Toolkit to recreate the RAW table and force a reload from the source data. Then use a transformation to clean up the missing/removed data.
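
To make the clean-up step more concrete, here is a minimal Python sketch of the comparison behind it: given the row keys currently in RAW and the keys still present in the source table, it lists the RAW keys that have no match in the source. The function name and the sample keys are hypothetical, and how you fetch the two key lists is left out. This is not an extractor feature, just one way to produce a list a person can review before anything is deleted.

```python
from typing import Iterable, List


def find_stale_raw_keys(raw_keys: Iterable[str], source_keys: Iterable[str]) -> List[str]:
    """Return RAW row keys that no longer have a matching row in the source.

    Both arguments are plain collections of row keys; fetching them from
    RAW and from the database is assumed to happen elsewhere.
    """
    live = set(source_keys)
    # De-duplicate RAW keys and sort so the result is stable for review.
    return sorted(key for key in set(raw_keys) if key not in live)


# Hypothetical sample data: rows "2" and "4" were deleted in the database.
raw_keys = ["1", "2", "3", "4"]
source_keys = ["1", "3"]

stale = find_stale_raw_keys(raw_keys, source_keys)
print(stale)  # ['2', '4']
```

The resulting list is only a set of candidates. Since a key can be missing for reasons other than a real delete (for example a failed or partial read), I would have someone check it before removing those rows from RAW.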

We’re considering options such as a special “data is missing” field/column, which the extractor would set whenever the target in CDF has data that it cannot locate in the source, to indicate the specific row(s)/column(s)/cell(s) it was unable to find. That field could then be used in a delete transformation to locate data to remove from the running CDF instance.

BUT, this is only at the “brain worm” state as of now.
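
Purely as an illustration of the “data is missing” idea (again: not something the extractor does today), a sketch in Python could look like the following. The column name `source_row_missing`, both function names, and the sample rows are all made up for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical flag column; no such column exists in the extractor today.
MISSING_FLAG = "source_row_missing"


def flag_missing_rows(raw_rows: Dict[str, dict], source_keys: Iterable[str]) -> Dict[str, dict]:
    """Return a copy of the RAW rows with the flag set on rows not found in the source."""
    live = set(source_keys)
    return {
        key: {**columns, MISSING_FLAG: key not in live}
        for key, columns in raw_rows.items()
    }


def keys_to_delete(flagged_rows: Dict[str, dict]) -> List[str]:
    """Pick out the keys a later delete step would act on."""
    return sorted(key for key, columns in flagged_rows.items() if columns.get(MISSING_FLAG))


# Hypothetical sample data: row "2" no longer exists in the database.
raw_rows = {"1": {"name": "pump A"}, "2": {"name": "pump B"}}
flagged = flag_missing_rows(raw_rows, source_keys=["1"])
print(keys_to_delete(flagged))  # ['2']
```

Nothing is deleted at flagging time. The flag only records what the reader could not find, and the decision to delete stays in a separate step.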