Hi,
I am not sure if this is already on your roadmap, but I wanted to share an idea that could improve the current data management processes in CDF. For some data integrations, our team uses a single extractor to fetch data that multiple datasets consume. We do this to minimize the load on our source enterprise databases and to reduce maintenance by running a single larger extraction rather than many smaller ones. However, this approach presents a challenge for monitoring and lineage visibility, because each extraction pipeline can be linked to only one dataset.
The core of the proposal is to support a "many-to-many" relationship between Extraction Pipelines and Datasets. This would let us maintain our integrations more easily while significantly improving lineage visibility for all related datasets. Implementing such a feature would give a clearer overview of data dependencies, which I believe firms other than AkerBP would also find useful.
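To make the proposal concrete, here is a minimal sketch of the idea (all identifiers are made up for illustration, not real CDF external IDs): replacing the current one-to-one pipeline-to-dataset field with a link table that can be queried in both directions for lineage.

```python
# Today: one extraction pipeline maps to exactly one dataset.
pipeline_to_dataset = {"erp-extractor": "ds-maintenance"}

# Proposed: a many-to-many link table, so a single pipeline can feed
# several datasets and lineage can be resolved in both directions.
pipeline_links = [
    ("erp-extractor", "ds-maintenance"),
    ("erp-extractor", "ds-production"),
    ("erp-extractor", "ds-finance"),
]

def datasets_for_pipeline(links, pipeline):
    """All datasets fed by one extraction pipeline (monitoring view)."""
    return sorted(ds for p, ds in links if p == pipeline)

def pipelines_for_dataset(links, dataset):
    """All pipelines a dataset depends on (lineage view)."""
    return sorted(p for p, ds in links if ds == dataset)

print(datasets_for_pipeline(pipeline_links, "erp-extractor"))
# → ['ds-finance', 'ds-maintenance', 'ds-production']
print(pipelines_for_dataset(pipeline_links, "ds-production"))
# → ['erp-extractor']
```

With a model like this, a failed run of "erp-extractor" could surface an alert on all three datasets rather than just one.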
Thanks @MatiasRamsland and @Øystein Aspøy.
Just for clarification: is the extractor in this example writing to multiple datasets directly, or to a single dataset that is subsequently processed/moved into a number of other datasets?
In general I think this sounds like a good idea, and we’ll consider it as part of the improvements we are planning for monitoring.
For clarification @Jørgen Lund: The extractor is writing to multiple datasets directly. Hence multiple datasets rely on one extraction pipeline.