Configuring Documentum file extractor meta-data

  • October 29, 2024
  • 6 replies
  • 55 views

Hi All, 

Instead of relying on the “with metadata” query, which only takes a boolean (true or false) value, is there a way for me to be more specific about which metadata to bring in? That is, the ability to select which metadata fields to ingest into CDF, instead of getting all metadata when “with metadata = true”.

6 replies

Mithila Jayalath
Seasoned Practitioner

@lhcy at the moment there is no way to filter/select metadata fields. If you would like to have this as a feature, I can convert this question into a product idea.


  • Author
  • Active
  • November 5, 2024

Hi @Mithila Jayalath, that would be appreciated! Flexibility to remove trivial metadata will make Cognite Search easier to use.



Hi @lhcy,

Am I right that the problem you’re seeing is too many metadata keys (fields) in the data from your Documentum source, which makes it difficult to search for Documentum data in CDF?

Should the requested capability exist at some point, are there any concerns about being too aggressive when selecting the metadata fields to exclude, and what that may mean for the user experience? (I do realize that may be remedied by re-ingesting with a less stringent exclusion list/filter.)

Is the problem solely that the “full list of metadata” from the source system represents a confusing amount of data, or is there specific data you think it makes sense to exclude? Maybe by default?


  • Author
  • Active
  • November 6, 2024

Hi @Thomas Sjølshagen, the ideal feature would give us the flexibility to select metadata during the ingestion process, so that we can add or remove metadata fields based on user feedback (this could be an iterative process initially).

Another issue we face with Documentum metadata (besides there being too many fields, which makes Search hard to use) is that Documentum Extractor runs fail because certain files exceed the allowable metadata limit. For us, the culprits are the <tag_num> and <links> metadata.
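
To illustrate the kind of exclusion list we have in mind, here is a minimal sketch in plain Python. It is not an existing extractor option (as noted above, the extractor cannot filter fields today). The function name `drop_metadata_keys`, the sample values, and the exact key spellings `tag_num` and `links` are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def drop_metadata_keys(metadata: Dict[str, str], excluded: Iterable[str]) -> Dict[str, str]:
    """Return a copy of `metadata` without the keys listed in `excluded`."""
    excluded_set = set(excluded)
    return {key: value for key, value in metadata.items() if key not in excluded_set}


# Hypothetical document metadata; the values are made up for illustration.
doc_metadata = {
    "title": "Pump datasheet",
    "tag_num": "P-101, P-102, P-103",
    "links": "ref-a; ref-b",
}

# Exclusion list that could be adjusted iteratively based on user feedback.
trimmed = drop_metadata_keys(doc_metadata, excluded=["tag_num", "links"])
print(trimmed)
```

Because the function returns a new dict and leaves the input untouched, a too-aggressive exclusion list could be relaxed later without losing the source metadata.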


Markus Pettersen
MVP

Being able to select only relevant metadata might solve, or at least improve, some of our memory issues. When retrieving a large number of assets, where we only need 2 metadata fields but have to retrieve all 30+, this adds up to quite a bit when we are talking about 500k+ assets (and no, we can’t filter to limit the number, we need the full context).


Markus Pettersen
MVP

Basically, sending in the request with a list of keys specifying the relevant metadata fields.
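
As a rough sketch of the idea, this is the behaviour I would like the request to have, written here as a client-side projection in plain Python. It does not use any existing API parameter; `select_metadata`, the sample assets, and the key names are hypothetical.

```python
from typing import Dict, Iterable, List


def select_metadata(metadata: Dict[str, str], wanted_keys: Iterable[str]) -> Dict[str, str]:
    """Keep only the metadata fields named in `wanted_keys`; missing keys are skipped."""
    return {key: metadata[key] for key in wanted_keys if key in metadata}


# Hypothetical assets, each carrying more metadata than we actually need.
assets: List[Dict[str, object]] = [
    {"name": "asset-1", "metadata": {"area": "A", "system": "S1", "vendor": "X"}},
    {"name": "asset-2", "metadata": {"area": "B", "vendor": "Y"}},
]

# The list of keys that would ideally be sent along with the request.
relevant_keys = ["area", "system"]

slim_assets = [
    {"name": asset["name"], "metadata": select_metadata(asset["metadata"], relevant_keys)}
    for asset in assets
]
print(slim_assets)
```

Doing this on the client only helps after the full metadata has already been fetched, which is why a server-side list of keys would be the real improvement for memory use.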