A few quick questions to help with a potential client engagement focusing on generating insights from unstructured docs in file systems:
- Can we go beyond the 1 MB limit so that all free text within the documents can be OCRed, indexed, and searched? Is this configurable?
- LLM/NLP Search: What level of fuzzy or NLP search is supported on the indexed content of a document? For example, could one search for a phrase that isn't verbatim in the file but is close enough in meaning, or whose words are spread across a couple of places in the document?
- Automation: I understand we can auto-extract file/folder names, and even text from the initial portion of a document, to create metadata. Can this metadata be used in an automated entity matching service to automatically relate (contextualize) a new document to its parent field, well, facility, etc.?
- Can the same automation work purely from the text within the OCRed document, so that a human does not have to categorize files into folders and enforce file naming standards in the first place? What capabilities and limits should we be aware of?
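To make the entity matching question concrete, here is a rough sketch of the kind of matching I have in mind. It uses only Python's standard library (`difflib`), not any product API; the asset names, the `best_asset_match` helper, and the 0.6 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical asset names; in practice these would come from the asset hierarchy.
ASSETS = ["Field Alpha", "Well A-12", "Facility North"]


def best_asset_match(text, assets=ASSETS, threshold=0.6):
    """Return the asset whose name is most similar to `text`,
    or None if no similarity score reaches the threshold."""
    scored = [
        (SequenceMatcher(None, text.lower(), name.lower()).ratio(), name)
        for name in assets
    ]
    score, name = max(scored)
    return name if score >= threshold else None


# Extracted metadata (e.g. a folder name) that is close to, but not exactly, an asset name.
print(best_asset_match("Well A12"))
```

Plain string similarity like this only works when the extracted text is already close to an asset name, which is why I am asking what the built-in service can do beyond that.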
Best answer by Tommy Thorsen