I followed the documentation for binding files in CDF (client.files.upload_content).
After uploading, AtlasAI is indeed able to use the content.
However, this file seems to be a chunked TXT document generated from the customer document. I believe this TXT file should follow a certain structure, but I could not find any documentation describing the expected text content structure in the help docs.
Is this an industry-standard format, or is the file_content structure arbitrary?
It seems that the completeness and quality of the chunk file directly determine the upper limit of AtlasAI’s intelligence and response quality.
We won’t lock down the format used in this endpoint, nor the chunking algorithm. We need the freedom to adjust both so that we can keep delivering an LLM-friendly format.
Every now and then we do some benchmarks for different formats, and pick the one that scores well on:
- token usage of LLMs, to keep cost fair
- reasoning on tabular structures like tables and forms
However, our dataset is limited and may not represent all possible cases fairly. I’ve reached out to relevant people working on Atlas to assist further.
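For anyone who wants a feel for the token-usage criterion on their own documents, here is a minimal sketch. It is not the benchmark used for Atlas, and the two candidate formats (a Markdown table and JSON) are only illustrative assumptions, not the format the endpoint emits. The helper names `rough_token_count`, `to_markdown`, and `to_json` are hypothetical, and the regex count is a crude stand-in for a real LLM tokenizer.

```python
import json
import re


def rough_token_count(text: str) -> int:
    """Crude proxy for LLM tokens: word runs plus single punctuation marks.

    Assumption: a real tokenizer will give different absolute numbers;
    this only helps compare formats relative to each other.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def to_markdown(rows: list[dict]) -> str:
    """Serialize rows as a Markdown table (one candidate format)."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def to_json(rows: list[dict]) -> str:
    """Serialize rows as JSON (another candidate format)."""
    return json.dumps(rows)


# Made-up sample table, standing in for a table extracted from a document.
rows = [
    {"tag": "P-101", "type": "pump", "pressure_bar": 12},
    {"tag": "V-202", "type": "valve", "pressure_bar": 8},
]

counts = {
    "markdown": rough_token_count(to_markdown(rows)),
    "json": rough_token_count(to_json(rows)),
}

for name, count in counts.items():
    print(f"{name}: ~{count} tokens")
```

A lower count only covers the cost side. The second criterion, how well the model reasons over tables and forms in that format, still has to be checked by asking real questions against real documents.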