Solved

AtlasAI Chunk File Structure and Impact on Retrieval Quality

May 6, 2026
1 reply
24 views

Bruce Gee
Committed ⭐️⭐️

I followed the documentation for binding files in CDF (client.files.upload_content).

After uploading, AtlasAI is indeed able to use the content.
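
For readers who want to reproduce the upload step, here is a minimal sketch. It assumes `client` is an authenticated cognite.client.CogniteClient and that the file entry identified by `external_id` already exists. The helper name `upload_chunk_source` is hypothetical, and the keyword names passed to `upload_content` are assumptions that should be checked against the SDK reference for your version.

```python
from pathlib import Path


def upload_chunk_source(client, path, external_id):
    """Upload the bytes of `path` to an existing CDF file entry.

    `client` is expected to be a cognite.client.CogniteClient (or any object
    exposing `files.upload_content`). The keyword names used below are an
    assumption, not taken from the original post.
    """
    file_path = Path(path)
    # Fail early with a clear error instead of sending a bad request.
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    return client.files.upload_content(
        path=str(file_path),
        external_id=external_id,
    )
```

The function only wraps the single SDK call named above, so nothing here controls how the document is later chunked for AtlasAI.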

However, this file seems to be a chunked TXT document generated from the customer document. I believe this TXT file should follow a certain structure, but I could not find any documentation describing the expected text content structure in the help docs.

Is this an industry-standard format, or is the file_content structure arbitrary?

It seems that the completeness and quality of the chunk file directly determine the upper limit of AtlasAI’s intelligence and response quality.

Best answer by andersfylling

Hi, AtlasAI is most likely utilizing the Documents passages API for semantic and/or keyword search. See here: https://api-docs.cognite.com/20230101/tag/Documents/operation/documentsPassagesSearch

We won’t lock down the format returned by this endpoint, nor the chunking algorithm. We need the freedom to adjust both so that we can keep serving an LLM-friendly format.

Every now and then we benchmark different formats and pick the one that scores well on:

- token usage of LLMs, to keep cost fair
- reasoning over tabular structures such as tables and forms

However, our dataset is limited and may not represent all possible cases fairly. I’ve reached out to relevant people working on Atlas to assist further.
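
To illustrate the token-usage criterion mentioned above, here is a rough sketch that serializes the same small table in three ways and compares an approximate token count. This is not the benchmark used for Atlas. The sample rows, the function names, and the regex-based token proxy are all hypothetical, and a real comparison would use the tokenizer of the target model.

```python
import csv
import io
import json
import re

# Hypothetical sample rows, for illustration only.
ROWS = [
    {"tag": "P-101", "type": "pump", "pressure_bar": 12.5},
    {"tag": "V-202", "type": "valve", "pressure_bar": 8.0},
]


def to_markdown(rows):
    """Render rows as a pipe-delimited table with a header separator."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


def to_csv(rows):
    """Render rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().strip()


def to_json(rows):
    """Render rows as a JSON array of objects."""
    return json.dumps(rows)


def rough_token_count(text):
    """Crude proxy: each word run or punctuation character counts as one token."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def compare(rows):
    """Return the approximate token count per serialization format."""
    formats = {"markdown": to_markdown, "csv": to_csv, "json": to_json}
    return {name: rough_token_count(fn(rows)) for name, fn in formats.items()}


if __name__ == "__main__":
    for name, count in compare(ROWS).items():
        print(f"{name}: ~{count} tokens")
```

Token count is only one of the two criteria listed. A compact format can still score worse if the model finds it harder to reason over, so the counts alone do not pick a winner.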

andersfylling
Practitioner ⭐️⭐️⭐️
May 6, 2026

