Streamlit app error File out of specification: Repetition level must be defined for a primitive type
A Streamlit app that happily runs locally fails when deployed to CDF.
The line `st.dataframe([{'key': 'value', ...}])` produces this error. Not every time, but once the error happens, the app does not recover and keeps showing it even though the app reruns to show updated data.
Error: External format error: File out of specification: Repetition level must be defined for a primitive type
Do you have a workaround?
It is a simple data set, really: just string key-value pairs. When it works, it looks like this:
Contact me and I will share the URL of the deployed streamlit app.
Thank you for reaching out, @Jan Dolejsi.
Are you able to share a minimal example of this behaviour? Note that apps can be exported from the app overview:
Lars, I will build the repro. I was hoping that other people had seen this kind of error (or that the error message could simply be looked up in the source code of cognite-stlite). No free lunch, I guess.
No time to build the repro so far. But I noticed that if I switch the `st.dataframe` to `st.data_editor` (i.e. with all columns disabled), the issue happens far less frequently. It still does, though.
@Jan Dolejsi , I hit this exact error. Seems you cannot have multiple types (in a single column) in your dataframe.
My search led me to this AI-generated explanation:
The error "External format error: File out of specification: Repetition level must be defined for a primitive type" in Streamlit arises when there is a mismatch between how pandas DataFrames are structured and how Streamlit's internal data handling (which uses Apache Arrow) expects them to be. This usually happens when Streamlit attempts to serialize a DataFrame into an Arrow table for efficient data transfer and rendering.
Here is a breakdown of the issue and potential solutions:
Cause:
Data Type Issues:
The core problem often lies in the data types within your DataFrame. Arrow, which Streamlit uses for data serialization, has specific requirements for how data is represented. Inconsistencies or ambiguous types can trigger this error. This is especially true for columns with mixed data types (e.g. a column that contains both integers and strings) or object types.
Nested Data:
DataFrames with nested structures, like lists or dictionaries within cells, are not directly compatible with Arrow's flat structure.
Missing Data:
Inconsistent handling of missing data (NaN, None) can also cause problems during serialization.
Possible Solutions:
Data Type Conversion:
Explicitly convert columns to appropriate types: Use pandas' astype() to convert columns to specific types (e.g., int, float, str, datetime). Ensure that each column has a consistent data type.
```python
df['column_name'] = df['column_name'].astype(str)
```
Handle mixed types: If a column contains mixed types, you might need to convert everything to a string type or handle the different types separately.
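For example, a mixed column can be coerced wholesale to strings. A minimal sketch (the column names are hypothetical):

```python
import pandas as pd

# 'value' mixes str, int and float, which Arrow cannot infer a single type for
df = pd.DataFrame({"key": ["a", "b", "c"], "value": ["text", 42, 3.14]})

df["value"] = df["value"].astype(str)  # now uniformly str, safe to serialize
print(df["value"].tolist())
```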
Handling Nested Data:
Flatten nested structures: If your DataFrame contains nested lists or dictionaries, you might need to flatten them into separate columns or use string representations.
Consider alternative data formats: If nested data is essential, explore other ways to represent it in Streamlit (e.g., using st.json).
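One simple way to keep nested cells while staying Arrow-compatible is to stringify them, e.g. as JSON. A sketch (column names hypothetical):

```python
import json

import pandas as pd

# Cells containing dicts or lists break Arrow's flat column model.
df = pd.DataFrame({"key": ["a", "b"], "value": [{"x": 1}, [1, 2, 3]]})

# Serialize each nested cell to a JSON string so the column is uniformly str.
df["value"] = df["value"].map(json.dumps)
print(df["value"].tolist())
```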
Missing Data Handling:
Fill missing values: Use fillna() to fill missing values with a default value or a specific strategy.
```python
df = df.fillna(0)      # Fill with 0
df = df.fillna('N/A')  # Fill with 'N/A'
```
Remove rows with missing values: If appropriate, use dropna() to remove rows containing missing data.
Debugging:
Isolate the problematic DataFrame: If you have multiple DataFrames, try to isolate the one causing the error by commenting out other st.dataframe calls.
Inspect data types: Use df.dtypes to check the data types of each column.
Check for mixed types: Look for columns with dtype('object') that might contain a mix of data types.
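The last two checks can be combined into a short scan that reports which `object` columns actually mix Python types. A sketch with made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b"], "value": ["text", 42]})

# object-dtype columns are the usual suspects: list the Python types each one mixes
for col in df.columns[df.dtypes == object]:
    types = {type(v).__name__ for v in df[col]}
    if len(types) > 1:
        print(f"{col}: {sorted(types)}")
```

Any column this prints is a candidate for an explicit `astype()` conversion before handing the frame to `st.dataframe`.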
By addressing these potential issues, you should be able to resolve the "External format error" and display your pandas DataFrames correctly in Streamlit.
Thanks, @Jason Dressel. That does make sense. I managed to avoid this in most places by using a read-only `st.data_editor(list[dict] | DataFrame)` instead of `st.dataframe(list[dict] | DataFrame)`. The `st.data_editor()` helped me implement selection, and therefore deletion/editing, of rows. And to encapsulate that, I have a function that takes a `list[T]` and a callable that projects `T` to `dict[str, Any]`; after I insert a column with a checkbox, I render `st.data_editor(list[dict[str, Any]], disabled=all_columns_except_selection)`. While doing so, I avoid passing the DataFrame to `st.dataframe` or `st.data_editor`. And from your explanation, it seems that this path does not go through the Arrow framework and is not bothered by different types in one column, as you can see in the _value_ column on this screenshot.
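The helper described above might look roughly like this. This is a sketch of the pattern only; the names `to_rows` and `selected` are hypothetical, not taken from the actual app:

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")

def to_rows(items: list[T], project: Callable[[T], dict[str, Any]]) -> list[dict[str, Any]]:
    """Project each item to a flat dict and prepend a checkbox column for row selection."""
    return [{"selected": False, **project(item)} for item in items]

# In the Streamlit app, render the rows editable only in the checkbox column, e.g.:
#   rows = to_rows(items, project)
#   disabled = [c for c in rows[0] if c != "selected"]
#   edited = st.data_editor(rows, disabled=disabled)
#   chosen = [items[i] for i, r in enumerate(edited) if r["selected"]]
```

Because the values are projected to strings (or other uniform types) before rendering, the list of dicts sidesteps the per-column type inference that trips up `st.dataframe`.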