Solved

Importing datasets to Streamlit apps for machine learning applications

  • 11 December 2023
  • 6 replies
  • 115 views


I am developing an outlier detection model that users will access through a Streamlit app. I am currently assessing whether deploying the model through the Streamlit page in CDF would be a good option. My issue is that I am not sure what the most efficient way to access and import the dataset into the app is. Would it be a good approach to create a data model and import the data from there, or to extract the data straight from RAW? Another idea I had was setting up a scheduled Cognite Function to download the data as a CSV file and letting the app access that.

I need the data to be updated a few times a week, and the import time in the app needs to be fairly quick for the user experience to be good.

My dataset is around 1 million rows with three numeric and two non-numeric columns. For now, I have the data as a raw table in CDF.

Thanks!


Best answer by Dilini Fernando 2 February 2024, 10:22


6 replies

Userlevel 2

I do not know the full details of the problem, but it sounds like a good approach would be to store the data in data models and import it from there into your Streamlit app. Extracting data directly from RAW is also an option, but we recommend data models for any purpose other than data onboarding.
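For the RAW option, a minimal sketch of the read path could look like the following. It assumes the Cognite Python SDK's `raw.rows.retrieve_dataframe` call and Streamlit's `st.cache_data` decorator; the function name `load_raw_table` and the database and table names are hypothetical placeholders, not something from this thread.

```python
from typing import Any


def load_raw_table(client: Any, db_name: str, table_name: str) -> Any:
    """Fetch every row of a RAW table as a pandas DataFrame.

    `client` is expected to be a cognite.client.CogniteClient.
    """
    # limit=None asks the SDK for all rows instead of its small default page.
    return client.raw.rows.retrieve_dataframe(db_name, table_name, limit=None)


# Possible wiring inside the Streamlit app (assumption, not executed here):
#
#   import streamlit as st
#   from cognite.client import CogniteClient
#
#   @st.cache_data(ttl=6 * 60 * 60)  # re-download at most every six hours
#   def cached_table(_client, db_name, table_name):
#       # The leading underscore tells Streamlit not to hash the client.
#       return load_raw_table(_client, db_name, table_name)
#
#   df = cached_table(CogniteClient(), "my_db", "my_table")
```

With caching, only the first run after the cache expires pays the download cost for the roughly 1 million rows, which should suit data that changes a few times a week.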

Badge

I have been considering this approach and have done some testing, but I am struggling a bit to load the data from my data model in a Jupyter notebook. Is there a simple way to query it? I find the documentation on querying data models a bit confusing and complicated. What I want to do is basically this:

dataframe = ## import all rows of table X in data model Y where column Z = False as a pandas dataframe ##

#  then do some analysis on the pandas dataframe

 

Userlevel 3

Have you tried using the cognite-pygen package (https://pypi.org/project/cognite-pygen/) for your data model(s)? See the Exploration CDF Notebook page for installation instructions.

 

Alternatively, query instances with the standard Cognite Python SDK, using either query (https://cognite-sdk-python.readthedocs-hosted.com/en/latest/data_modeling.html#cognite.client._api.data_modeling.instances.InstancesAPI.query) or search (https://cognite-sdk-python.readthedocs-hosted.com/en/latest/data_modeling.html#cognite.client._api.data_modeling.instances.InstancesAPI.search).

 

You can also run a GraphQL query through the standard Cognite Python SDK (available in more recent versions of the SDK): https://cognite-sdk-python.readthedocs-hosted.com/en/latest/data_modeling.html#execute-graphql-query
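As a rough sketch of the GraphQL route for "all rows of X where Z = False": the example below assumes a type named `X` with a boolean property `z` (both hypothetical), that the data model exposes a generated `listX` field with `filter`, `first`, `after`, and `pageInfo`, and that `client.data_modeling.graphql.query` returns the `data` part of the response as a dict. Adjust the names and selected fields to your own model.

```python
from typing import Any

# Hypothetical type "X" and property "z"; replace with your own names.
QUERY = """
query ListX($after: String) {
  listX(filter: {z: {eq: false}}, first: 1000, after: $after) {
    items { externalId z }
    pageInfo { hasNextPage endCursor }
  }
}
"""


def fetch_rows(
    client: Any,
    model_id: tuple,
    query: str = QUERY,
    list_field: str = "listX",
) -> list:
    """Page through a GraphQL list query and collect every item."""
    rows: list = []
    cursor = None
    while True:
        data = client.data_modeling.graphql.query(
            id=model_id, query=query, variables={"after": cursor}
        )
        page = data[list_field]
        rows.extend(page["items"])
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return rows
        # Continue from where the previous page ended.
        cursor = info["endCursor"]
```

Calling `pandas.DataFrame(fetch_rows(client, ("my_space", "my_model", "1")))` would then give the DataFrame to analyse. Since each page is small, this may be slow for around 1 million rows, so it is probably worth caching the result in the app.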

 

Userlevel 4

Hi @Sebastian Heibø,

Did the above help you?

Br,
Dilini 

Userlevel 4

Hi @Sebastian Heibø,

I hope Thomas was able to assist you with your query. As of now, I'm going to close this topic. But if you have any further questions, please feel free to start a new post.

Best regards,

Dilini

 

Badge

Hi

Yes! I tried using pygen and got it to work. Excited to use this SDK more when it gets out of the experimental phase :)

Sebastian
