Streamlit & Gradio: a demo UI
A JSON API (Chapter 4) is perfect for machines, but useless for the product manager, the domain expert, or the executive who wants to try your model. For them you need a UI — and you do not want to write HTML, CSS, and JavaScript for a demo. Streamlit and Gradio turn a Python script into an interactive web app in minutes. They're how ML people ship UIs without becoming frontend engineers.
Install:
pip install streamlit (or pip install gradio). Follow along: both launch a local web server.
Streamlit: a script is the app
Streamlit's model is delightfully simple: write a normal top-to-bottom Python script,
and each st.* call renders a widget. It re-runs the whole script on every interaction.
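The rerun model can be made concrete with a toy simulation in plain Python. This is not Streamlit code and not how Streamlit is implemented. Here `script` is a hypothetical stand-in for your app file, and it is called once per interaction. Ordinary variables are rebuilt from scratch on every run. Only state you deliberately keep outside the run survives, which is the role `st.session_state` plays in a real Streamlit app.

```python
# Toy model of "the whole script re-runs on every interaction".
# NOT Streamlit's implementation; names here are hypothetical.


def script(session_state: dict, clicked: bool) -> str:
    """One top-to-bottom run of the 'app'."""
    clicks_local = 0  # a plain variable: reset to 0 on every rerun
    if clicked:
        clicks_local += 1
        # explicitly persisted state survives across reruns
        session_state["clicks"] = session_state.get("clicks", 0) + 1
    return f"local={clicks_local}, persisted={session_state.get('clicks', 0)}"


def run_app(events):
    """Re-run the whole script once per UI event (True = button clicked)."""
    session_state = {}  # lives for the whole browser session
    return [script(session_state, clicked) for clicked in events]


if __name__ == "__main__":
    for line in run_app([True, True, False, True]):
        print(line)
```

The local counter never gets past 1, while the persisted one keeps growing. Anything expensive you compute as a plain variable is thrown away and recomputed on every click.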
Here's streamlit_app.py, a full UI for our model:
import streamlit as st

from sentiment.data import load_dataset
from sentiment.model import SentimentModel

@st.cache_resource  # load (train) the model ONCE, not on every rerun
def get_model():
    return SentimentModel().fit(*load_dataset())

st.title("🎭 Sentiment Analyzer")
text = st.text_area("Your text", "this product is absolutely wonderful")

if st.button("Analyze") and text.strip():
    score = float(get_model().predict_proba([text])[0])
    st.metric("P(positive)", f"{score:.1%}")
    st.progress(score)  # expects a value in [0.0, 1.0]
    # a plain if/else: a bare ternary of st.* calls would be echoed by Streamlit "magic"
    if score >= 0.5:
        st.success("Positive 😊")
    else:
        st.error("Negative 😞")
Run it:
cd code
streamlit run streamlit_app.py # opens http://localhost:8501
You get a real web app: a title, a text box, a button, a live percentage metric, a progress bar, and a colored verdict, all from under 20 lines of Python and no HTML.
Don't be confused:
@st.cache_resource is not optional here. Streamlit re-runs your entire script on every click, so without the cache decorator you would retrain the model on every interaction, which is slow and wasteful. @st.cache_resource (for models and connections) and @st.cache_data (for dataframes and computations) memoize expensive work across reruns. Forgetting them is one of the most common Streamlit performance bugs.
The production pattern: UI calls the API
In the demo above the UI loads the model directly — fine for a prototype. In production
you keep one model behind the FastAPI service and have the UI call it, so there's
a single source of truth and the model isn't duplicated in every app. Our script
supports both: set the API_URL environment variable and it calls the service instead:
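import os
import requests
API_URL = os.environ.get("API_URL")  # e.g. http://localhost:8000; unset = in-process model
def score(text):
    if API_URL:  # production: call the FastAPI service
        r = requests.post(f"{API_URL}/predict", json={"text": text}, timeout=10)
        return float(r.json()["score"])
    return float(get_model().predict_proba([text])[0])  # prototype: in-process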
API_URL=http://localhost:8000 streamlit run streamlit_app.py
This is the right architecture: the API owns inference; the UI is just a client.
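Because the UI is just a client, you can develop or test it against a stand-in service before the real one is running. The sketch below uses only the standard library and assumes the contract used by `score()` above: POST `{"text": ...}` to `/predict` and get back `{"score": float}`. `StubPredictHandler`, its keyword rule, `start_stub`, and `score_via_api` are all hypothetical. None of them come from the Chapter 4 service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Talk to localhost directly, ignoring any proxy environment variables.
_OPENER = build_opener(ProxyHandler({}))


class StubPredictHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the FastAPI /predict endpoint."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Toy "model": positive if the word "wonderful" appears.
        score = 0.9 if "wonderful" in payload["text"].lower() else 0.1
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the console quiet
        pass


def start_stub(port=0):
    """Serve the stub in a background thread; port 0 lets the OS pick one."""
    server = HTTPServer(("127.0.0.1", port), StubPredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def score_via_api(api_url, text):
    """UI-side client call: same request/response shape as score() above."""
    req = Request(
        f"{api_url}/predict",
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _OPENER.open(req, timeout=5) as resp:
        return float(json.load(resp)["score"])


if __name__ == "__main__":
    server = start_stub()
    url = f"http://127.0.0.1:{server.server_address[1]}"
    print(score_via_api(url, "this product is absolutely wonderful"))  # 0.9
    server.shutdown()
    server.server_close()
```

To point the Streamlit app at the stub, you could run `HTTPServer(("127.0.0.1", 8000), StubPredictHandler).serve_forever()` in its own terminal. Then launch the UI with `API_URL=http://127.0.0.1:8000` as shown above.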
Gradio: even faster for ML demos
Gradio is the other popular choice, purpose-built for ML demos and tightly integrated
with Hugging Face. You wrap a function in an Interface and it builds the UI:
import gradio as gr
from sentiment.model import SentimentModel
from sentiment.data import load_dataset
model = SentimentModel().fit(*load_dataset())
def classify(text):
    p = float(model.predict_proba([text])[0])
    return {"positive": p, "negative": 1 - p}  # Gradio renders a label/bar chart
gr.Interface(fn=classify, inputs="text", outputs="label").launch()
That's a complete app with a labeled confidence chart. Gradio also gives you a public
shareable link (launch(share=True)) — great for sending a demo to someone — and
one-click hosting on Hugging Face Spaces.
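The function you hand to `gr.Interface` is ordinary Python, so you can unit-test it without launching any UI. The sketch below assumes only what the example above uses: a model with `predict_proba(list_of_texts)` returning one P(positive) per text. `FakeModel` and `make_classify` are hypothetical names. Passing the model in, instead of reading a module-level global, makes it easy to swap in a fake for tests.

```python
class FakeModel:
    """Hypothetical stand-in with the same predict_proba shape as SentimentModel."""

    def predict_proba(self, texts):
        # one P(positive) per input text
        return [0.8 if "great" in t.lower() else 0.3 for t in texts]


def make_classify(model):
    """Build the Gradio callback around any model with predict_proba."""

    def classify(text):
        p = float(model.predict_proba([text])[0])
        # label -> confidence dict, the shape the "label" output displays
        return {"positive": p, "negative": 1 - p}

    return classify


if __name__ == "__main__":
    classify = make_classify(FakeModel())
    print(classify("A great product"))
```

In the app itself you would pass `make_classify(model)` as `fn=` to `gr.Interface`.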
Don't be confused: Streamlit vs. Gradio — which to pick? Gradio is fastest for a single-model demo ("input → model → output"), with instant sharing and HF Spaces hosting. Streamlit is better for richer apps — dashboards, multiple inputs, charts, multi-step tools — because you control the full page layout. Demo a model → Gradio. Build an internal tool → Streamlit.
Where these fit (and where they don't)
- Great for: internal tools, model demos, stakeholder reviews, data dashboards, quick experiments, hackathons.
- Not for: customer-facing production apps at scale. They're single-process and not built for heavy concurrent traffic. For a real product UI, a frontend framework (React/Vue) talks to your FastAPI backend — but that's a frontend job, not yours.
The value is speed-to-demo: you can put a working model in front of a human in minutes, which is often what unblocks a project ("can I just try it?").
The takeaway
Streamlit and Gradio turn a Python script into a web UI with no frontend code — Gradio
for quick single-model demos with instant sharing, Streamlit for richer multi-widget
apps. Cache the model (@st.cache_resource), and in production have the UI call your
FastAPI service rather than loading the model itself. These are for demos and internal
tools, not customer-facing scale. We can now train, track, serve, scale, version,
orchestrate, optimize, and demo the model — the last question is whether it's still
working in production. 👉