Streamlit & Gradio: a demo UI

A JSON API (Chapter 4) is perfect for machines, but useless for the product manager, the domain expert, or the executive who wants to try your model. For them you need a UI — and you do not want to write HTML, CSS, and JavaScript for a demo. Streamlit and Gradio turn a Python script into an interactive web app in minutes. They're how ML people ship UIs without becoming frontend engineers.

Install: pip install streamlit (or pip install gradio). Follow-along — both launch a local web server.

Streamlit: a script is the app

Streamlit's model is delightfully simple: write a normal top-to-bottom Python script, and each st.* call renders a widget. It re-runs the whole script on every interaction. Here's streamlit_app.py, a full UI for our model:

import streamlit as st
from sentiment.data import load_dataset
from sentiment.model import SentimentModel

@st.cache_resource                       # load the model ONCE, not on every rerun
def get_model():
    return SentimentModel().fit(*load_dataset())

st.title("🎭 Sentiment Analyzer")
text = st.text_area("Your text", "this product is absolutely wonderful")
if st.button("Analyze") and text.strip():
    score = float(get_model().predict_proba([text])[0])
    st.metric("P(positive)", f"{score:.1%}")
    st.progress(score)
    if score >= 0.5:                     # plain if/else: Streamlit "magic" can echo a bare ternary's value
        st.success("Positive 😊")
    else:
        st.error("Negative 😞")

Run it:

cd code
streamlit run streamlit_app.py            # opens http://localhost:8501

You get a real web app: a title, a text box, a button, a live percentage metric, a progress bar, and a colored verdict — from ~12 lines of Python, no HTML.

Don't be confused: @st.cache_resource is not optional here. Streamlit re-runs your entire script on every click, so without the cache decorator you'd retrain the model on every interaction, which is slow and wasteful. @st.cache_resource (for models and connections) and @st.cache_data (for dataframes and computations) memoize expensive work across reruns. Forgetting them is the #1 Streamlit performance bug.
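To see why the cache matters, here is a plain-Python analogy of the rerun model. It is a sketch, not Streamlit's implementation: functools.lru_cache stands in for @st.cache_resource, run_script is a hypothetical stand-in for one full rerun of the script, and the dictionary is a placeholder for the trained model.

```python
from functools import lru_cache

train_calls = 0  # counts how often the "expensive" step actually runs


@lru_cache(maxsize=None)  # stand-in for @st.cache_resource (analogy only)
def get_model():
    global train_calls
    train_calls += 1  # pretend this is SentimentModel().fit(...)
    return {"name": "sentiment-model"}  # placeholder for the trained model


def run_script(text):
    """One Streamlit-style rerun: the whole 'script' executes top to bottom."""
    model = get_model()  # served from the cache after the first rerun
    return f"{model['name']} scored: {text}"


# Three interactions mean three full reruns, but only one training run.
for text in ["great", "awful", "fine"]:
    run_script(text)

print(train_calls)  # 1
```

Remove the decorator and train_calls climbs to 3, one "training run" per interaction. That is the behavior the cache decorator prevents in the real app.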

The production pattern: UI calls the API

In the demo above the UI loads the model directly — fine for a prototype. In production you keep one model behind the FastAPI service and have the UI call it, so there's a single source of truth and the model isn't duplicated in every app. Our script supports both — set API_URL and it calls the service instead:

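import os
import requests

API_URL = os.environ.get("API_URL")       # unset -> run the model in-process

def score(text):
    if API_URL:                           # production: call the FastAPI service
        return requests.post(f"{API_URL}/predict", json={"text": text}, timeout=10).json()["score"]
    return float(get_model().predict_proba([text])[0])   # prototype: in-process

To point the UI at a running service:

API_URL=http://localhost:8000 streamlit run streamlit_app.py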

This is the right architecture: the API owns inference; the UI is just a client.
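The client side of that split can be sketched with the standard library alone. Everything here is illustrative: StubPredictHandler is a hypothetical stand-in for the FastAPI service, its "wonderful" rule is a fake model, and fetch_score is an assumed helper name. Only the /predict path, the {"text": ...} request body, and the "score" response key come from the script above.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubPredictHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service's POST /predict endpoint."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Fake "model": any text containing "wonderful" scores high.
        score = 0.9 if "wonderful" in payload["text"] else 0.1
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch_score(api_url, text, timeout=5.0):
    """What the UI does: POST the text and read back the score. No model here."""
    request = urllib.request.Request(
        f"{api_url}/predict",
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["score"]


server = HTTPServer(("127.0.0.1", 0), StubPredictHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
api_url = f"http://127.0.0.1:{server.server_port}"

positive = fetch_score(api_url, "this product is absolutely wonderful")
negative = fetch_score(api_url, "this product is terrible")
print(positive, negative)

server.shutdown()
server.server_close()
```

The UI process never imports the model. Swapping the stub for the real service changes only the URL, which is the point of letting the API own inference.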

Gradio: even faster for ML demos

Gradio is the other popular choice, purpose-built for ML demos and tightly integrated with Hugging Face. You wrap a function in an Interface and it builds the UI:

import gradio as gr
from sentiment.model import SentimentModel
from sentiment.data import load_dataset

model = SentimentModel().fit(*load_dataset())

def classify(text):
    p = float(model.predict_proba([text])[0])
    return {"positive": p, "negative": 1 - p}     # Gradio renders a label/bar chart

gr.Interface(fn=classify, inputs="text", outputs="label").launch()

That's a complete app with a labeled confidence chart. Gradio also gives you a public shareable link (launch(share=True)) — great for sending a demo to someone — and one-click hosting on Hugging Face Spaces.
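Because the function you hand to Gradio is ordinary Python, you can check it before any UI exists. A minimal sketch, assuming a hypothetical StubModel in place of the trained SentimentModel so that it runs without the dataset:

```python
class StubModel:
    """Hypothetical stand-in for SentimentModel so the sketch runs without training."""

    def predict_proba(self, texts):
        # Toy rule: "wonderful" means positive. Returns one P(positive) per text.
        return [0.9 if "wonderful" in t.lower() else 0.2 for t in texts]


model = StubModel()


def classify(text):
    """Return the {label: confidence} mapping that a "label" output displays."""
    p = float(model.predict_proba([text])[0])
    return {"positive": p, "negative": 1 - p}


result = classify("This product is absolutely wonderful")
print(max(result, key=result.get))  # positive

# To serve it (requires gradio; not run here):
#   import gradio as gr
#   gr.Interface(fn=classify, inputs="text", outputs="label").launch()
```

The two confidences sum to 1, and the label with the larger value is the one shown as the top prediction.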

Don't be confused: Streamlit vs. Gradio — which to pick? Gradio is fastest for a single-model demo ("input → model → output"), with instant sharing and HF Spaces hosting. Streamlit is better for richer apps — dashboards, multiple inputs, charts, multi-step tools — because you control the full page layout. Demo a model → Gradio. Build an internal tool → Streamlit.

Where these fit (and where they don't)

  • Great for: internal tools, model demos, stakeholder reviews, data dashboards, quick experiments, hackathons.
  • Not for: customer-facing production apps at scale. Out of the box, each app runs as a single Python server process and is not built for heavy concurrent traffic. For a real product UI, a frontend framework (React/Vue) talks to your FastAPI backend, but that's a frontend job, not yours.

The value is speed-to-demo: you can put a working model in front of a human in minutes, which is often what unblocks a project ("can I just try it?").

The takeaway

Streamlit and Gradio turn a Python script into a web UI with no frontend code — Gradio for quick single-model demos with instant sharing, Streamlit for richer multi-widget apps. Cache the model (@st.cache_resource), and in production have the UI call your FastAPI service rather than loading the model itself. These are for demos and internal tools, not customer-facing scale. We can now train, track, serve, scale, version, orchestrate, optimize, and demo the model — the last question is whether it's still working in production. 👉