Celery + Redis: an async task queue

Some work is too slow to do inside a web request. If a user asks to score 10,000 documents, or you need to retrain the model, you can't make them wait 30 seconds for the HTTP response — the request will time out and your server will be tied up. The fix is a task queue: hand the slow job to a background worker and return immediately. Celery is Python's standard task queue, and Redis is the fast in-memory store that connects the pieces.

Setup: pip install celery redis and run a Redis server (redis-server, or docker run -p 6379:6379 redis). Follow-along — needs Redis + a worker process.

The architecture

Three players pass a job along:

  your API  ──.delay(job)──►  Redis (the queue)  ──►  Celery worker runs it
     │                                                       │
     └────────  returns a task ID instantly  ◄──── result stored back in Redis

Producer — your code, which enqueues a task with .delay(...) and gets back a ticket (an AsyncResult) instantly, without waiting.
Broker (Redis) — the queue tasks wait in until a worker is free.
Worker — a separate process that pulls tasks off the queue and runs them.
Result backend (Redis) — where the worker stores the result for later pickup.
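The four roles above can be mimicked in a single process with the standard library. This is only a toy model for building intuition, not how Celery is implemented: a queue.Queue stands in for the Redis broker, a dict for the result backend, and a daemon thread for the worker. The helper names delay and get are chosen here to echo Celery's API and are otherwise hypothetical.

```python
import queue
import threading
import uuid

broker = queue.Queue()   # stands in for the Redis broker
results = {}             # stands in for the result backend
done = {}                # task_id -> Event set when the result is stored


def delay(func, *args):
    """Producer: enqueue the job and hand back a ticket (task ID) at once."""
    task_id = str(uuid.uuid4())
    done[task_id] = threading.Event()
    broker.put((task_id, func, args))
    return task_id


def worker():
    """Worker: pull jobs off the queue, run them, store the result."""
    while True:
        task_id, func, args = broker.get()
        results[task_id] = func(*args)
        done[task_id].set()
        broker.task_done()


def get(task_id, timeout=5.0):
    """Block until the worker has stored a result for this ticket."""
    if not done[task_id].wait(timeout):
        raise TimeoutError(task_id)
    return results[task_id]


threading.Thread(target=worker, daemon=True).start()

ticket = delay(sum, [1, 2, 3])   # returns immediately with a ticket
print(get(ticket))               # waits for the worker, then prints 6
```

Unlike this toy, the real setup keeps the queue and the results in Redis, outside the producer's process, which is what lets queued work outlive an API restart.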

Don't be confused: Redis is playing two roles here. As the broker it's the message queue (tasks waiting to run). As the result backend it stores finished results. They're configured separately (and can be different systems — e.g. RabbitMQ broker + Redis backend), but using Redis for both is the simplest common setup.
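One way to keep the two roles visibly separate is a small config module with one setting per role. The sketch below assumes a module named celeryconfig.py and environment variables named CELERY_BROKER_URL and CELERY_RESULT_BACKEND; those names are illustrative choices, not something the chapter's project defines. The lowercase names broker_url and result_backend are Celery's own setting names.

```python
# celeryconfig.py  (hypothetical module name)
import os

# Broker: where queued task messages wait for a free worker.
broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")

# Result backend: where finished results are stored for later pickup.
# Using Redis database 1 keeps results apart from the queued messages in 0.
result_backend = os.environ.get(
    "CELERY_RESULT_BACKEND", "redis://localhost:6379/1"
)
```

A Celery app can load such a module with app.config_from_object("celeryconfig"). Swapping in a RabbitMQ broker would then only mean pointing the broker variable at an amqp:// URL while the backend stays on Redis.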

The Celery app

tasks/celery_app.py configures Celery to use Redis for both roles, with production-sane defaults:

from celery import Celery

app = Celery("sentiment", broker="redis://localhost:6379/0",
                          backend="redis://localhost:6379/0",
                          include=["tasks.tasks"])  # import the task module so the worker registers its tasks
app.conf.update(
    task_time_limit=300,            # hard-kill a task after 5 minutes
    worker_max_tasks_per_child=100, # recycle workers to avoid memory leaks
)

The tasks

A task is just a function with the @app.task decorator. From tasks/tasks.py:

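from tasks.celery_app import app

# _model(), SentimentModel, load_dataset and MODEL_PATH are project helpers
# not shown in this excerpt.

@app.task
def batch_score(texts: list[str]) -> list[dict]:
    model = _model()
    probs = model.predict_proba(texts)   # one positive-class probability per text
    return [{"text": t, "score": round(float(p), 4),
             "label": "positive" if p >= 0.5 else "negative"}
            for t, p in zip(texts, probs)]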

@app.task
def retrain() -> dict:
    """Retrain from scratch and save. Schedule this nightly."""
    model = SentimentModel().fit(*load_dataset())
    model.save(MODEL_PATH)
    return {"status": "retrained", "vocab_size": len(model.vocab)}

Running it (three terminals)

# terminal 1 — the broker
redis-server

# terminal 2 — a worker (from code/)
celery -A tasks.celery_app worker --loglevel=info

# terminal 3 — enqueue a job and fetch the result
python -c "from tasks.tasks import batch_score; \
           print(batch_score.delay(['great product','awful service']).get(timeout=10))"

The worker logs the task running, and terminal 3 prints the result once it's done — the task logic produces exactly this (verified against the model):

[{'text': 'great product', 'score': 0.9768, 'label': 'positive'},
 {'text': 'awful service', 'score': 0.3242, 'label': 'negative'}]

The key move: .delay(...) returned instantly with a ticket; .get() waited for the worker to finish. In a real API you'd return {"task_id": result.id} immediately and let the client poll a /result/{task_id} endpoint, so the user never waits on an open connection.

The enqueue-and-poll pattern in an API

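from fastapi import FastAPI
from tasks.tasks import batch_score

app = FastAPI()  # the web app; distinct from the Celery app in tasks/celery_app.py

@app.post("/score-batch")
def score_batch(texts: list[str]):
    task = batch_score.delay(texts)       # enqueue, don't wait
    return {"task_id": task.id}           # respond in milliseconds

@app.get("/result/{task_id}")
def get_result(task_id: str):
    res = batch_score.AsyncResult(task_id)
    # on failure res.result holds the exception, so only return it on success
    return {"ready": res.ready(),
            "result": res.result if res.successful() else None}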

This is how every "we're processing your request, check back" flow works — bulk exports, video processing, sending email, long ML inference.
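On the client side, "check back" usually means a small polling loop. The sketch below is one possible shape for it, assuming the status payload has the ready and result keys used by the /result/{task_id} endpoint. The function name wait_for_result and its parameters are hypothetical, and the HTTP call is replaced by an injected callable so the example runs without a server.

```python
import time
from typing import Callable


def wait_for_result(fetch_status: Callable[[], dict],
                    interval: float = 0.5,
                    timeout: float = 30.0):
    """Poll until the status dict reports ready=True, then return its result."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()   # in practice: GET /result/{task_id} as JSON
        if status["ready"]:
            return status["result"]
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish in time")
        time.sleep(interval)


# Stand-in for the HTTP call: not ready twice, then done.
responses = iter([
    {"ready": False, "result": None},
    {"ready": False, "result": None},
    {"ready": True, "result": [{"label": "positive"}]},
])
print(wait_for_result(lambda: next(responses), interval=0.01))
```

A real client would typically also distinguish a failed task from one that is still running, which the simple ready/result payload cannot express on its own.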

Scheduled tasks (Celery Beat)

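Celery also runs tasks on a schedule via Celery Beat, a separate scheduler process that enqueues tasks for the workers. For example, to retrain once every 24 hours:

app.conf.beat_schedule = {
    # a number is an interval in seconds, counted from when Beat starts
    "nightly-retrain": {"task": "tasks.tasks.retrain", "schedule": 86400.0},
}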

(For richer pipelines with dependencies and retries, you'll reach for a real orchestrator — that's Chapter 9, Prefect. Celery is for tasks; Prefect is for workflows.)

Why a queue, not just threads

Don't be confused: background threads vs. a task queue. You could run slow work in a thread, but a task queue gives you what threads can't: queued work survives an API restart (it sits in Redis, not in the API process's memory), it scales across many machines (add more workers), failed tasks can be retried once you configure retries on them, and workers run as separate processes, so you can move them to other machines instead of having them compete with your web server for CPU and memory. For anything important or heavy, use the queue.

When Celery — and when not

Celery is the right tool surprisingly often, but it's not the only one. Match the tool to the job:

You need…	Reach for
A quick "fire and forget" after responding (send an email, log an event)	FastAPI `BackgroundTasks` — built in, no broker, runs in the same process
Durable jobs that survive restarts, retry, and scale across machines	Celery + Redis (this chapter) — the general-purpose workhorse
A simpler, lighter-weight queue	RQ (Redis-only), Dramatiq, Arq (async)
Distributed ML training or heavy parallel inference across a cluster	Ray — purpose-built for scaling Python/ML compute
Multi-step pipelines with dependencies between steps	a workflow orchestrator (Prefect, Chapter 9)

Don't be confused: BackgroundTasks vs. Celery vs. Ray. FastAPI's BackgroundTasks runs work in your web process after the response is sent — great for light, fire-and-forget jobs, but the work dies if the process restarts and can't scale past one machine. Celery runs work in separate worker processes backed by a broker — durable, retryable, horizontally scalable; the right default for real background jobs. Ray is a different animal: a framework for distributing heavy compute (training, large batch inference, hyperparameter sweeps) across a cluster. Reach for Ray when the bottleneck is CPU/GPU work that needs many machines, not when you simply need to get a job off the request path.

Where it fits in production

Decouple slow work from the request path → fast, responsive APIs.
Scale independently → add workers when the queue backs up, without touching the API.
Reliability → queued tasks survive API restarts and crashes because they wait in Redis, and tasks with retries configured are re-run after transient failures.

Alternatives in this category: RQ (simpler, Redis-only), Dramatiq, Arq (async). Cloud-native options: AWS SQS + Lambda. The pattern — producer, broker, worker — is identical everywhere.

The takeaway

A task queue moves slow work (batch scoring, retraining, emails) out of the request path: your code calls .delay() and returns instantly, Redis holds the queue, and Celery workers do the work in their own processes (on their own machines, if you like), so queued jobs survive API restarts and workers scale independently. Use Beat for scheduled jobs and a real orchestrator for complex pipelines. We've been leaning on Redis as the broker; next, let's use it for the other thing it's great at: caching. 👉

Production ML & AI Tools: A Hands-On Field Guide