Ask The Game, the Build Log

From Vercel to Fly.io

So, here's the thing about building ambitious projects: sometimes your tools outgrow their original home. That's precisely what happened with Ask The Game this week.

The Problem: When Static Meets Heavy Metal

I started this podcast analysis pipeline with big dreams and heavy dependencies. We're talking PyTorch, SpeechBrain, pyannote.audio... basically the heavy metal band of machine learning libraries. And where did I try to deploy this beast? Vercel.

Vercel is absolutely fantastic for what it's designed for: blazing-fast static sites and serverless functions. But asking it to handle a full ML pipeline with gigabytes of model weights? That's like asking a sports car to tow a trailer. Technically possible, but you're gonna have a bad time.

The build kept failing with out-of-memory (OOM) errors. Every time I tried to deploy, Vercel would take one look at my requirements.txt and basically say, "Nope, not today."

Enter Fly.io

While Vercel excels at the edge, Fly.io gives me actual compute power where I need it. Real VMs, persistent storage, and the kind of resources that ML workloads actually require.

The migration turned out to be smoother than I expected. Here's what the stack looks like now:

Docker Configuration

First up, a Dockerfile that actually understands what we're trying to do:

FROM python:3.12-slim

# Install system dependencies for audio processing
RUN apt-get update && apt-get install -y \
    build-essential \
    ffmpeg \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements-full.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
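
# Copy the application source into the image (assuming it lives at the repo root)
COPY . .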

Notice that requirements-full.txt? I split my dependencies into two files. The original requirements.txt stays minimal for any potential static deployments, while requirements-full.txt includes all the ML heavy hitters.
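
Roughly, the full file just layers the ML stack on top of the minimal base. This is a sketch, not the actual file: the package names are the real PyPI ones, but the -r include is an assumption about how I structured it:

# requirements-full.txt: everything the pipeline needs on Fly.io
-r requirements.txt    # start from the minimal base
torch                  # PyTorch
speechbrain
pyannote.audio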

Fly.io Configuration

The fly.toml config is where things get interesting:

app = "askthegame"
primary_region = "ord"  # Chicago for low latency

# Machine configuration optimized for ML workloads
[vm]
  cpu_kind = "performance"
  cpus = 2
  memory = "4gb"  # Enough breathing room for PyTorch

# Process configuration for background workers
[processes]
  worker = "python scripts/run_pipeline.py"
  web = "python health_server.py"

The key insight here is to treat this deployment as what it actually is: a background processing system, not a web application. The web process is just a health check server; the real work happens in the worker process.
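
And a health check server really can be tiny. Here's a minimal sketch, not the actual health_server.py, assuming Fly's default internal port of 8080:

# health_server.py (sketch): answer 200 OK so Fly.io sees the machine as healthy
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Any GET gets a 200, which is all the platform's checks need
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, fmt, *args):
        # Stay quiet; the worker's logs are the interesting ones
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()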

The Results: Night and Day

The difference is honestly pretty dramatic. What used to be impossible on Vercel now runs smoothly on Fly.io. I can SSH into the machine, run the pipeline on specific episodes, and actually see real-time logs of the ML processing happening.
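
Concretely, that loop looks something like this (the flyctl commands are standard; the pipeline invocation matches the worker process from fly.toml):

# From my laptop: open a shell on the running machine
fly ssh console -a askthegame

# Inside that shell: kick off the pipeline by hand
python scripts/run_pipeline.py

# In another local terminal: stream the logs in real time
fly logs -a askthegame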

Now that the pipeline has a proper home, I can focus on what actually matters: processing those 900+ podcast episodes and building the analysis features that'll make Ask The Game genuinely helpful.

The infrastructure is solid and the deployment is automated. Time to get back to the fun stuff: making sense of all that podcast data.