How I Labelled 5,220 Podcast Segments in One Day
I just finished what might be the most satisfying 2.5 hours I've spent vibe-coding this past months. I processed 15 episodes of The Game Podcast through my pipeline with ath the end, an AI system that automatically labeled every single segment with meaningful business topics. Not keywords, not timestamps, but actual semantic labels like "Pricing Strategy Discussion" and "Leadership Philosophy."
The result? 5,220 podcast segments are now searchable by meaning instead of time. And honestly, I'm a little blown away by how well this worked.
The Problem That's Been Bugging Me
Here's what drives me crazy about podcast discovery. You want to find "that part where Alex talked about pricing psychology." So you either:
- Scroll through episode descriptions hoping someone mentioned "pricing"
- Scrub through audio files hoping to recognize the content
- Give up and just remember it was "somewhere in the 900s"
I've been building Ask The Game specifically to solve this problem for myself. I listen to Alex Hormozi religiously, but finding specific insights across 915+ episodes? Basically impossible.
Search engines work great for text because they can understand context. But podcast search is stuck in the stone age of keyword matching and timestamp hunting.
What I Built and Tested
I designed an AI system that reads podcast segments and assigns them human-readable topic labels. Not just any labels, business-specific categories that actually match how Alex organizes his thinking.
The system works like this:
- Take a chunk of podcast transcript (usually 30-90 seconds)
- Feed it to GPT-3.5-turbo with context from surrounding chunks
- Ask the AI: "What business topic is this really about?"
- Get back structured labels like "Financial Strategy" or "Operations & Systems"
- Save those labels to the database for instant searching
I tested this on 15 episodes ranging from 39 segments (short episodes) to 556 segments (Alex's longest philosophical deep-dives). Every single segment got labeled successfully.
The Technical Details (Without the Jargon)
The magic happens in the prompt engineering. I don't just throw transcript text at the AI and hope for the best. The system:
Gives the AI business context: "You're an expert business content analyst" Provides specific categories: 9 predefined business topic areas that match Alex's content Shows examples: "Pricing Strategy Discussion", "Hiring Philosophy" Reads surrounding context: 5 chunks around the target segment so the AI understands the flow
Here's what surprised me most is that the AI adapts intelligently to different episode types. Philosophical episodes get labeled primarily as "Business Mindset." Tactical episodes get distributed across "Marketing Strategy," "Sales Strategy," "Operations." Multi-part series maintain thematic consistency.
The system isn't just pattern matching keywords. It's actually understanding business context.
What This Unlocks Right Now
Instead of searching "pricing" and getting 47 random timestamps, I can now query for "Financial Strategy" and get every segment where Alex discusses pricing psychology, revenue models, or investment decisions.
Want to find all the leadership advice? Query "Leadership & Management." Looking for specific sales tactics? "Sales Strategy" gives you everything from closing techniques to customer relationship insights.
Another really cool part is that the categories reflect how Alex actually thinks about business. The AI identified these topic distributions across 5,220 segments:
- Business Mindset: 37.3% (philosophy and mental frameworks)
- Marketing Strategy: 12.4% (brand building and customer acquisition)
- Financial Strategy: 10.0% (pricing and revenue planning)
- Operations & Systems: 8.7% (processes and efficiency)
That distribution makes perfect sense if you listen to Alex regularly. He spends about a third of his time on mindset and philosophy, then distributes tactical advice across marketing, finance, and operations.
The Performance Numbers
Processing 5,220 segments took 2.5 hours with a 100% success rate. That's roughly 0.8 segments per second, including API calls to OpenAI and database saves.
Cost? About $15 total for the entire pilot. That's roughly $0.003 per segment.
The system handled episodes of wildly different sizes without breaking. The smallest episode (39 segments) processed in 45 seconds. The largest (556 segments) took 12 minutes but labeled every single segment successfully.
No timeouts, no failures, no weird edge cases that broke the system.
What I'm Testing Next
This pilot proves the concept works, but I'm just getting started. Next up:
Scaling to all 915 episodes: The pilot covered 15 episodes. I want every episode in the catalog to be topic-searchable.
Building the search interface: Right now the labels exist in the database. I could build the actual search UI so I can type "show me pricing discussions" and get results. That would be nice.
Topic hierarchies: Some topics deserve sub-categories. "Sales Strategy" could break down into "Closing Techniques," "Lead Generation," "Customer Relationships." I'm not sure if I want to do this now or later.
Cross-episode synthesis: Imagine asking "compile everything Alex has said about pricing into one complete guide" and getting segments from 50+ episodes arranged logically.
Why This Matters to Me
I'm not building this because "AI-powered podcast search" sounds impressive. I'm building it because I have a specific, personal problem.
I want to find the exact moment Alex explained his framework for pricing psychology. Or that story about hiring his first salesperson. Or his thoughts on partnership decision-making.
These insights exist somewhere in 900+ hours of content. But "somewhere" isn't good enough when you need the information for a real business decision.
Now, for the first time, I can search Alex's content the way I search my own notes, by meaning and context, not by hoping I remember the right keywords.
That's the search experience I've always wanted for The Game Podcast. And after today, I'm pretty confident I can build it.
Vibe-Coding with Non Perfection
Is this perfect? No. The AI occasionally over-concentrates labels in "Business Mindset" for philosophical episodes. Some segments probably deserve multiple topic tags. And I haven't tested edge cases like episodes with multiple guests or completely different content formats.
But for a first pilot? I'm genuinely impressed. The system understood business context better than I expected. The topic distributions make intuitive sense. And most importantly, it solved the core problem, meaning making podcast content findable by meaning instead of time.
Sometimes you build something and immediately know it's going to change how you interact with the content you love. This is one of those times.
15 episodes down, 900 episodes to go.
– Benoit Meunier