Pushing Things Forward (Even If I Still Can’t Match Speakers Yet)
I can’t wait to tackle speaker matching properly. That part’s gonna be so fun. But like everything else, it’s always: “just one more thing to fix first…”
This last commit? It was all about resilience, speed, and data sanity.
A Friendly AI Rivalry
Quick detour. I threw a bunch of Python scripts at Claude and asked: “Can you do better?”
Claude gave some solid ideas. So I passed them on to Gemini and said, “Your move. Beat your competition.”
Gemini took the challenge. A few tweaks later, the code was better: not perfect, but sharper. Collaborative AI dev is way more fun than staring at bugs alone.
What’s New Under the Hood
Error Handling and Resilience
- Episodes now only get marked as processed after everything completes successfully. No more false positives.
- Added retries with exponential backoff for Deepgram hiccups. If the API flinches, we try again gracefully.
- Fixed a sneaky bug where failed runs were flagged as success.
- The main loop is wrapped in a safe try/except so one busted episode doesn’t crash the whole run.
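For the curious, here is roughly how those pieces fit together. This is a minimal sketch, not the actual commit: with_retries, run_all, process_episode and mark_processed are hypothetical names, and the delays and attempt count are assumptions.

```python
import logging
import time

logger = logging.getLogger("pipeline")


def with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:  # a real pipeline would catch narrower errors
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = base_delay * (2 ** attempt)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs",
                           attempt + 1, exc, delay)
            sleep(delay)


def run_all(episodes, process_episode, mark_processed, sleep=time.sleep):
    """Process every episode; a failure is logged and the loop keeps going."""
    failed = []
    for episode in episodes:
        try:
            with_retries(lambda: process_episode(episode), sleep=sleep)
            mark_processed(episode)  # only reached when processing succeeded
        except Exception:
            logger.exception("Episode %r failed; moving on", episode)
            failed.append(episode)
    return failed
```

The key detail is the order inside the try block: mark_processed comes last, so an episode that blows up halfway is never recorded as done and gets picked up again on the next run.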
Performance Boosts
- Database inserts for chunks and entities now go in batches. This avoids rate limits and makes everything much faster.
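Batching itself is a small helper. A sketch under assumptions: insert_many stands in for whatever bulk-insert call the database client offers, and the batch size of 100 is a placeholder, not a number from the real pipeline.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(rows, insert_many, batch_size=100):
    """Send rows to insert_many one batch at a time; return the row count."""
    total = 0
    for batch in batched(rows, batch_size):
        insert_many(batch)  # hypothetical bulk-insert callable
        total += len(batch)
    return total
```

One request per batch instead of one per row is where both the speedup and the rate-limit relief come from.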
Better Data, Cleaner Logic
- Entity names are normalized to lowercase before lookups. No more “Alex” vs “alex” mix-ups.
- If the episode number is missing from the RSS tag, we extract it from the title. Smart fallback.
- Skip logs are richer now: they include the URL and date, and they leave the original title untouched.
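The first two items could look something like this. It is an illustrative sketch only: the function names and the title pattern are assumptions (real titles may need a different regex), and trimming whitespace is an extra I added on top of the lowercasing.

```python
import re

# Assumed title shapes: "Episode 42: ...", "Ep. 7 - ...", "#103 ..."
EPISODE_IN_TITLE = re.compile(r"(?:\bepisode|\bep\.?|#)\s*(\d+)", re.IGNORECASE)


def normalize_entity_name(name):
    """Lowercase (and trim) a name so 'Alex' and 'alex' hit the same row."""
    return name.strip().lower()


def episode_number(rss_value, title):
    """Prefer the RSS tag value; fall back to a number found in the title."""
    text = str(rss_value).strip() if rss_value is not None else ""
    if text.isdecimal():
        return int(text)
    match = EPISODE_IN_TITLE.search(title or "")
    return int(match.group(1)) if match else None
```

Returning None when nothing matches keeps the caller honest: a missing number stays missing instead of being guessed.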
Observability for Future Me
- Print statements are gone. Replaced with proper logging.
- Each run writes a timestamped log to the logs folder. Super helpful for debugging, or just for feeling like a real engineer.
- There’s even a new validate_data.py script to spot issues after a batch run. QA gets its own spotlight.
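Swapping print for logging with a per-run file takes only a few lines of standard library code. A minimal sketch, assuming a run_YYYYMMDD_HHMMSS.log naming scheme and a logger called "pipeline" (both made up for illustration):

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_run_logging(log_dir="logs", name="pipeline"):
    """Create log_dir if needed and log to both a timestamped file and the console."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"run_{stamp}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path, encoding="utf-8"),
                    logging.StreamHandler()):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_path
```

After that, every old print("...") becomes logger.info("...") or logger.warning("..."), and each run leaves its own file behind.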
API Modernization
- Switched Deepgram to use the updated listen.rest method. That got rid of those annoying UnsupportedWarning messages.
Little by little, this is becoming a real pipeline. Not just clever, but reliable. The kind of foundation I’ll need if I want to take on speaker labelling next.
Let’s get back to it.
– Benoit Meunier