Ask The Game, the Build Log

I Just Wanted Better Logs. Now I Have a Whole New Architecture.

I started out wanting better logging, then I went full OCD and refactored the whole file architecture, and it was fun.

Better Logging

The project's running. Things are working. But something felt... foggy. I couldn't see what was going on behind the scenes. So I simply tried to add structured logging so I could observe what was happening.

At the time, I had a massive main.py script with over 1,800 lines of code. It did diarization, embeddings, speaker ID, all of it jammed together.

I started out just wanting better logs. But once you start thinking in terms of observability, you can't unsee what's missing.

I needed:

I discussed it with ChatGPT and Claude, and both proposed swapping out loose log lines for structured JSON.

# Old way
logging.info("Diarization found some speakers")

# New way
structured_logger.log_segmentation(
    step="diarization_complete",
    speakers_detected=3,
    processing_time=45.2,
    memory_efficient=True,
    details={"speaker_labels": ["SPEAKER_00", "SPEAKER_01", "SPEAKER_02"]}
)
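For readers wondering what sits behind a call like that, here is a minimal sketch. The `StructuredLogger` class, its fields, and the `summary` and `write` methods are assumptions made for illustration, not the project's actual code; only the `log_segmentation` call shape and the run-level field names come from the post.

```python
import json
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StructuredLogger:
    """Hypothetical collector that keeps one run's log entries in memory."""

    run_id: str
    episode_name: str
    segmentation_logs: list = field(default_factory=list)
    started_at: float = field(default_factory=time.monotonic)

    def log_segmentation(self, step, **fields):
        # Store the step name plus whatever keyword fields the caller passes.
        entry = {"step": step, **fields}
        self.segmentation_logs.append(entry)
        return entry

    def summary(self, success):
        # Build one JSON-serializable record for the whole run.
        elapsed = time.monotonic() - self.started_at
        return {
            "run_id": self.run_id,
            "episode_name": self.episode_name,
            "segmentation_logs": self.segmentation_logs,
            "success": success,
            "total_processing_time_seconds": round(elapsed, 1),
        }

    def write(self, directory, success=True):
        # Write the run summary to <directory>/<run_id>.json and return the path.
        path = Path(directory) / f"{self.run_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(self.summary(success), indent=2), encoding="utf-8")
        return path
```

The point of the design is that every entry is a plain dictionary, so the whole run can be dumped to one JSON file without any parsing of free-form text later.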

And the result is that I can now query the logs directly. It's similar to what I did by converting raw transcripts into fully structured data. Now my logs are structured and machine-readable. They can be analysed, filtered, or acted upon.

{
  "run_id": "20250626_172050",
  "episode_name": "Episode Title",
  "segmentation_logs": [...],
  "ecapa_logs": [...],
  "cluster_logs": [...],
  "success": true,
  "total_processing_time_seconds": 65.3
}
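To make "query the logs" concrete, here is a small sketch that loads run summaries shaped like the one above and picks out runs that failed or ran long. The function names, the one-file-per-run layout, and the 60-second threshold are assumptions for illustration; the keys `run_id`, `success`, and `total_processing_time_seconds` are the ones shown in the JSON.

```python
import json
from pathlib import Path


def load_runs(log_dir):
    """Read every *.json run summary in log_dir (assumed: one file per run)."""
    runs = []
    for path in sorted(Path(log_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            runs.append(json.load(fh))
    return runs


def slow_or_failed(runs, max_seconds=60.0):
    """Return the run_id of each run that failed or exceeded max_seconds."""
    flagged = []
    for run in runs:
        failed = not run.get("success", False)
        slow = run.get("total_processing_time_seconds", 0.0) > max_seconds
        if failed or slow:
            flagged.append(run["run_id"])
    return flagged
```

With plain-text log lines, the same question would mean grepping and hoping the wording never changed.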

And that's when it hit me: let's do more cleanup.

Total File Restructure

After asking around, I found out there's no one-size-fits-all standard for structuring files. However, the structure I proposed closely follows best practices from multiple recognized Python conventions, especially when building modular, data-driven projects:

Companies like OpenAI, HuggingFace, Meta, and Netflix often use a modular ML pipeline approach in their projects. So, why shouldn't I?

And ChatGPT also shared some extra suggestions:

"Use src/ for all reusable logic. Put CLI entry points in scripts/. Migrate config to configs/."

So before:

askthegame/
├── main.py (1800+ lines)
├── debug_chunks.py
├── create_voiceprint.py
├── temp_audio/
├── logs/
├── deepgram_backups/
└── chaos everywhere

... and after.

askthegame/
├── src/askthegame/
│   ├── audio/
│   ├── transcription/
│   ├── speaker/
│   ├── embeddings/
│   ├── database/
│   ├── pipeline/
│   └── utils/
├── scripts/
├── configs/
├── data/
├── tests/
└── docs/

Now everything has a place, and more importantly, it has a purpose.

Modular Design = Sanity

At first, I was lost. So many files. Could I really break up the 1,800-line file? It felt like cleaning out a cluttered garage. So I did it, with some AI help.

Each module now does exactly one thing. Nothing more.

And the CLI? Still works:

python scripts/run_pipeline.py --target-episode "Ep 908"
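A thin entry point for a command like that could look roughly like the sketch below. Only the `--target-episode` flag comes from the post; `build_parser`, `run_pipeline`, and `main` are hypothetical names, and `run_pipeline` is a stand-in for the real pipeline code that would live under `src/askthegame/`.

```python
import argparse


def build_parser():
    """Build the command-line parser for the (hypothetical) pipeline script."""
    parser = argparse.ArgumentParser(description="Run the podcast pipeline (sketch).")
    parser.add_argument(
        "--target-episode",
        required=True,
        help='Title of the episode to process, e.g. "Ep 908".',
    )
    return parser


def run_pipeline(target_episode):
    # Stand-in for the real work; the actual pipeline would be imported
    # from the package instead of being defined in the script.
    return f"Would process episode {target_episode!r}"


def main(argv=None):
    # argv=None makes argparse read sys.argv; passing a list helps testing.
    args = build_parser().parse_args(argv)
    print(run_pipeline(args.target_episode))
    return 0


# In a real script, finish with:
# if __name__ == "__main__":
#     raise SystemExit(main())
```

Keeping the script this small means all the logic stays importable from the package, and the script only translates command-line flags into function arguments.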

And because I have my habits, I kept the legacy main.py in place, so the old entry point still works. I did make a note to remove it once I'm no longer using it.

Configuration? YAML All the Way

Hardcoded variables just don't sit right with me. What if I need to make a change? Plus, I've heard from developers in the past that hardcoded variables aren't ideal. So, I've made a change.

Before:

TARGET_EPISODE_TITLE = "Ep 908"

After:

# pipeline_config.yaml
target_episode_title: "Ep 908"
max_episodes_per_run: 1

Flexible. Clear. Shareable.
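On the Python side, reading that file could look something like this sketch. It assumes the third-party PyYAML package is installed; `PipelineConfig`, `config_from_mapping`, and `load_config` are hypothetical names, and only the two keys shown above are taken from the post.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PipelineConfig:
    """Typed view of pipeline_config.yaml (only the two keys shown in the post)."""

    target_episode_title: str
    max_episodes_per_run: int = 1


def config_from_mapping(raw):
    """Build a PipelineConfig from a dict, rejecting keys it does not know."""
    known = {f.name for f in fields(PipelineConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return PipelineConfig(**raw)


def load_config(path):
    """Load a YAML config file. Assumes PyYAML is installed."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as fh:
        raw = yaml.safe_load(fh) or {}
    return config_from_mapping(raw)
```

Rejecting unknown keys is a design choice: a typo in the YAML fails loudly instead of being silently ignored.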

Always Be Documenting

During my vibe coding sessions, I've noticed comments at the top of files created by Claude Code. I researched and found that I can benefit from using module-level docstrings in my Python scripts.

So, I reviewed every file and wrote docstrings as I would want to read them six months from now. Not just notes, but proper, developer-grade documentation.

"""
Filename: rss_processor.py

Description:
    RSS feed processor for The Game Podcast ETL.
    Handles metadata extraction and filtering.

Author: Benoît Meunier
Created: 2025-06-26
"""

Every module. Every script. I really love documentation. And because I'm vibe coding, it's an exercise that is helping me understand what each thing does.
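A rule like "every module, every script" is easy to check automatically. The sketch below uses the standard library's `ast` module to list Python files without a module-level docstring; the function name `modules_missing_docstrings` is an assumption for illustration, not something from the project.

```python
import ast
from pathlib import Path


def modules_missing_docstrings(root):
    """Return the .py files under root that have no module-level docstring."""
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        # ast.get_docstring returns None when the module has no docstring.
        if ast.get_docstring(tree) is None:
            missing.append(path)
    return missing
```

Note that empty files, such as a bare `__init__.py`, are also reported, since they have no docstring either.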

The Unexpected Win

What started as a logging improvement ended up transforming and documenting everything. The original code still runs; nothing broke. But now, everything's just... better.

Clean, documented, modular, understandable.

– Benoit Meunier