I Built a Chunk Merger Validator
If you're reading this, you probably know I'm working on a big audio pipeline project called Ask The Game. It breaks podcast episodes into manageable chunks, allowing me to search, quote, and understand them more effectively with the help of AI.
However, I recently encountered a problem.
Even though my system was great at figuring out who said what, it was... kinda messy. Like, way too many chunks. Weird speaker switches. Some junk in the middle. It wasn’t broken, but it wasn’t clean either.
So I built something new.
Let me explain what it does, why it’s important, and how it helped me fix a major part of my pipeline.
What’s a Chunk Merger Validator?
Imagine listening to a podcast and every few seconds someone hits pause. You’d go nuts, right?
That’s similar to what my last iteration did. It split the conversation into too many small pieces, even when it was clearly the same person speaking.
The Chunk Merger Validator is like a smart editor. It goes in after everything’s been transcribed and labelled, and says:
"Hey, this part sounds like the same person talking — let’s just group that together."
It also watches out for confusing content, such as when speaker labels bounce back and forth too quickly. If someone’s labelled “Alex → Guest → Alex” in 10 seconds? That’s probably a mistake.
This validator cleans that all up. It's the final polish before a transcript is saved and used for any other purpose.
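If you're curious what that merge rule might look like, here's a minimal Python sketch. The field names (speaker, start, end, text) and the 1.5-second gap threshold are placeholders I picked for this post, not the actual implementation:

```python
MAX_GAP_SEC = 1.5  # a pause shorter than this is treated as the same turn

def merge_chunks(chunks):
    """Collapse back-to-back chunks from the same speaker into one."""
    merged = []
    for chunk in chunks:
        prev = merged[-1] if merged else None
        if (
            prev
            and prev["speaker"] == chunk["speaker"]
            and chunk["start"] - prev["end"] <= MAX_GAP_SEC
        ):
            # Same person, tiny pause: extend the previous chunk.
            prev["text"] += " " + chunk["text"]
            prev["end"] = chunk["end"]
        else:
            merged.append(dict(chunk))  # copy so we never mutate the input
    return merged
```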
Why It Matters
Here’s why this step is so important:
The cleaner the chunks, the easier it is for the AI to:
- Understand who said what
- Pull out great quotes
- Build memories of speakers over time
It’s like cleaning up your LEGO blocks before building something cool. Fewer pieces, better shapes, and everything fits together nicely.
What the Validator Actually Does
It works in a few ways:
- Merges short gaps: If the same speaker talks, takes a tiny pause (like to breathe), then keeps going, the validator merges those chunks into one.
- Flags speaker flip-flops: It spots when the speaker labels bounce around too much in a short time, especially if the system wasn’t confident about them (there’s a sketch of this check after the example report below).
- Tracks confidence: Each chunk has a score of how sure the system was about what it heard. Low-confidence bits get handled with care.
- Runs in dry-run mode: Before it touches anything, it can do a test run. If it’s about to make a big mess, it stops and keeps the original version (I’ll show a sketch of this near the end of the post).
- Shows its work: The validator creates a little report after each episode. It tells me how many chunks it created, how confident it was, whether it found anything weird, and how long it took.
Example:
{
  "episode_title": "How to Survive Creator Burnout",
  "chunks_created": 472,
  "avg_confidence": 0.94,
  "merge_rate": 0.62,
  "flipflops_detected": 0,
  "processing_time_sec": 45.95
}
I can see what’s working, what needs tweaking, and how clean each episode is.
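For the curious, here’s a rough sketch of how a flip-flop check like this can work: count the label switches inside a short window, and treat low-confidence chunks as extra suspicious. As before, the field names and thresholds are illustrative, not my real code:

```python
WINDOW_SEC = 10.0      # look for label changes inside this window
MAX_SWITCHES = 2       # "Alex -> Guest -> Alex" is two switches
MIN_CONFIDENCE = 0.80  # below this, a switch looks extra suspicious

def find_flipflops(chunks):
    """Return the indexes where speaker labels bounce around too fast."""
    flagged = []
    for i, chunk in enumerate(chunks):
        # All chunks that start within WINDOW_SEC of this one.
        window = [c for c in chunks[i:]
                  if c["start"] - chunk["start"] <= WINDOW_SEC]
        switches = sum(a["speaker"] != b["speaker"]
                       for a, b in zip(window, window[1:]))
        shaky = any(c["confidence"] < MIN_CONFIDENCE for c in window)
        # Two switches in one window is suspicious on its own;
        # even one switch gets flagged when confidence is shaky.
        if switches >= MAX_SWITCHES or (shaky and switches >= 1):
            flagged.append(i)
    return flagged
```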
It worked!
I ran the new validator on two fresh podcast episodes:
- Chunks dropped from 150 to 60 in one case — 60% fewer!
- Confidence stayed high (above 0.93)
- No speaker flip-flops
- Everything was processed in just a few seconds
And nothing broke.
Now, every new episode gets automatically cleaned up and ready for search, memory, and quote extraction.
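If you want a feel for how the dry-run guard fits in, here’s roughly what that last step could look like, reusing the merge_chunks sketch from earlier. The load_chunks and save_chunks helpers and the 90% threshold are made up for illustration:

```python
chunks = load_chunks("episode_0042.json")   # hypothetical loader

preview = merge_chunks(chunks)              # dry run: nothing saved yet
merge_rate = 1 - len(preview) / max(len(chunks), 1)

if merge_rate > 0.9:
    # Merging away 90%+ of the chunks would almost certainly be a bug,
    # so keep the original version untouched.
    save_chunks("episode_0042.json", chunks)     # hypothetical writer
else:
    save_chunks("episode_0042.json", preview)
```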
Until next time!
– Benoit Meunier