Ask The Game, the Build Log

I Built the Voice of Hormozi

Today's goal was clear: to create the most critical asset for the new speaker ID feature, a high-fidelity voiceprint of Alex Hormozi.

I’m not talking about just grabbing a soundbite. I mean creating a mathematical fingerprint, essentially a digital signature that my system can use to confidently verify, “Yes, that’s Alex.”

Here’s how I achieved that.

Step 1: I Chose to Build, Not Rent

My first major decision was to build my speaker identification system from scratch. I wanted to avoid relying on a commercial API that could change pricing, restrict access, or disappear altogether. (I'm looking at you, Microsoft Azure's Speaker Recognition API.)

To achieve this, I used two open-source tools: pyannote.audio and SpeechBrain.

This choice grants me complete control over the system. It eliminates recurring costs and lays the foundation for something long-term, something I truly own.

It may not be the fastest route, but it is the right one.

Step 2: I Fought the Environment ... and Won

Every machine learning (ML) project begins the same way: with dependency challenges.

I spent a significant amount of time establishing a clean and stable environment. I used pyenv to lock in Python 3.11, ensuring that everything from Torch to Torchaudio would work together without any surprises.
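The setup boils down to a few commands. This is a sketch of what that looks like, not the exact session from the log; the patch version and the virtualenv step are my assumptions.

```shell
# Pin Python 3.11 for this project with pyenv (3.11.9 is an assumed patch version)
pyenv install 3.11.9
pyenv local 3.11.9

# Isolate dependencies in a virtual environment (assumed workflow)
python -m venv .venv
source .venv/bin/activate

# Install the audio/ML stack so torch and torchaudio resolve together
pip install torch torchaudio speechbrain pyannote.audio
```

Pinning the interpreter first matters: torch wheels are built per Python version, so locking 3.11 before installing avoids the mismatched-wheel surprises the log mentions.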

Was it painful? A little. Okay, a lot. I'm not a coder.

But it was worth it. Now I have a reliable foundation that I can trust.

Step 3: I Enrolled Hormozi’s Voice

After completing the initial setup, I created a small utility script called `create_voiceprint.py`.

This script takes a clean sample of Alex's speech and converts it into a voiceprint.

I used the file `alex_sample.mp3`, processed it through the SpeechBrain model, and generated a compact, high-dimensional vector. This output file, `alex_voiceprint.bin`, now serves as my ground truth.
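A minimal sketch of that save/load step. The 192-dimension size is an assumption (it's the output size of SpeechBrain's ECAPA-TDNN speaker model), and the random vector here stands in for the embedding the model would actually produce from `alex_sample.mp3`; the real model call is shown only as a comment.

```python
import numpy as np

EMBED_DIM = 192  # assumption: ECAPA-TDNN speaker embeddings are 192-d

def save_voiceprint(embedding: np.ndarray, path: str) -> None:
    """Persist the embedding as raw float32 bytes (the .bin asset)."""
    embedding.astype(np.float32).tofile(path)

def load_voiceprint(path: str) -> np.ndarray:
    """Read the raw float32 bytes back into a vector."""
    return np.fromfile(path, dtype=np.float32)

# In the real script the embedding comes from the SpeechBrain model,
# roughly (assumed API, not verified against the actual script):
#   classifier = EncoderClassifier.from_hparams(
#       source="speechbrain/spkrec-ecapa-voxceleb")
#   embedding = classifier.encode_batch(signal).squeeze().numpy()
embedding = np.random.default_rng(0).standard_normal(EMBED_DIM)

save_voiceprint(embedding, "alex_voiceprint.bin")
roundtrip = load_voiceprint("alex_voiceprint.bin")
assert np.allclose(embedding.astype(np.float32), roundtrip)
```

Storing raw float32 bytes keeps the asset tiny (under 1 KB) and trivially loadable from any language later.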

From this point forward, any unknown speaker will be compared against this voiceprint. If the similarity is high enough, the system confirms the speaker is Alex.

It's working. What's next?

This is a significant foundational win. The setup is complete, the core asset is built, and the voiceprint is ready. The pipeline is functioning smoothly.

Now, it’s time to connect the two. My next move is to integrate the speaker ID logic into the main script, so the system doesn’t just process episodes; it knows who is speaking.

– Benoit Meunier