Ask The Game, the Build Log

Failing to Prototype Extracting Alex Hormozi’s Pricing Quotes

You ever build something that works perfectly and still feels like a failure?

That’s what happened here. I built a prototype to extract pricing insights from Alex Hormozi’s podcast using my structured pipeline. And technically, it worked. Beautifully.

But the result? Not quite what I hoped for.

And honestly, that's exactly what I needed.

The Intent

The goal was pretty clear.

I wanted to answer the question: What does Alex Hormozi actually say about pricing strategy?

Not “What does ChatGPT guess he said.”
Not “Where does he mention the word ‘price.’”
But: What are the actual ideas, expressed by Alex, in the real audio, with timestamped precision.

I built a system that transcribes the podcast, identifies who’s speaking, breaks it into clean chunks, and stores everything in a structured, queryable database.
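For context, here's roughly what one stored chunk looks like. The field names are illustrative (my shorthand, not the exact schema), but they show the structure the pipeline produces:

```javascript
// Illustrative shape of one stored chunk. Field names are assumptions,
// not the exact schema from the pipeline.
const exampleChunk = {
  episode_id: 'the-game-ep-001',      // which episode the audio came from
  speaker: 'Alex Hormozi',            // speaker attribution from diarization
  start_time: 512.4,                  // seconds into the audio
  end_time: 538.9,
  chunk_text: 'The single biggest pricing mistake is charging based on cost instead of value.',
};
```

Every quote the UI surfaces traces back to a record like this, which is what makes timestamped playback possible.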

Then I asked that system, through a prompt, to surface only the parts that were truly about pricing, across about three episodes.

What came back?
A few useful gems.
A lot of noise.

And a big realization.

The Prototype That Worked ... and Missed

You can try it for yourself here:
👉 Pricing Strategy Prototype

The UI works great. Click a card. Listen to the audio. Read a short summary. It's focused on one topic: "pricing." The problem?

Some of the quotes weren’t about pricing strategy at all.
They just mentioned pricing words.

“I have so much money. What do I do with it?”
Yeah. That’s not pricing insight.

The Problem With Keyword Matching

At first, I filtered content using basic keyword logic. You know, stuff like:

// pricingKeywords is a list of { regex } entries, e.g. /\bpric(e|ing)\b/i
const pricingChunks = chunks.filter(chunk =>
  pricingKeywords.some(kw => kw.regex.test(chunk.chunk_text))
);

It catches mentions. But not meaning.

Just because Alex says “value” or “cost” doesn’t mean he's talking about pricing strategy.

I wasn't surfacing real insight. I was surfacing search results.
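To make that concrete, here's a minimal repro of the failure. The sample chunks are invented for illustration, but the filter logic is the same as above:

```javascript
// Minimal repro of the keyword-matching failure (sample data invented).
const pricingKeywords = [{ regex: /\b(price|pricing|cost|money|value)\b/i }];

const chunks = [
  { chunk_text: 'Charge based on the value you create, not your costs.' }, // real insight
  { chunk_text: 'I have so much money. What do I do with it?' },           // noise
];

const pricingChunks = chunks.filter(chunk =>
  pricingKeywords.some(kw => kw.regex.test(chunk.chunk_text))
);

// Both chunks pass the filter: it finds mentions, not meaning.
console.log(pricingChunks.length); // 2
```

The keyword list can't tell a pricing principle from an offhand remark about money, because both contain the same words.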

One Potential Fix: Topic Segmentation

That’s why I'm now building a topic segmentation layer into the pipeline.

Instead of checking for keywords, we’ll classify each chunk based on what it’s actually about.

Pricing strategy. Offer positioning. Team building. Sales psychology.

Whatever topic I want, I can train the system to find it semantically, not syntactically.
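One way that classification could work is embedding similarity: compare each chunk's embedding to a prototype embedding per topic and assign the closest one. This is a sketch of one possible approach, not the final design; the chunk and topic embeddings here would come from whatever embedding model the pipeline ends up using:

```javascript
// Sketch: assign a chunk the topic whose prototype embedding it is closest to.
// Embeddings are plain number arrays; a real model would produce them.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function classifyChunk(chunkEmbedding, topicPrototypes) {
  let best = { topic: null, score: -Infinity };
  for (const [topic, prototype] of Object.entries(topicPrototypes)) {
    const score = cosineSimilarity(chunkEmbedding, prototype);
    if (score > best.score) best = { topic, score };
  }
  return best.topic;
}
```

The point is that "closeness in meaning" replaces "contains the word," which is exactly the gap the keyword filter couldn't cross.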

Here's what it will look like under the hood:

const PRICING_TOPICS = ['pricing_strategy', 'value_pricing', 'premium_positioning'];

const pricingChunks = chunks.filter(chunk =>
  PRICING_TOPICS.includes(chunk.topic_classification)
);

Much cleaner. Much smarter.
And way closer to the kind of system I actually want to build.

Dumb Search Engine

The real failure here wasn’t technical.

It’s that what I built was basically a dumb search engine, not a real tool for exploring ideas.

That’s the bigger goal.

I'm not just trying to surface content. I'm trying to build a system that understands it. That lets people navigate podcast content by theme, by insight, by idea.

Not just “Where did Alex say the word ‘price’?”
But “What does Alex believe about pricing — and how has it evolved across episodes?”

So Why Not Just Use ChatGPT?

Because I don’t want a guessing engine.

ChatGPT is great at sounding smart. But without the raw transcripts, it doesn't remember what was said, where, or by whom.

My system does.

Because I control the data, the speaker IDs, the segmentation, and the context, I can layer an LLM on top with far less room for hallucination.

I don’t want a vibe.
I want structure, evidence, and clarity.

And now that the pipeline exists, adding a reasoning layer, perhaps a grounded GPT agent, will actually mean something.

It’ll answer questions from a clean dataset tied to real audio, with metadata and speaker attribution.

That’s not just AI.
That’s usable AI.

What This Prototype Taught Me

It taught me where things fall apart.
And why that’s exactly where to build next.

I’m proud of what we built, even if it didn’t do what we hoped.

Because now I know what to fix, and how to fix it.

What’s Next

Next up is semantic topic classification.
Once that’s in place, this system can finally move from clever search to actual understanding.

After that?

I’ll test LLMs that can reason over this data safely.

I'll build microsites around ideas, not just episodes.

And eventually, maybe we’ll provide a way for you to ask your own questions and receive genuine answers from the content itself.

– Benoit Meunier