How I Built Self-Updating Documentation with GitIngest
I was scrolling through Twitter when I saw Wes Bos, a fellow Canadian, asking his followers for good AI coding tips. I started reading the replies to see what cool tools people were sharing.
That's when I spotted @travisirby's reply: "[...] take a look at https://gitingest.com/"
I clicked the link out of curiosity. GitIngest? Never heard of it. But the tagline caught my attention: "Turn any Git repository into a prompt-ready text digest."
Huh. That's... actually interesting.
Meanwhile, my podcast-processing project had been growing like crazy. Speaker identification, topic segmentation, cloud deployment, confidence filtering... it had become this complex beast that actually works. But every time I looked at the README, I'd cringe.
The documentation was not always correct, and it was missing the context an LLM would need to really grasp what my codebase was about.
Every time I added a new feature or fixed a bug, I'd tell myself, "I'll update the README later." Later never came. The documentation was either missing or straight-up lying about what the code actually did.
My Documentation Was Living in the Past
I'd build something cool, like when I finally got the semantic topic segmentation working and could identify when Alex Hormozi switches from talking about business strategy to personal stories. I'd be so excited about the breakthrough that I'd immediately start on the next feature.
The README.md? Still talking about the old system that barely worked.
The project structure docs? Showing directories that didn't exist anymore.
The setup instructions? Good luck with that.
I tried the usual fixes. Set reminders. Made myself a checklist. Even tried to document as I went. Nothing was really standardized. The moment I entered a flow state, documentation became the last thing on my mind.
But I know I need good documentation. First, it's a learning exercise: when I'm documenting, I understand things better. I think my documentation is already great compared to my other projects, but it wasn't enough. Second, as a non-dev, I forget how my own code works. When I'm deep in the zone fixing a bug, I struggle to recall how the confidence filtering system relates to topic segmentation. And if I can't remember, how is anyone else supposed to understand it?
What If Documentation Updated Itself?
So I went back to that GitIngest link from Travis's reply. I started reading more about what it actually does. The idea is simple yet clever: it takes your entire codebase and converts it into a single, massive text file that AI can actually comprehend.
Not just file names and directories, but the actual code, configuration files, data structures ... everything. It creates what they call a "project digest."
I thought, "What if I could use this to flip my documentation problem? Instead of trying to remember to update documentation, what if the documentation just... stayed current automatically?"
Worth a shot, right?
I tried it on this project. The result was mind-blowing: a 145MB text file that contained literally everything about my project in a format that AI could analyze and understand.
Here's what happened when I ran it:
gitingest . -o askthegame_digest.txt
# Analysis complete!
# Files analyzed: 399
# Estimated tokens: 41.5M
41.5 million tokens. That's my entire project - every Python file, every configuration, every data structure - all in one digestible format.
Using AI for the Documentation
Here's how I set it up. I created a documentation generation script (sketched after this list) that:
- Runs GitIngest to create a complete project digest
- Analyzes the digest to extract key information about the project structure
- Generates human-readable documentation based on what the code actually does
- Updates the README.md with current, accurate information
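Here's a minimal sketch of what that script looks like. It's simplified, not my exact file: summarize_digest is a stand-in for whatever LLM call you prefer, and the file names are just examples.

# generate_docs.py - simplified sketch of the documentation flow
# Assumptions: the gitingest CLI is installed, and summarize_digest()
# is a placeholder for your LLM of choice (Claude, ChatGPT, ...).
import subprocess
from pathlib import Path

DIGEST = Path("askthegame_digest.txt")
README = Path("README.md")

def build_digest() -> str:
    """Run GitIngest over the repo and return the digest text."""
    subprocess.run(["gitingest", ".", "-o", str(DIGEST)], check=True)
    return DIGEST.read_text(encoding="utf-8")

def summarize_digest(digest: str) -> str:
    """Placeholder: send the digest (or chunks of it) to an LLM and
    ask it to write an up-to-date README. Swap in your own API call."""
    raise NotImplementedError("wire up your LLM of choice")

def main() -> None:
    digest = build_digest()
    README.write_text(summarize_digest(digest), encoding="utf-8")
    print(f"README.md regenerated from {len(digest):,} characters of digest")

if __name__ == "__main__":
    main()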
Instead of me trying to remember what my confidence filtering system does, the AI reads through all the code and figures out:
- What each module actually does
- How the data flows between components
- What the current project structure looks like
- What features are production-ready vs experimental
That's what it feels like: an assistant who reads your entire codebase and writes the documentation for you. Except this assistant never gets tired, never forgets, and always has the most current information.
What can I do with it?
For Development
Every time I commit code, my pre-commit hooks automatically regenerate the documentation. I added a new API endpoint for processing episodes. The documentation updates itself. I changed how speaker embeddings are stored. The architecture diagrams reflect the new structure.
For Onboarding
Anyone (including future me) can review the README and understand what the project does. The documentation shows the real project structure, not what I thought it was six months ago.
For AI Analysis
That 145MB digest file becomes incredibly useful for other AI tools. I can feed relevant slices of it to Claude or ChatGPT (41.5 million tokens won't fit in a single context window, but targeted chunks do) and ask questions like "How does the confidence filtering connect to the topic segmentation?" The AI gets complete context for whichever part of the project I'm asking about.
For Project Management
The system tracks what is production-ready versus experimental. My current README clearly shows that semantic topic segmentation is production-ready (it processes real episodes), while advanced content analytics is still experimental.
Here's a real example. My current README says:
Data Flow
- Audio ingestion → Speaker identification
- Content analysis → Topic segmentation
- Embedding generation → Storage
- Quality filtering → Output generation
This is accurate as of today. It reflects the actual pipeline that processes Alex Hormozi's The Game podcast episodes. But if I change the pipeline tomorrow, the documentation will update automatically.
It's integrated
I integrated it into my entire development workflow:
- Pre-commit Hooks: Every time I commit code, the documentation regenerates if needed. I never have to remember to update it.
- CI/CD Pipeline: When I push to GitHub, the documentation gets validated and updated. Pull requests automatically include documentation previews.
- Development Setup: New developers (okay, still just me) run one command:
python scripts/setup_hooks.py
and everything is configured.
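For the curious, here's the general shape of that setup script. This is a simplified sketch rather than the real file: it assumes the generate_docs.py sketch from earlier and just installs a pre-commit hook plus a .gitignore rule for the digest files.

# scripts/setup_hooks.py - simplified sketch of the one-command setup
# Assumption: scripts/generate_docs.py is the documentation script
# sketched earlier in this post.
import stat
from pathlib import Path

HOOK = Path(".git/hooks/pre-commit")
HOOK_BODY = """#!/bin/sh
# Regenerate documentation before every commit
python scripts/generate_docs.py
git add README.md
"""

def install_hook() -> None:
    HOOK.write_text(HOOK_BODY)
    HOOK.chmod(HOOK.stat().st_mode | stat.S_IEXEC)  # make it executable

def ignore_digests() -> None:
    gitignore = Path(".gitignore")
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if "*_digest.txt" not in lines:
        lines.append("*_digest.txt")  # keep the huge digests out of version control
        gitignore.write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    install_hook()
    ignore_digests()
    print("Hooks installed. Documentation regenerates on every commit.")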
The system even handles the boring stuff. It updates the .gitignore file to exclude the massive digest files from version control. It creates proper directory structures. It follows industry standards for documentation organization.
What I'm Still Figuring Out
I'm not going to pretend this is perfect. There are still things I'm experimenting with:
Content Quality
The AI-generated documentation is really good, but sometimes it misses context that I think is important. I'm still tweaking the analysis prompts to get better results, or simply reviewing the documentation manually.
Performance
Generating a 145MB digest file takes time. I'm not sure whether optimizing it so it doesn't slow down my development workflow is worth the effort. It's working for now, but it's something that may bite me in the future.
Selective Updates
Right now, it regenerates everything whenever code changes. That's potentially where things could break down. Maybe I could find a way to update only the sections that actually need changes? I'm not sure.
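If I ever tackle it, one direction might be to ask git which files changed and skip regeneration when nothing relevant moved. A rough sketch of that idea (not something I've actually built):

# Idea only: skip regeneration when no code or config files changed.
import subprocess

def changed_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def needs_doc_update() -> bool:
    relevant = (".py", ".yaml", ".yml", ".toml", ".json")
    return any(path.endswith(relevant) for path in changed_files())

if __name__ == "__main__":
    if needs_doc_update():
        print("Code changed - regenerating documentation...")
    else:
        print("Nothing relevant changed - leaving the docs alone.")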
Why it Matters for Me
I was just trying to solve this problem for myself and my specific project.
However, the impact on my mindset has been huge, particularly in how I think about building and scaling. I actually look forward to reviewing my documentation now, because it's accurate.
When I'm debugging something, I can trust that the README reflects how the system actually works. And when I need help later, other humans will be able to dissect it and understand what I did right, what's worth being inspired by, and where I fucked up because I'm vibe coding this.
And when I have breakthrough moments, like when I figured out how to get semantic topic segmentation working across different types of episodes, I don't have to choose between documenting the breakthrough or building on it. The documentation updates itself.
The key insight for me was that I needed to stop trying to remember to update documentation and start making it impossible for the documentation to be wrong.
– Benoit Meunier
Thanks to @travisirby for the GitIngest recommendation in that Twitter thread - sometimes the best discoveries happen when you're not even looking for them.