Ask The Game, the Build Log

This prompt was written on July 14th, 2025, as the foundation for the pipeline insight capture project. It’s the spec I’m about to implement to help my system learn from every run — not just run.

Intent

I want to improve my ETL pipeline by making each run additive and insightful. After every run, I want to be prompted in the terminal to reflect and capture key learnings, problems, and ideas. This isn't about logging system events — it's about logging human insight while it's still fresh.

Goal

Create a CLI tool (called at the end of the pipeline) that:

  1. Asks the user a small set of reflective questions in the terminal
  2. Saves their answers in a well-structured Markdown file
  3. Stores it locally at a path like /run-insights/<run-id>.md
  4. Optionally adds a Git commit for traceability

Key Behavior

Prompt Questions

  1. What worked well in this run?
  2. What didn't work or felt fragile?
  3. What did you learn from this run? (one sentence)
  4. What should be fixed, tweaked, or refactored? (1–3 items, as checklist)
  5. What metric or signal would help you track this next time?
  6. Any new questions or hypotheses to explore next time?
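
These six prompts map onto the basic_insights keys in the JSON Input Schema later in this doc; one way to encode that mapping inside the CLI tool (the key pairing follows the schema, everything else is illustrative):

# Basic question set, keyed to the basic_insights fields (sketch)
BASIC_QUESTIONS = [
    ("what_worked", "What worked well in this run?"),
    ("what_didnt_work", "What didn't work or felt fragile?"),
    ("key_learning", "What did you learn from this run? (one sentence)"),
    ("fixes_needed", "What should be fixed, tweaked, or refactored? (1-3 items, as checklist)"),
    ("metrics_to_track", "What metric or signal would help you track this next time?"),
    ("questions_hypotheses", "Any new questions or hypotheses to explore next time?"),
]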

Advanced Review Questions

Stage-by-Stage Diagnostics

Confidence and Drift Assessment

Customer Trust Check

Decisions, Changes, and Surprises

Metadata Snapshot (optional, auto-collected if possible)

Output Format (Markdown)

YAML Frontmatter Block

Before the questions, add a YAML frontmatter block with run metadata:

---
run_id: run-20250712-1034
timestamp: 2025-07-12 10:34
status: success
duration: 823s
num_steps: 6
episode: "Alex Hormozi – Ep. 902"
git_sha: abc123def
config_preset: production
stages_completed: ["transcription", "diarization", "embedding", "speaker_matching", "topic_segmentation", "labeling"]
audio_stats:
  total_length: "45m 23s"
  num_speakers: 2
  files_processed: 1
performance_metrics:
  stt_avg_confidence: 0.87
  speaker_match_rate: 0.94
  embedding_drift_score: 0.12
---

Markdown Structure

Example filename

/run-insights/run-20250712-1034.md

Bonus (optional)

If easy to add:

Bonus Tip: Smart .gitignore Pattern

To make sure only .md insight files are committed — and other noise like temp or backup files are ignored — add this to your .gitignore:

# Ignore everything inside run-insights except .md files
/run-insights/*
!/run-insights/*.md

This keeps your repo clean while ensuring insights stay versioned.


Implementation Plan

Technical Architecture

Core Components

  1. CLI Tool Script: scripts/capture_run_insights.py

    • Interactive prompt system using Python's input() or the questionary library (see the sketch after this list)
    • Automation Input Layer for non-interactive modes
    • Markdown file generation with templating
    • Git integration for auto-commits
    • Run ID generation with timestamp
  2. Integration Points

    • Add call to insight capture at end of pipeline orchestrator
    • Modify src/askthegame/pipeline/orchestrator.py to call insight tool
    • Ensure run_id is passed from pipeline to insight tool
  3. Storage Structure

    run-insights/
    ├── run-20250712-1034.md
    ├── run-20250712-1245.md
    └── ...
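
A minimal sketch of the run-ID and prompting pieces, assuming questionary is installed but falling back to plain input() when it isn't (the ask() helper is illustrative, not part of the spec):

# scripts/capture_run_insights.py (prompting sketch)
from datetime import datetime

def generate_run_id() -> str:
    """Generate a run ID like run-20250712-1034 from the current time."""
    return datetime.now().strftime("run-%Y%m%d-%H%M")

def ask(question: str) -> str:
    """Prompt via questionary when available, otherwise fall back to input()."""
    try:
        import questionary
        return questionary.text(question).ask() or ""
    except ImportError:
        return input(f"{question} ").strip()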
    

Automation Input Layer

CLI Interface Design

# Interactive mode (default)
python scripts/capture_run_insights.py

# Non-interactive modes
python scripts/capture_run_insights.py --skip                    # Skip insight capture entirely
python scripts/capture_run_insights.py --non-interactive         # Use empty responses
python scripts/capture_run_insights.py --from-json insights.json # Load from JSON file
python scripts/capture_run_insights.py --batch                   # Minimal essential prompts only

# With pipeline data enrichment
python scripts/capture_run_insights.py \
  --run-id "run-20250712-1034" \
  --status "success" \
  --summary '{"episodes_processed": 5, "duration": "12m", "errors": 0}'

# Full automation example (for CI/CD)
python scripts/capture_run_insights.py \
  --non-interactive \
  --run-id "$RUN_ID" \
  --status "$PIPELINE_STATUS" \
  --summary "$PIPELINE_SUMMARY" \
  --auto-commit
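
A possible argparse wiring for these flags (a sketch only; option names mirror the commands above, help text and defaults are assumptions):

# scripts/capture_run_insights.py (argument parsing sketch)
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Capture post-run pipeline insights")
    parser.add_argument("--skip", action="store_true", help="Skip insight capture entirely")
    parser.add_argument("--non-interactive", action="store_true", help="Use empty responses")
    parser.add_argument("--from-json", metavar="FILE", help="Load insights from a JSON file")
    parser.add_argument("--batch", action="store_true", help="Minimal essential prompts only")
    parser.add_argument("--run-id", help="Run ID passed from the pipeline")
    parser.add_argument("--status", help="Pipeline status, e.g. success or failed")
    parser.add_argument("--summary", help="JSON string with run summary data")
    parser.add_argument("--auto-commit", action="store_true", help="Git-commit the insight file")
    return parser.parse_args()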

JSON Input Schema

{
  "metadata": {
    "run_id": "run-20250712-1034",
    "timestamp": "2025-07-12 10:34",
    "status": "success",
    "duration": 823,
    "episode": "Alex Hormozi – Ep. 902",
    "git_sha": "abc123def",
    "config_preset": "production",
    "stages_completed": ["transcription", "diarization", "embedding", "speaker_matching", "topic_segmentation", "labeling"],
    "audio_stats": {
      "total_length": "45m 23s",
      "num_speakers": 2,
      "files_processed": 1
    },
    "performance_metrics": {
      "stt_avg_confidence": 0.87,
      "speaker_match_rate": 0.94,
      "embedding_drift_score": 0.12
    }
  },
  "basic_insights": {
    "what_worked": "Pipeline processed 5 episodes successfully",
    "what_didnt_work": "",
    "key_learning": "New confidence filtering reduced noise by 40%",
    "fixes_needed": [
      "Add timeout handling for long episodes",
      "Improve memory usage in embedding generation"
    ],
    "metrics_to_track": "Processing time per episode",
    "questions_hypotheses": "Should we batch episodes differently?"
  },
  "advanced_insights": {
    "stage_diagnostics": {
      "transcription": {"success": true, "quality": "high", "notes": "Clean audio, good confidence"},
      "diarization": {"success": true, "quality": "medium", "notes": "Some speaker overlap"},
      "embedding": {"success": true, "quality": "high", "notes": "Consistent with previous runs"},
      "speaker_matching": {"success": true, "quality": "high", "notes": "High match rate"},
      "topic_segmentation": {"success": true, "quality": "medium", "notes": "Good topic boundaries"},
      "labeling": {"success": true, "quality": "high", "notes": "Accurate speaker labels"}
    },
    "confidence_assessment": {
      "overall_confidence": "high",
      "drift_detected": false,
      "silent_failures_suspected": false
    },
    "customer_trust": {
      "shareable_with_stakeholder": true,
      "explainable_results": "fully"
    },
    "decisions_and_changes": {
      "manual_overrides": "None",
      "config_changes": "Updated timeout to 300s",
      "unexpected_occurrences": "Episode longer than usual but processed normally",
      "team_notes": "Consider batch size adjustment for long episodes"
    }
  }
}
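
Loading this file can stay simple; here is a sketch that treats metadata and basic_insights as required and advanced_insights as optional (that split is an assumption, not part of the schema):

# scripts/capture_run_insights.py (JSON loading sketch)
import json

def load_insights_from_json(file_path: str) -> dict:
    """Load insights from a JSON file and check the expected top-level sections."""
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    for section in ("metadata", "basic_insights"):  # advanced_insights stays optional
        if section not in data:
            raise ValueError(f"Missing required section in {file_path}: {section}")
    return data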

Pipeline Data Enrichment

Auto-populate markdown with system data:
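
For example, the run_id, status, and --summary payload could be folded straight into the YAML frontmatter (a sketch assuming pyyaml is available; enrich_metadata is an illustrative helper, not part of the spec):

# Merging pipeline data into the frontmatter (sketch)
import json
import yaml  # pyyaml

def create_yaml_frontmatter(metadata: dict) -> str:
    """Render the metadata block shown under Output Format."""
    return "---\n" + yaml.safe_dump(metadata, sort_keys=False) + "---\n"

def enrich_metadata(run_id: str, status: str, summary_json: str = "") -> dict:
    """Combine CLI arguments and the --summary JSON payload into one metadata dict."""
    metadata = {"run_id": run_id, "status": status}
    if summary_json:
        metadata.update(json.loads(summary_json))
    return metadata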

Integration Modes

Manual Mode (Interactive)

Semi-Automated Mode

Full Automation Mode

Implementation Steps

Phase 1: Core CLI Tool & Basic Questions

🎯 Goal: Working insight capture with basic questions

Phase 2: Advanced Questions & Diagnostics

🎯 Goal: ETL-specific insights and diagnostics

Phase 3: Automation Input Layer

🎯 Goal: CI/CD-ready automation

Phase 4: Pipeline Integration

🎯 Goal: Seamless pipeline integration

Phase 5: Enhancement & Intelligence

🎯 Goal: Production-ready polish

Phase 6: Insight Retrieval & Analysis

📈 HIGH PRIORITY - IMPLEMENT IMMEDIATELY AFTER PHASE 5. HIGH VALUE, LOW COMPLEXITY - Essential for making insights actionable

Quick Wins (START HERE)

Core Features

Phase 7: LLM-Powered Insight Assistant

🤖 FUTURE CONSIDERATION - TRANSFORMATIVE VALUE, HIGHER COMPLEXITY

Prerequisites (Gate Check)

Implementation Approach: Manual On-Demand (Perfect for Manual Pipeline)

# You run these manually when you want deeper analysis
./scripts/insights_assistant.py "What causes timeouts?"
./scripts/insights_assistant.py --weekly-report
./scripts/insights_assistant.py "Show me embedding drift patterns"
./scripts/insights_assistant.py --analyze-failures --last-month

Implementation Tasks (Only if prerequisites met)

NOT Included (For Manual Pipeline)

File Structure

# scripts/capture_run_insights.py
def generate_run_id() -> str:
    """Generate unique run ID with timestamp"""
    
def prompt_basic_insights() -> dict:
    """Interactive prompt for basic insights (questions 1-6)"""

def prompt_advanced_insights() -> dict:
    """Interactive prompt for advanced diagnostics and assessments"""

def collect_metadata_auto(pipeline_data: dict = None) -> dict:
    """Auto-collect metadata from pipeline data and system info"""

def load_insights_from_json(file_path: str) -> dict:
    """Load insights from JSON file for automation"""

def parse_pipeline_data(status: str, summary: str, stage_metrics: dict = None) -> dict:
    """Parse and enrich insights with comprehensive pipeline data"""

def create_yaml_frontmatter(metadata: dict) -> str:
    """Generate YAML frontmatter block for markdown file"""

def create_insight_file(run_id: str, metadata: dict, insights: dict, advanced_insights: dict = None) -> str:
    """Generate markdown file with YAML frontmatter and insights"""

def assess_run_confidence(pipeline_metrics: dict) -> dict:
    """Automatically assess run confidence based on metrics"""

def detect_drift(current_metrics: dict, historical_metrics: list) -> bool:
    """Detect drift compared to previous runs"""
    
def git_commit_insight(file_path: str, run_id: str) -> bool:
    """Auto-commit insight file to git"""

def handle_automation_mode(args) -> dict:
    """Handle non-interactive modes and data sources"""

def get_question_set(mode: str) -> list:
    """Return appropriate question set based on mode (basic/advanced/full)"""
    
def main():
    """Main entry point with CLI argument parsing and mode selection"""

# scripts/search_insights.py (Phase 6)
def search_insights(query: str, time_filter: str = None) -> list:
    """Search across all insight files for keyword/regex patterns"""

def parse_yaml_frontmatter(file_path: str) -> dict:
    """Extract structured metadata from insight files"""

def filter_by_timerange(insights: list, since: str, until: str = None) -> list:
    """Filter insights by date range"""

def summarize_patterns(insights: list, pattern_type: str) -> dict:
    """Summarize last N learnings, failures, or issues"""

def analyze_metric_trends(metric_name: str, time_range: str) -> dict:
    """Analyze trends in confidence scores, durations, etc."""

# scripts/insights_assistant.py (Phase 7)
def build_insight_embeddings(insights_dir: str) -> None:
    """Create embeddings for all insight files"""

def semantic_search(query: str, top_k: int = 5) -> list:
    """Find semantically similar insights using embeddings"""

def llm_query_insights(question: str, context_insights: list) -> str:
    """Generate natural language response using LLM over insights"""

def detect_regressions(current_metrics: dict, historical_data: list) -> dict:
    """Automatically detect performance regressions"""

def generate_insight_report(time_range: str) -> str:
    """Generate automated insight summary report"""

Future Retrieval Capabilities

Phase 6: Simple Search & Summary Examples

# Search for specific issues
./scripts/search_insights.py "timeout" --last-30-days
./scripts/search_insights.py "speaker_match_rate < 0.8" --format json

# Summary modes
./scripts/search_insights.py --failures --limit 5
./scripts/search_insights.py --learnings --since 2025-07-01
./scripts/search_insights.py --patterns "embedding_drift"

# Trend analysis
./scripts/search_insights.py --metric stt_avg_confidence --plot --timerange 7d
./scripts/search_insights.py --stage-analysis diarization --failures-only

Phase 7: LLM Assistant Examples (Manual On-Demand)

# Natural language queries (run when you need insights)
./scripts/insights_assistant.py "What causes speaker matching to fail?"
./scripts/insights_assistant.py "When do we see embedding drift?"
./scripts/insights_assistant.py "Summarize patterns in successful runs"

# On-demand reports (run weekly/monthly)
./scripts/insights_assistant.py --weekly-report
./scripts/insights_assistant.py --failure-analysis --last-month
./scripts/insights_assistant.py --recommendations --priority high
./scripts/insights_assistant.py --regression-check --since 2025-07-01

Sample Assistant Interactions

$ ./scripts/insights_assistant.py "Why do timeouts happen?"

Based on 23 runs mentioning timeouts:

**Common Patterns:**
- 65% occur with episodes >45 minutes
- 48% happen during speaker embedding stage
- 35% correlate with high speaker count (>3)

**Top Fixes Applied:**
- Increased timeout to 300s (resolved 8/12 cases)
- Batch size reduction (resolved 5/8 cases)
- Memory optimization (resolved 3/5 cases)

**Recommendation:** Consider automatic timeout scaling based on episode length.

Implementation Complexity Management

Why This Won't Overwhelm the Project

  1. Separate Tools: Each retrieval tool is independent - you can build incrementally
  2. YAML Frontmatter: Structured metadata makes search/analysis much easier
  3. Existing Libraries: Use ripgrep for search, pyyaml for parsing, standard embedding libraries (see the frontmatter-parsing sketch after this list)
  4. Modular Design: Can implement Phase 6 without committing to Phase 7
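
As an illustration of point 3, the frontmatter parser can lean entirely on pyyaml; a sketch, assuming insight files follow the Output Format above:

# scripts/search_insights.py (frontmatter parsing sketch)
import yaml  # pyyaml

def parse_yaml_frontmatter(file_path: str) -> dict:
    """Extract the metadata block between the leading '---' markers."""
    with open(file_path, "r", encoding="utf-8") as f:
        text = f.read()
    if not text.startswith("---"):
        return {}
    parts = text.split("---", 2)
    if len(parts) < 3:
        return {}
    return yaml.safe_load(parts[1]) or {}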

Complexity Levels

Quick Win Strategy (START HERE)

Create these 5-minute scripts immediately after Phase 1:

#!/usr/bin/env bash
# scripts/quick_search.sh - 20-line bash script
grep -r "timeout" run-insights/ | head -5
grep -r "failure" run-insights/ | head -5
grep -r "embedding_drift" run-insights/ | head -5
grep -r "speaker_match_rate" run-insights/ | head -5

Why this matters: Without retrieval, insights become write-only. These simple scripts make Phases 1-5 immediately more valuable.

Technical Dependencies

Integration Points

Orchestrator Modification

# In src/askthegame/pipeline/orchestrator.py
import json
import time
from datetime import datetime

def run_pipeline(...):
    run_id = generate_run_id()
    start_time = time.time()
    stage_metrics = {}
    
    # ... existing pipeline logic with stage tracking ...
    
    # Track each stage
    for stage_name in ["transcription", "diarization", "embedding", "speaker_matching", "topic_segmentation", "labeling"]:
        stage_start = time.time()
        stage_success, stage_metrics[stage_name] = run_stage(stage_name, ...)
        stage_metrics[stage_name].update({
            "duration": time.time() - stage_start,
            "success": stage_success,
            "timestamp": datetime.now().isoformat()
        })
    
    # Collect comprehensive run metrics
    run_summary = {
        "episodes_processed": episodes_count,
        "duration": time.time() - start_time,
        "errors": error_count,
        "stages_completed": [stage for stage, metrics in stage_metrics.items() if metrics.get("success", False)],
        "git_sha": get_git_sha(),
        "config_preset": get_config_preset(),
        "audio_stats": {
            "total_length": format_duration(total_audio_length),
            "num_speakers": detected_speakers,
            "files_processed": len(processed_files)
        },
        "performance_metrics": {
            "stt_avg_confidence": calculate_avg_confidence(transcription_results),
            "speaker_match_rate": calculate_speaker_match_rate(speaker_results),
            "embedding_drift_score": calculate_embedding_drift(embedding_results)
        }
    }
    
    # At completion (success or failure)
    if should_capture_insights():
        capture_run_insights(
            run_id=run_id,
            status="success" if success else "failed", 
            summary=json.dumps(run_summary),
            stage_metrics=json.dumps(stage_metrics),
            interactive=not is_ci_environment(),
            episode_title=get_episode_title()
        )
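
The should_capture_insights() and is_ci_environment() guards could simply key off the environment variables used in the CI example below (a sketch; the defaults are assumptions):

# Environment-driven guards (sketch; variable names from the CI example)
import os

def should_capture_insights() -> bool:
    """Capture insights unless explicitly disabled via RUN_INSIGHTS_ENABLED."""
    return os.getenv("RUN_INSIGHTS_ENABLED", "true").lower() in ("1", "true", "yes")

def is_ci_environment() -> bool:
    """Treat CI mode as non-interactive."""
    return os.getenv("RUN_INSIGHTS_CI_MODE", "false").lower() in ("1", "true", "yes")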

Environment Configuration

Standard Configuration

Automation Configuration

CI/CD Integration Examples

# GitHub Actions
- name: Run Pipeline with Insights
  run: |
    python -m askthegame.pipeline.orchestrator
    python scripts/capture_run_insights.py \
      --non-interactive \
      --status "${{ job.status }}" \
      --auto-commit
  env:
    RUN_INSIGHTS_ENABLED: true
    RUN_INSIGHTS_CI_MODE: true

Success Criteria

Phase 1-5: Core Functionality (MUST HAVE)

Phase 6: Retrieval & Search (HIGH PRIORITY)

Phase 7: LLM Intelligence (FUTURE CONSIDERATION)

What People Wish They Had Asked Earlier

These are common regrets from experienced ETL teams — areas where better questions before or after a run would have prevented silent failures or poor results.

🔎 Validation regrets

🕵️ Visibility regrets

⌛ Performance regrets