Beyond RAG: How Andrej Karpathy's LLM Wiki Pattern Builds Knowledge That Actually Compounds

Summary

An in-depth technical analysis by Plaban Nayak of the LLM Wiki pattern: its architecture, implementation details, advantages, disadvantages, and implications for AI engineers. Includes a full Python reference implementation with a CLI pipeline.

Key Insights

The Compilation Analogy

The core insight is framed as software compilation: RAG executes source code on every request (re-reads, re-chunks, re-synthesizes), while LLM Wiki compiles knowledge once into an optimized artifact (wiki pages) that benefits every subsequent query.

Three-Layer Architecture

  1. Raw Sources (Immutable) — sources/ directory, never modified by the LLM; serves as audit trail and ground truth
  2. The Wiki (LLM-Maintained) — *.md files with YAML frontmatter, [[slug]] cross-references, index.md, log.md, and .meta/embeddings.json
  3. The Schema (Governance) — JSON file defining the "page universe" (slugs, titles, descriptions), the only human-managed component
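To make the wiki layer concrete, here is a minimal sketch of parsing one such page: YAML-style frontmatter split from the body, with [[slug]] cross-references extracted. The page text and frontmatter fields are illustrative assumptions, not the article's exact schema.

```python
import re

# Illustrative wiki page; the frontmatter fields (title, sources, updated)
# are assumed for the example, not taken from the article.
PAGE = """\
---
title: Attention Mechanisms
sources: [vaswani-2017.pdf]
updated: 2025-01-15
---
Scaled dot-product attention underpins [[transformers]] and
contrasts with the recurrence in [[rnn-architectures]].
"""

def parse_page(text: str) -> tuple[dict, str, list[str]]:
    """Split frontmatter from the body and collect [[slug]] cross-references."""
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    links = re.findall(r"\[\[([^\]]+)\]\]", body)
    return meta, body, links

meta, body, links = parse_page(PAGE)
# meta["title"] is "Attention Mechanisms";
# links are the slugs this page points at in the wiki graph.
```

The cross-reference list is what a lint pass would later validate against the schema's page universe.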

Four Core Operations

  • Init — Bootstrap directory structure from schema template
  • Ingest — 5-step pipeline: Resolve Source → Route → Synthesize → Embed → Update Index/Log
  • Query — RAG over compiled wiki pages: Embed question → Cosine similarity → Assemble context → Stream answer
  • Lint — Structural checks: orphaned pages, missing pages, broken cross-references, stale embeddings; --deep for contradiction analysis, --fix to regenerate embeddings
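The Query operation above reduces to a similarity scan over the compiled pages. A minimal sketch, with toy 3-d vectors standing in for real text-embedding-3-small embeddings and an in-memory dict standing in for .meta/embeddings.json:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_pages(question_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank wiki pages by similarity to the question embedding.

    `index` maps slug -> embedding vector, mirroring what
    .meta/embeddings.json might hold."""
    ranked = sorted(index, key=lambda slug: cosine(question_vec, index[slug]), reverse=True)
    return ranked[:k]

# Toy vectors; a real pipeline would embed both pages and question via the API.
index = {"transformers": [0.9, 0.1, 0.0],
         "rnn-architectures": [0.1, 0.9, 0.0],
         "tokenization": [0.0, 0.2, 0.9]}
top_pages([1.0, 0.0, 0.0], index, k=2)  # → ['transformers', 'rnn-architectures']
```

The selected pages' full text is then assembled as context for the streamed answer; this linear scan is also why the article caps the flat-file index at a few hundred pages.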
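The structural part of Lint is set arithmetic over three inputs: the schema's page universe, the pages on disk, and the cross-references inside them. A hedged sketch (data shapes assumed, not the article's actual code):

```python
def lint(schema_slugs: set[str], pages: dict[str, list[str]]):
    """Structural checks in the spirit of the Lint operation.

    `schema_slugs` is the page universe from the schema JSON;
    `pages` maps each existing page's slug to the [[slug]] targets in its body."""
    missing = schema_slugs - pages.keys()        # defined in schema, never written
    orphaned = pages.keys() - schema_slugs       # on disk, absent from schema
    broken = {src: sorted(t for t in targets if t not in schema_slugs)
              for src, targets in pages.items()}
    broken = {src: targets for src, targets in broken.items() if targets}
    return missing, orphaned, broken

schema = {"transformers", "rnn-architectures", "tokenization"}
pages = {"transformers": ["rnn-architectures", "state-space-models"],
         "scratch-notes": []}
missing, orphaned, broken = lint(schema, pages)
# missing: pages the schema promises but nothing has been ingested into yet
# orphaned: files that drifted outside the governed page universe
# broken: cross-references pointing at slugs the schema does not define
```

Stale-embedding detection and the --deep contradiction analysis would sit on top of this, the latter requiring an LLM pass rather than set logic.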

Query Templates (6 Categories)

  • Synthesis — integrated understanding from all pages
  • Gap-finding — identify missing topics
  • Debate — surface tensions and disagreements between sources
  • Output — produce artifacts (study guides, cheat sheets, slide decks)
  • Health — audit the wiki itself
  • Personal application — connect wiki knowledge to specific situations
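One prompt per category might look like the following. The wording here is invented for illustration; the article's actual templates are not reproduced.

```python
# Hypothetical prompt stubs, one per query-template category.
# The phrasing is assumed, not the article's real templates.
QUERY_TEMPLATES = {
    "synthesis": "Drawing on every relevant page, explain how {topic} fits together.",
    "gap-finding": "Which topics in the schema have thin or missing coverage?",
    "debate": "Where do the sources disagree about {topic}? Cite the pages involved.",
    "output": "Produce a one-page cheat sheet covering {topic}.",
    "health": "Audit the wiki itself: stale pages, weak cross-referencing, drift.",
    "personal": "Given my situation ({context}), which pages apply, and how?",
}

prompt = QUERY_TEMPLATES["debate"].format(topic="attention vs. recurrence")
```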

Implementation Architecture

Python package with clear module boundaries: embeddings.py, index.py, wiki.py, prompts.py, ingest.py, query.py, lint.py, cli.py. Uses OpenAI's text-embedding-3-small for embeddings and the Claude API with prompt caching (~90% cost reduction on repeated calls).
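The prompt-caching savings come from marking the large, stable wiki context as cacheable so only the short question varies between calls. A sketch of the request shape for Anthropic's Messages API, built as a plain dict with no network call; the model id is a placeholder, not something the article specifies:

```python
def build_query_request(wiki_context: str, question: str) -> dict:
    """Shape of a Messages API request that caches the wiki context.

    `cache_control` on the system block marks the stable wiki pages for
    reuse across calls, which is where the repeated-call savings come from."""
    return {
        "model": "claude-sonnet-placeholder",  # placeholder model id (assumption)
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": wiki_context,              # large, stable -> cached
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": question}],  # small, varying
    }

request = build_query_request("## transformers\n...", "How does attention scale?")
```

Only the system block carries `cache_control`; the per-query user message stays outside the cache so each question is processed fresh.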

Advantages

  • Knowledge compounds over time; cross-referential density increases with scale
  • Queries get faster and cheaper as wiki matures (synthesis at ingest time, not query time)
  • Human-readable, version-controllable via git
  • Explicit provenance tracking on every page
  • Source type agnostic (files, YouTube, web pages)
  • Cost-efficient at scale with prompt caching and routing

Disadvantages

  • Ingest is slow/expensive for large corpora (~100-200 Claude API calls for 50 papers × 10 pages)
  • Synthesis quality is LLM-dependent; errors persist until lint catches them
  • Schema design is non-trivial work; poor schema leads to over/under-routing
  • Flat-file embedding index is effective only up to ~500 pages; beyond that a vector store such as FAISS or ChromaDB is needed
  • Stale page risk if routing misses relevance
  • No real-time knowledge support
  • --save loop requires manual judgment on what's worth preserving

Implications for AI Engineers

  • LLM Wiki is a memory architecture for agents — persistent, structured, self-maintaining
  • Maps naturally onto MCP (Model Context Protocol) as tool calls (wiki_search, wiki_ingest, wiki_lint)
  • Recasts RAG from question-answering to knowledge management
  • Practical solution to context window limits and LLM statelessness
  • Explicitly human-in-the-loop design
  • Scales from personal to team/enterprise
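The MCP mapping mentioned above can be sketched as tool declarations in MCP's format (a name, a description, and a JSON-Schema inputSchema). The tool names come from the article; the parameter schemas are assumptions for illustration:

```python
# Hedged sketch: the wiki operations exposed as MCP tool declarations.
# Names (wiki_search, wiki_ingest, wiki_lint) follow the article;
# the inputSchema contents are illustrative assumptions.
WIKI_TOOLS = [
    {
        "name": "wiki_search",
        "description": "Semantic search over compiled wiki pages",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "integer", "default": 3},
            },
            "required": ["query"],
        },
    },
    {
        "name": "wiki_ingest",
        "description": "Route and synthesize a new source into the wiki",
        "inputSchema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
    },
    {
        "name": "wiki_lint",
        "description": "Check the wiki's structural integrity",
        "inputSchema": {"type": "object", "properties": {}},
    },
]
```

An agent holding these three tools can read from, write to, and audit the wiki without the wiki's internals ever entering its context window, which is the memory-architecture point the article is making.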