Beyond RAG: How Andrej Karpathy's LLM Wiki Pattern Builds Knowledge That Actually Compounds
Summary
An in-depth technical analysis by Plaban Nayak explaining the LLM Wiki pattern's architecture, implementation details, advantages, disadvantages, and implications for AI engineers. Includes a full Python reference implementation with a CLI pipeline.
Key Insights
The Compilation Analogy
The core insight is framed as software compilation: RAG executes source code on every request (re-reads, re-chunks, re-synthesizes), while LLM Wiki compiles knowledge once into an optimized artifact (wiki pages) that benefits every subsequent query.
Three-Layer Architecture
- Raw Sources (Immutable) — a `sources/` directory, never modified by the LLM; serves as audit trail and ground truth
- The Wiki (LLM-Maintained) — `*.md` files with YAML frontmatter, `[[slug]]` cross-references, `index.md`, `log.md`, and `.meta/embeddings.json`
- The Schema (Governance) — a JSON file defining the "page universe" (slugs, titles, descriptions); the only human-managed component
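The schema layer can be pictured as a small JSON file. A minimal sketch, with hypothetical field names and example pages (the article only specifies that slugs, titles, and descriptions define the page universe):

```python
# Hypothetical shape of the governance schema: the one human-managed file.
# Field names ("pages", "slug", "title", "description") are assumptions.
import json

schema = {
    "pages": [
        {"slug": "transformers", "title": "Transformers",
         "description": "Architecture, attention, positional encoding."},
        {"slug": "rag", "title": "Retrieval-Augmented Generation",
         "description": "Retrieval pipelines that feed model context."},
    ]
}

# Init bootstraps the wiki directory structure from this template;
# ingest routes new sources only to slugs that exist here.
print(json.dumps(schema, indent=2))
```

Because the LLM can only write to pages the schema names, a sloppy schema directly causes the over/under-routing problem noted under Disadvantages.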
Four Core Operations
- Init — Bootstrap directory structure from schema template
- Ingest — 5-step pipeline: Resolve Source → Route → Synthesize → Embed → Update Index/Log
- Query — RAG over compiled wiki pages: Embed question → Cosine similarity → Assemble context → Stream answer
- Lint — Structural checks: orphaned pages, missing pages, broken cross-references, stale embeddings; `--deep` for contradiction analysis, `--fix` to regenerate embeddings
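The Query operation is the most mechanical of the four. A minimal sketch of its retrieval step, with the embedding API stubbed out and toy 3-dimensional vectors standing in for real embeddings (file layout and helper names are assumptions; the real logic lives in `query.py` and `embeddings.py`):

```python
# Sketch of Query: embed question -> cosine similarity -> rank pages.
# Toy vectors replace API-generated embeddings so the logic stands alone.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stand-in for .meta/embeddings.json: page slug -> embedding vector.
embeddings = {
    "attention": [0.9, 0.1, 0.0],
    "rag": [0.1, 0.9, 0.1],
    "scaling-laws": [0.2, 0.2, 0.9],
}

def top_k(question_vec, k=2):
    """Rank wiki pages by similarity to the question embedding."""
    scored = sorted(embeddings.items(),
                    key=lambda kv: cosine(question_vec, kv[1]),
                    reverse=True)
    return [slug for slug, _ in scored[:k]]

# In the real pipeline the question vector comes from the embedding API;
# this toy vector points toward the "rag" page.
print(top_k([0.0, 1.0, 0.0]))  # ['rag', 'scaling-laws']
```

The retrieved pages are then assembled into context and streamed through the LLM; because synthesis already happened at ingest time, this step is plain RAG over a small, dense corpus.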
Query Templates (6 Categories)
- Synthesis — integrated understanding from all pages
- Gap-finding — identify missing topics
- Debate — surface tensions and disagreements between sources
- Output — produce artifacts (study guides, cheat sheets, slide decks)
- Health — audit the wiki itself
- Personal application — connect wiki knowledge to specific situations
Implementation Architecture
Python package with clear module boundaries: embeddings.py, index.py, wiki.py, prompts.py, ingest.py, query.py, lint.py, cli.py. Uses OpenAI text-embedding-3-small and Claude API with prompt caching (~90% cost reduction on repeated calls).
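The ~90% saving comes from Anthropic's prompt caching: the large, stable block (the wiki corpus) is marked with a `cache_control` breakpoint so repeated ingest and query calls reuse it. A sketch of the request payload only (model name and prompt text are assumptions; `client.messages.create(**payload)` would send it):

```python
# Payload construction for Anthropic prompt caching; no API call is made.
# The stable wiki corpus goes in the system block with cache_control so
# cache hits skip re-processing it on every ingest/query call.
wiki_pages_text = "...concatenated wiki pages..."  # stable across calls

payload = {
    "model": "claude-sonnet-4-20250514",  # hypothetical model choice
    "max_tokens": 1024,
    "system": [
        {"type": "text",
         "text": "You maintain a personal wiki of compiled knowledge."},
        {"type": "text",
         "text": wiki_pages_text,
         # Everything up to this breakpoint is cached between calls.
         "cache_control": {"type": "ephemeral"}},
    ],
    "messages": [
        {"role": "user",
         "content": "Route this new source to the relevant pages."},
    ],
}
print(payload["system"][1]["cache_control"])
```

Only the short, per-call user turn is billed at full input rates once the system blocks are cached, which is what makes the many-call ingest pipeline affordable.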
Advantages
- Knowledge compounds over time; cross-referential density increases with scale
- Queries get faster and cheaper as wiki matures (synthesis at ingest time, not query time)
- Human-readable, version-controllable via git
- Explicit provenance tracking on every page
- Source type agnostic (files, YouTube, web pages)
- Cost-efficient at scale with prompt caching and routing
Disadvantages
- Ingest is slow/expensive for large corpora (~100-200 Claude API calls for 50 papers × 10 pages)
- Synthesis quality is LLM-dependent; errors persist until lint catches them
- Schema design is non-trivial work; poor schema leads to over/under-routing
- Flat-file embedding index is effective only up to ~500 pages; beyond that, a vector store such as FAISS or ChromaDB is needed
- Stale page risk if routing misses relevance
- No real-time knowledge support
- `--save` loop requires manual judgment on what's worth preserving
Implications for AI Engineers
- LLM Wiki is a memory architecture for agents — persistent, structured, self-maintaining
- Maps naturally onto MCP (Model Context Protocol) as tool calls (`wiki_search`, `wiki_ingest`, `wiki_lint`)
- Recasts RAG from question-answering to knowledge management
- Practical solution to context window limits and LLM statelessness
- Explicitly human-in-the-loop design
- Scales from personal to team/enterprise
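The MCP mapping can be sketched as plain tool definitions in the shape of an MCP `tools/list` response. The tool names come from the article; the input schemas are assumptions:

```python
# Sketch of the three wiki operations as MCP-style tool definitions.
# Shapes follow MCP's tools/list response; inputSchema fields are assumed.
WIKI_TOOLS = [
    {
        "name": "wiki_search",
        "description": "Embed a question and return the top-matching "
                       "wiki pages as context.",
        "inputSchema": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
    {
        "name": "wiki_ingest",
        "description": "Route a new source into the wiki and update the "
                       "relevant pages, index, and log.",
        "inputSchema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
    },
    {
        "name": "wiki_lint",
        "description": "Check for orphaned pages, broken cross-references, "
                       "and stale embeddings.",
        "inputSchema": {"type": "object", "properties": {}},
    },
]

print([t["name"] for t in WIKI_TOOLS])
```

Exposed this way, any MCP-capable agent gains a persistent, linted memory without knowing anything about the wiki's on-disk layout.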