Beyond RAG: How Andrej Karpathy's LLM Wiki Pattern Builds Knowledge That Actually Compounds

Summary

An in-depth technical analysis by Plaban Nayak of the LLM Wiki pattern: its architecture, implementation details, advantages, disadvantages, and implications for AI engineers. Includes a full Python reference implementation with a CLI pipeline.

Key Insights

The Compilation Analogy

The core insight is framed as software compilation: RAG executes source code on every request (re-reads, re-chunks, re-synthesizes), while LLM Wiki compiles knowledge once into an optimized artifact (wiki pages) that benefits every subsequent query.

Three-Layer Architecture

  1. Raw Sources (Immutable) — sources/ directory, never modified by the LLM; serves as audit trail and ground truth
  2. The Wiki (LLM-Maintained) — *.md files with YAML frontmatter, [[slug]] cross-references, index.md, log.md, and .meta/embeddings.json
  3. The Schema (Governance) — JSON file defining the "page universe" (slugs, titles, descriptions), the only human-managed component
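To make the wiki layer concrete, here is a minimal sketch of parsing one such page: YAML-style frontmatter split from the body, with [[slug]] cross-references extracted. The page text and frontmatter fields are illustrative assumptions, not the article's exact schema.

```python
import re

# Illustrative wiki page; the frontmatter fields (title, sources, updated)
# are assumed for the example, not taken from the article.
PAGE = """\
---
title: Attention Mechanisms
sources: [vaswani-2017.pdf]
updated: 2025-01-15
---
Scaled dot-product attention underpins [[transformers]] and
contrasts with the recurrence in [[rnn-architectures]].
"""

def parse_page(text: str) -> tuple[dict, str, list[str]]:
    """Split frontmatter from the body and collect [[slug]] cross-references."""
    _, frontmatter, body = text.split("---\n", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    links = re.findall(r"\[\[([^\]]+)\]\]", body)
    return meta, body, links

meta, body, links = parse_page(PAGE)
# meta["title"] is "Attention Mechanisms";
# links are the slugs this page points at in the wiki graph.
```

The cross-reference list is what a lint pass would later validate against the schema's page universe.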

Four Core Operations

  • Init — Bootstrap directory structure from schema template
  • Ingest — 5-step pipeline: Resolve Source → Route → Synthesize → Embed → Update Index/Log
  • Query — RAG over compiled wiki pages: Embed question → Cosine similarity → Assemble context → Stream answer
  • Lint — Structural checks: orphaned pages, missing pages, broken cross-references, stale embeddings; --deep for contradiction analysis, --fix to regenerate embeddings
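The Query operation above reduces to a similarity scan over the compiled pages. A minimal sketch, with toy 3-d vectors standing in for real text-embedding-3-small embeddings and an in-memory dict standing in for .meta/embeddings.json:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_pages(question_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank wiki pages by similarity to the question embedding.

    `index` maps slug -> embedding vector, mirroring what
    .meta/embeddings.json might hold."""
    ranked = sorted(index, key=lambda slug: cosine(question_vec, index[slug]), reverse=True)
    return ranked[:k]

# Toy vectors; a real pipeline would embed both pages and question via the API.
index = {"transformers": [0.9, 0.1, 0.0],
         "rnn-architectures": [0.1, 0.9, 0.0],
         "tokenization": [0.0, 0.2, 0.9]}
top_pages([1.0, 0.0, 0.0], index, k=2)  # → ['transformers', 'rnn-architectures']
```

The selected pages' full text is then assembled as context for the streamed answer; this linear scan is also why the article caps the flat-file index at a few hundred pages.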
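The structural part of Lint is set arithmetic over three inputs: the schema's page universe, the pages on disk, and the cross-references inside them. A hedged sketch (data shapes assumed, not the article's actual code):

```python
def lint(schema_slugs: set[str], pages: dict[str, list[str]]):
    """Structural checks in the spirit of the Lint operation.

    `schema_slugs` is the page universe from the schema JSON;
    `pages` maps each existing page's slug to the [[slug]] targets in its body."""
    missing = schema_slugs - pages.keys()        # defined in schema, never written
    orphaned = pages.keys() - schema_slugs       # on disk, absent from schema
    broken = {src: sorted(t for t in targets if t not in schema_slugs)
              for src, targets in pages.items()}
    broken = {src: targets for src, targets in broken.items() if targets}
    return missing, orphaned, broken

schema = {"transformers", "rnn-architectures", "tokenization"}
pages = {"transformers": ["rnn-architectures", "state-space-models"],
         "scratch-notes": []}
missing, orphaned, broken = lint(schema, pages)
# missing: pages the schema promises but nothing has been ingested into yet
# orphaned: files that drifted outside the governed page universe
# broken: cross-references pointing at slugs the schema does not define
```

Stale-embedding detection and the --deep contradiction analysis would sit on top of this, the latter requiring an LLM pass rather than set logic.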

Query Templates (6 Categories)

  • Synthesis — integrated understanding from all pages
  • Gap-finding — identify missing topics
  • Debate — surface tensions and disagreements between sources
  • Output — produce artifacts (study guides, cheat sheets, slide decks)
  • Health — audit the wiki itself
  • Personal application — connect wiki knowledge to specific situations
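One prompt per category might look like the following. The wording here is invented for illustration; the article's actual templates are not reproduced.

```python
# Hypothetical prompt stubs, one per query-template category.
# The phrasing is assumed, not the article's real templates.
QUERY_TEMPLATES = {
    "synthesis": "Drawing on every relevant page, explain how {topic} fits together.",
    "gap-finding": "Which topics in the schema have thin or missing coverage?",
    "debate": "Where do the sources disagree about {topic}? Cite the pages involved.",
    "output": "Produce a one-page cheat sheet covering {topic}.",
    "health": "Audit the wiki itself: stale pages, weak cross-referencing, drift.",
    "personal": "Given my situation ({context}), which pages apply, and how?",
}

prompt = QUERY_TEMPLATES["debate"].format(topic="attention vs. recurrence")
```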

Implementation Architecture

Python package with clear module boundaries: embeddings.py, index.py, wiki.py, prompts.py, ingest.py, query.py, lint.py, cli.py. Uses OpenAI's text-embedding-3-small for embeddings and the Claude API with prompt caching (~90% cost reduction on repeated calls).
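The prompt-caching savings come from marking the large, stable wiki context as cacheable so only the short question varies between calls. A sketch of the request shape for Anthropic's Messages API, built as a plain dict with no network call; the model id is a placeholder, not something the article specifies:

```python
def build_query_request(wiki_context: str, question: str) -> dict:
    """Shape of a Messages API request that caches the wiki context.

    `cache_control` on the system block marks the stable wiki pages for
    reuse across calls, which is where the repeated-call savings come from."""
    return {
        "model": "claude-sonnet-placeholder",  # placeholder model id (assumption)
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": wiki_context,              # large, stable -> cached
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": question}],  # small, varying
    }

request = build_query_request("## transformers\n...", "How does attention scale?")
```

Only the system block carries `cache_control`; the per-query user message stays outside the cache so each question is processed fresh.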

Advantages

  • Knowledge compounds over time; cross-referential density increases with scale
  • Queries get faster and cheaper as wiki matures (synthesis at ingest time, not query time)
  • Human-readable, version-controllable via git
  • Explicit provenance tracking on every page
  • Source type agnostic (files, YouTube, web pages)
  • Cost-efficient at scale with prompt caching and routing

Disadvantages

  • Ingest is slow/expensive for large corpora (~100-200 Claude API calls for 50 papers × 10 pages)
  • Synthesis quality is LLM-dependent; errors persist until lint catches them
  • Schema design is non-trivial work; poor schema leads to over/under-routing
  • Flat-file embedding index is effective only up to ~500 pages; beyond that a vector store such as FAISS or ChromaDB is needed
  • Stale page risk if routing misses relevance
  • No real-time knowledge support
  • --save loop requires manual judgment on what's worth preserving

Implications for AI Engineers

  • LLM Wiki is a memory architecture for agents — persistent, structured, self-maintaining
  • Maps naturally onto MCP (Model Context Protocol) as tool calls (wiki_search, wiki_ingest, wiki_lint)
  • Recasts RAG from question-answering to knowledge management
  • Practical solution to context window limits and LLM statelessness
  • Explicitly human-in-the-loop design
  • Scales from personal to team/enterprise
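The MCP mapping mentioned above can be sketched as tool declarations in MCP's format (a name, a description, and a JSON-Schema inputSchema). The tool names come from the article; the parameter schemas are assumptions for illustration:

```python
# Hedged sketch: the wiki operations exposed as MCP tool declarations.
# Names (wiki_search, wiki_ingest, wiki_lint) follow the article;
# the inputSchema contents are illustrative assumptions.
WIKI_TOOLS = [
    {
        "name": "wiki_search",
        "description": "Semantic search over compiled wiki pages",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "integer", "default": 3},
            },
            "required": ["query"],
        },
    },
    {
        "name": "wiki_ingest",
        "description": "Route and synthesize a new source into the wiki",
        "inputSchema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
    },
    {
        "name": "wiki_lint",
        "description": "Check the wiki's structural integrity",
        "inputSchema": {"type": "object", "properties": {}},
    },
]
```

An agent holding these three tools can read from, write to, and audit the wiki without the wiki's internals ever entering its context window, which is the memory-architecture point the article is making.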