# LLM Wiki Pattern
## Summary
A pattern for personal knowledge bases in which an LLM incrementally builds and maintains a persistent, interlinked wiki of markdown files from raw source documents, rather than retrieving from the raw documents at query time as RAG does.
## Core Idea
Instead of stateless retrieval (RAG), the LLM Wiki pattern uses stateful, compounding knowledge. When a new source is added, the LLM doesn't just index it — it reads it, extracts key information, and integrates it into the existing wiki by:
- Updating existing pages with new information
- Creating new entity pages for first-time concepts
- Adding `[[wiki-links]]` connecting concepts
- Flagging contradictions between new and existing knowledge
The compiled artifact is a set of human-readable, LLM-maintained markdown files — one per concept, with cross-references, provenance tracking, and version history via git.
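A single compiled page might look like the following sketch; the exact frontmatter fields (`slug`, `sources`, `updated`) and page names are illustrative, not prescribed by the pattern:

```markdown
---
title: Retrieval-Augmented Generation
slug: retrieval-augmented-generation
sources: [2026-04-10-rag-survey.md]
updated: 2026-04-12
---

# Retrieval-Augmented Generation

A stateless retrieval technique that fetches raw document chunks at
query time. Contrast with the [[llm-wiki-pattern]], which compiles
knowledge at ingest time. See also [[vector-embeddings]].
```

The `sources` list gives provenance back to the immutable raw documents, and git history supplies versioning.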
## Key Difference from RAG
| Aspect | RAG | LLM Wiki |
|---|---|---|
| Knowledge persistence | None — stateless | Full — builds over time |
| Synthesis timing | Per query, from scratch | Pre-compiled at ingest time |
| Contradiction detection | No | Yes — flagged during compilation |
| Multi-document answers | Retrieved chunks pieced together | Pre-synthesized encyclopedia entries |
The analogy is software compilation: RAG executes source code on every request; LLM Wiki compiles knowledge once into an optimized artifact that benefits every subsequent query.
## Origins
Proposed by Andrej Karpathy in a GitHub Gist in April 2026. The post went viral in the developer community within days. Karpathy's own wiki reached ~100 articles and ~500,000 words while remaining navigable by the LLM using index and summaries. He noted: "I thought I had to reach for fancy RAG, but the LLM has been pretty good about automaintaining index files."
## Architecture
Three cleanly separated layers:
- Raw Sources (Immutable) — Human-curated source documents; LLM reads but never modifies; serves as audit trail
- The Wiki (LLM-Maintained) — Interlinked markdown pages with YAML frontmatter; `index.md` catalog; `log.md` activity log
- The Schema (Governance) — Human-defined page universe (slugs, titles, descriptions); the contract between human intent and LLM execution
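A repository implementing these three layers might be laid out as follows; every name except `index.md` and `log.md` is illustrative:

```
wiki-repo/
├── sources/          # raw, immutable source documents (audit trail)
│   └── 2026-04-10-rag-survey.md
├── wiki/             # LLM-maintained pages
│   ├── index.md      # catalog of all pages
│   ├── log.md        # activity log of ingest operations
│   └── retrieval-augmented-generation.md
└── schema.md         # human-defined page universe
```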
## Core Operations
- Ingest Pipeline — Read source, route to relevant pages, synthesize updates, embed, update index/log
- Query Pipeline — RAG over compiled wiki pages instead of raw chunks
- Lint Operation — Health checks: orphans, broken links, contradictions, stale content
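The lint operation is the most mechanical of the three and can be sketched without any LLM calls. A minimal pass over a flat wiki directory, checking for broken `[[wiki-links]]` and orphan pages, might look like this (page layout and link syntax are assumptions; this is an illustration, not a reference implementation):

```python
import re
from pathlib import Path

# Capture the slug inside [[slug]]-style links (stop at ], |, or #).
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def lint(wiki_dir: str) -> dict:
    """Report broken [[wiki-links]] and orphan pages (linked from nowhere)."""
    pages = {p.stem: p.read_text(encoding="utf-8")
             for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    broken = []
    for slug, text in pages.items():
        for target in WIKI_LINK.findall(text):
            target = target.strip()
            linked.add(target)
            if target not in pages:
                broken.append((slug, target))
    # index.md and log.md are infrastructure, not content pages.
    orphans = [s for s in pages
               if s not in linked and s not in ("index", "log")]
    return {"broken_links": broken, "orphans": orphans}
```

Contradiction and staleness checks, by contrast, need an LLM pass over the page text, so they are better run as a separate, slower stage.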
## Extensions and Adaptations
### Internal Data (Codebase Memory)
Cole Medin adapted the pattern from external data (articles, papers) to internal data — giving Claude Code a memory that evolves with a codebase. Instead of ingesting web content, the system captures conversation logs via Claude Code Hooks (session start, pre-compact, session end) and extracts structured knowledge articles from them.
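A hook command in this setup is just a small script that records the session into the wiki's activity log. The sketch below assumes the hook receives a JSON event payload on stdin; the `session_id` and `transcript_path` field names and the `wiki/log.md` path are assumptions for illustration, not a documented interface:

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

def append_session_entry(payload: dict, log_path: str = "wiki/log.md") -> str:
    """Append a one-line record of a finished session to the activity log."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (f"- {stamp} session `{payload.get('session_id', '?')}` ended; "
             f"transcript: {payload.get('transcript_path', '?')}\n")
    log = Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry

# Wired up as a hook command, the script would read the event payload
# from stdin, e.g.: append_session_entry(json.load(sys.stdin))
```

A later ingest pass can then read the logged transcripts and distill them into structured knowledge articles.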
### Hot Cache
Nate Herk and Cole Medin introduced a `hot.md` file — a ~500-character cache of the most recent conversation context. Useful for agents that need quick context without crawling full wiki pages (e.g., executive assistants).
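Maintaining such a cache amounts to a truncating overwrite on every update. A sketch, keeping the end of the context since the newest turns matter most (the 500-character budget is the figure mentioned above; the `hot.md` path is assumed):

```python
from pathlib import Path

HOT_BUDGET = 500  # character budget for the hot cache

def update_hot_cache(latest_context: str, path: str = "hot.md") -> str:
    """Overwrite hot.md with the tail of the most recent context."""
    hot = latest_context[-HOT_BUDGET:]
    Path(path).write_text(hot, encoding="utf-8")
    return hot
```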
### Flat vs. Structured Wiki
Karpathy noted: "Sometimes I like to keep it really simple and really flat" — no subfolders, no over-organizing. Some implementations (like Cole's YouTube wiki) use subfolders (analysis, concepts, entities, sources) which makes more sense for certain use cases.
## Token Efficiency
One user reported turning 383 scattered files and 100+ meeting transcripts into a compact wiki, dropping token usage by 95% when querying with Claude. The wiki eliminates the need to re-read raw documents on every query.
## Advantages
- Knowledge compounds: each source enriches the same pages that serve every subsequent query
- Queries get faster/cheaper as wiki matures (synthesis happens at ingest, not query time)
- Human-readable, git-versionable, no opaque databases to debug
- Cross-referential density grows with scale — knowledge graph behavior in plain text
- Scales from personal to team/enterprise use
- Reported ~95% token reduction compared to re-reading raw documents on every query
## Disadvantages
- Ingest is slow/expensive for large corpora (~10 min for a single long article; ~14 min for 36 video transcripts)
- Synthesis quality is LLM-dependent; errors persist until caught by lint
- Schema design is non-trivial
- Flat-file indexing effective only up to ~hundreds of pages (millions need traditional RAG)
- No real-time knowledge support