# LLM Wiki Pattern
## Summary
A pattern for personal knowledge bases in which an LLM incrementally builds and maintains a persistent, interlinked wiki of markdown files from raw source documents, rather than retrieving from the raw documents at query time as RAG does.
## Core Idea
Instead of stateless retrieval (RAG), the LLM Wiki pattern uses stateful, compounding knowledge. When a new source is added, the LLM doesn't just index it — it reads it, extracts key information, and integrates it into the existing wiki by:
- Updating existing pages with new information
- Creating new entity pages for first-time concepts
- Adding `[[wiki-links]]` connecting concepts
- Flagging contradictions between new and existing knowledge
The compiled artifact is a set of human-readable, LLM-maintained markdown files — one per concept, with cross-references, provenance tracking, and version history via git.
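A single compiled page might look like the following sketch; the exact frontmatter fields (`slug`, `sources`, `updated`) and page names are illustrative, not prescribed by the pattern:

```markdown
---
title: Retrieval-Augmented Generation
slug: retrieval-augmented-generation
sources: [2026-04-10-rag-survey.md]
updated: 2026-04-12
---

# Retrieval-Augmented Generation

A stateless retrieval technique that fetches raw document chunks at
query time. Contrast with the [[llm-wiki-pattern]], which compiles
knowledge at ingest time. See also [[vector-embeddings]].
```

The `sources` list gives provenance back to the immutable raw documents, and git history supplies versioning.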
## Key Difference from RAG
| Aspect | RAG | LLM Wiki |
|---|---|---|
| Knowledge persistence | None — stateless | Full — builds over time |
| Synthesis timing | Per query, from scratch | Pre-compiled at ingest time |
| Contradiction detection | No | Yes — flagged during compilation |
| Multi-document answers | Retrieved chunks pieced together | Pre-synthesized encyclopedia entries |
The analogy is software compilation: RAG executes source code on every request; LLM Wiki compiles knowledge once into an optimized artifact that benefits every subsequent query.
## Origins
Proposed by Andrej Karpathy in a GitHub Gist in April 2026. The post went viral in the developer community within days. Karpathy's own wiki reached ~100 articles and ~500,000 words while remaining navigable by the LLM using index and summaries. He noted: "I thought I had to reach for fancy RAG, but the LLM has been pretty good about automaintaining index files."
## Architecture
Three cleanly separated layers:
- Raw Sources (Immutable) — Human-curated source documents; LLM reads but never modifies; serves as audit trail
- The Wiki (LLM-Maintained) — Interlinked markdown pages with YAML frontmatter; `index.md` catalog; `log.md` activity log
- The Schema (Governance) — Human-defined page universe (slugs, titles, descriptions); the contract between human intent and LLM execution
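A repository implementing these three layers might be laid out as follows; every name except `index.md` and `log.md` is illustrative:

```
wiki-repo/
├── sources/          # raw, immutable source documents (audit trail)
│   └── 2026-04-10-rag-survey.md
├── wiki/             # LLM-maintained pages
│   ├── index.md      # catalog of all pages
│   ├── log.md        # activity log of ingest operations
│   └── retrieval-augmented-generation.md
└── schema.md         # human-defined page universe
```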
## Core Operations
- Ingest Pipeline — Read source, route to relevant pages, synthesize updates, embed, update index/log
- Query Pipeline — RAG over compiled wiki pages instead of raw chunks
- Lint Operation — Health checks: orphans, broken links, contradictions, stale content
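The lint operation is the most mechanical of the three and can be sketched without any LLM calls. A minimal pass over a flat wiki directory, checking for broken `[[wiki-links]]` and orphan pages, might look like this (page layout and link syntax are assumptions; this is an illustration, not a reference implementation):

```python
import re
from pathlib import Path

# Capture the slug inside [[slug]]-style links (stop at ], |, or #).
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def lint(wiki_dir: str) -> dict:
    """Report broken [[wiki-links]] and orphan pages (linked from nowhere)."""
    pages = {p.stem: p.read_text(encoding="utf-8")
             for p in Path(wiki_dir).glob("*.md")}
    linked = set()
    broken = []
    for slug, text in pages.items():
        for target in WIKI_LINK.findall(text):
            target = target.strip()
            linked.add(target)
            if target not in pages:
                broken.append((slug, target))
    # index.md and log.md are infrastructure, not content pages.
    orphans = [s for s in pages
               if s not in linked and s not in ("index", "log")]
    return {"broken_links": broken, "orphans": orphans}
```

Contradiction and staleness checks, by contrast, need an LLM pass over the page text, so they are better run as a separate, slower stage.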
## Extensions and Adaptations
### Internal Data (Codebase Memory)
Cole Medin adapted the pattern from external data (articles, papers) to internal data — giving Claude Code a memory that evolves with a codebase. Instead of ingesting web content, the system captures conversation logs via Claude Code Hooks (session start, pre-compact, session end) and extracts structured knowledge articles from them.
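A hook command in this setup is just a small script that records the session into the wiki's activity log. The sketch below assumes the hook receives a JSON event payload on stdin; the `session_id` and `transcript_path` field names and the `wiki/log.md` path are assumptions for illustration, not a documented interface:

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

def append_session_entry(payload: dict, log_path: str = "wiki/log.md") -> str:
    """Append a one-line record of a finished session to the activity log."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = (f"- {stamp} session `{payload.get('session_id', '?')}` ended; "
             f"transcript: {payload.get('transcript_path', '?')}\n")
    log = Path(log_path)
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry

# Wired up as a hook command, the script would read the event payload
# from stdin, e.g.: append_session_entry(json.load(sys.stdin))
```

A later ingest pass can then read the logged transcripts and distill them into structured knowledge articles.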
### Hot Cache
Nate Herk and Cole Medin introduced a `hot.md` file — a ~500-character cache of the most recent conversation context. Useful for agents that need quick context without crawling full wiki pages (e.g., executive assistants).
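Maintaining such a cache amounts to a truncating overwrite on every update. A sketch, keeping the end of the context since the newest turns matter most (the 500-character budget is the figure mentioned above; the `hot.md` path is assumed):

```python
from pathlib import Path

HOT_BUDGET = 500  # character budget for the hot cache

def update_hot_cache(latest_context: str, path: str = "hot.md") -> str:
    """Overwrite hot.md with the tail of the most recent context."""
    hot = latest_context[-HOT_BUDGET:]
    Path(path).write_text(hot, encoding="utf-8")
    return hot
```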
### Flat vs. Structured Wiki
Karpathy noted: "Sometimes I like to keep it really simple and really flat" — no subfolders, no over-organizing. Some implementations (like Cole's YouTube wiki) use subfolders (analysis, concepts, entities, sources) which makes more sense for certain use cases.
## Token Efficiency
One user reported turning 383 scattered files and 100+ meeting transcripts into a compact wiki, dropping token usage by 95% when querying with Claude. The wiki eliminates the need to re-read raw documents on every query.
## Advantages
- Knowledge compounds: each source enriches the same pages that serve every subsequent query
- Queries get faster/cheaper as wiki matures (synthesis happens at ingest, not query time)
- Human-readable, git-versionable, no opaque databases to debug
- Cross-referential density grows with scale — knowledge graph behavior in plain text
- Scales from personal to team/enterprise use
- Reported ~95% token reduction compared to re-reading raw documents on every query
## Disadvantages
- Ingest is slow/expensive for large corpora (~10 min for a single long article; ~14 min for 36 video transcripts)
- Synthesis quality is LLM-dependent; errors persist until caught by lint
- Schema design is non-trivial
- Flat-file indexing effective only up to ~hundreds of pages (millions need traditional RAG)
- No real-time knowledge support