# LLM Wiki by Andrej Karpathy: Build a Compounding Knowledge Base (Tutorial)

## Summary
A step-by-step tutorial from Data Science Dojo on building an LLM Wiki following Karpathy's pattern, with five foundational AI research papers as starting material.
## Key Takeaways
- An LLM wiki is a structured, AI-maintained knowledge base that grows smarter with every source added, unlike RAG, which rediscovers knowledge from scratch on every query
- Karpathy introduced the pattern in a GitHub Gist in April 2026, which went viral among developers
- The tutorial uses five papers: Attention Is All You Need (2017), BERT (2018), GPT-3 (2020), Foundation Models (2021), and RLHF (2022)
- Core workflow: drop sources into raw/, run the compilation prompt, and let the LLM create entity pages connected by wiki-links, flagging contradictions as it goes
- At 100+ pages, the wiki can answer questions whose answer doesn't exist in any single source; the answer lives in the relationships between pages
- Recommends Obsidian for graph-view visualization and Obsidian Web Clipper for article ingestion
- Karpathy's own wiki reached ~100 articles and 400,000 words while remaining navigable by the LLM
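The core workflow above can be sketched in code. This is a minimal, illustrative sketch, not Karpathy's actual tooling: the directory layout (raw/ for sources, a set of already-compiled names) and the function names `pending_sources` and `build_compilation_prompt` are assumptions for illustration, and the actual LLM call is left out.

```python
from pathlib import Path


def pending_sources(raw_dir: Path, compiled: set[str]) -> list[Path]:
    """Sources dropped into raw/ that have not been compiled into the wiki yet."""
    return [p for p in sorted(raw_dir.glob("*.md")) if p.stem not in compiled]


def build_compilation_prompt(source_text: str, existing_pages: list[str]) -> str:
    """Assemble a compilation prompt: the LLM should split the source into
    one-concept entity pages, [[wiki-link]] them to existing pages, and flag
    contradictions instead of silently overwriting prior claims."""
    page_list = "\n".join(f"- {name}" for name in sorted(existing_pages))
    return (
        "You maintain a wiki of entity pages, one concept per page.\n"
        "Existing pages:\n" + page_list + "\n\n"
        "Compile the source below into new or updated pages. Connect related "
        "concepts with [[wiki-links]]. If a claim contradicts an existing "
        "page, flag it with a CONTRADICTION note rather than merging it.\n\n"
        "SOURCE:\n" + source_text
    )
```

In a real loop you would send the prompt to your LLM of choice, write the returned pages into the wiki folder, and record the source as compiled so the next run only picks up new material.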
## LLM Wiki vs RAG Comparison
| Aspect | RAG | LLM Wiki |
|---|---|---|
| Knowledge persistence | None — stateless | Full — builds over time |
| Multi-document synthesis | Per query, from scratch | Pre-compiled into pages |
| Contradiction detection | No | Yes — flagged during compilation |
| Source traceability | High | Moderate (page-level) |
| Best for | Quick Q&A on documents | Deep, growing research topics |
## Common Mistakes Warned Against
- Putting too much in one page (each entity page should cover exactly one concept)
- Skipping the lint step entirely (errors propagate fast)
- Adding too many unrelated topics at once (wiki compounds best when sources are topically related)
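The lint step warned about above can be as simple as checking for broken wiki-links. The sketch below is an assumption about what such a linter might look like, not the tutorial's actual script: `lint_wiki`, the `pages/` folder of one .md file per entity, and the `[[Target]]` / `[[Target|alias]]` link syntax (Obsidian-style) are all illustrative choices.

```python
import re
from pathlib import Path

# Capture the target of [[Target]], [[Target|alias]], or [[Target#section]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def lint_wiki(pages_dir: Path) -> dict[str, list[str]]:
    """Report broken wiki-links: links whose target page does not exist.

    Returns a mapping of page name -> list of missing link targets.
    """
    pages = {p.stem: p.read_text(encoding="utf-8") for p in pages_dir.glob("*.md")}
    broken: dict[str, list[str]] = {}
    for name, text in pages.items():
        missing = [t.strip() for t in WIKI_LINK.findall(text) if t.strip() not in pages]
        if missing:
            broken[name] = missing
    return broken
```

Run after every compilation pass; a non-empty report means the LLM linked to a page it never created, which is exactly the kind of error that compounds if left unchecked.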