# RAG vs LLM Wiki

## Summary
A comparison between Retrieval-Augmented Generation (RAG) and the LLM Wiki pattern, highlighting the fundamental architectural difference: stateless retrieval versus stateful, compounding knowledge.
## Core Difference
RAG retrieves from raw documents at query time. A retriever finds relevant chunks, the LLM synthesizes an answer from them, and the result is discarded. Nothing is learned from one query to the next; every query starts from zero.
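The query-time flow can be sketched in a few lines. This is a minimal sketch: the toy bag-of-words `embed` and the in-memory `index` are stand-ins for a real embedding model and vector database.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words stand-in for a real embedding model (an assumption;
    # production RAG would use a learned embedding and a vector DB).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "RAG retrieves chunks from raw documents at query time",
    "An LLM wiki compiles sources into interlinked markdown pages",
    "Vector databases store embeddings for similarity search",
]
index = [(c, embed(c)) for c in chunks]  # built once; queries never change it

def retrieve(query, k=2):
    # Every query starts from zero: rank all chunks by similarity, take top-k.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

print(retrieve("how does rag retrieve documents")[0])
```

Note that `index` is read-only: the retrieved chunks feed the answer, but nothing flows back into the store.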
LLM Wiki pre-compiles knowledge into structured, interlinked pages at ingest time. The LLM reads sources once, synthesizes them into wiki pages, and queries run against this compiled artifact. Knowledge accumulates and cross-references grow denser over time.
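By contrast, the ingest-time flow might look like this. A toy sketch under stated assumptions: `synthesize` is a placeholder for the actual LLM call, and the `[[wiki link]]` cross-reference syntax and slug scheme are illustrative choices, not a prescribed format.

```python
import pathlib
import re

WIKI = pathlib.Path("wiki")

def synthesize(page_title, source_text, existing):
    # Placeholder for an LLM call that merges new source material
    # into an existing page instead of starting from scratch.
    header = existing if existing else f"# {page_title}\n"
    return header + f"- {source_text}\n"

def slug(title):
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def ingest(source_text, page_title, links=()):
    # The expensive synthesis step happens once, at ingest, not per query.
    WIKI.mkdir(exist_ok=True)
    page = WIKI / f"{slug(page_title)}.md"
    existing = page.read_text() if page.exists() else ""
    body = synthesize(page_title, source_text, existing)
    for other in links:
        body += f"\nSee also: [[{other}]]\n"  # cross-references accumulate
    page.write_text(body)
    return page

p = ingest("RAG is stateless at query time", "Retrieval-Augmented Generation",
           links=["LLM Wiki"])
print(p.read_text())
```

Each call to `ingest` rereads the page it targets, so later sources land on top of earlier synthesis rather than beside it.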
## Comparison Table
| Dimension | RAG (Semantic Search) | LLM Wiki |
|---|---|---|
| Discovery | Similarity search over vectors | Reads indexes, follows links |
| Understanding | Chunk similarity | Deep relationships via links |
| Knowledge persistence | None — stateless | Full — builds over time |
| Synthesis timing | Per query, from scratch | Pre-compiled at ingest |
| Multi-document answers | Retrieved chunks pieced together at query time | Pre-synthesized encyclopedia entries |
| Contradiction detection | No | Yes — flagged during compilation |
| Source traceability | High (chunk-level) | Moderate (page-level) |
| Infrastructure | Embedding model, vector DB, chunking pipeline | Just markdown files |
| Cost | Ongoing compute and storage | Basically free (tokens only) |
| Maintenance | Re-embed when things change | Lint, clean up, add articles |
| Setup complexity | Low | Low–Medium |
| Query speed | Consistent (retrieval cost each time) | Improves over time (pre-organized material) |
| Ingest cost | Low (chunk and embed) | High (routing + synthesis per page) |
| Long-term quality | Stays the same | Improves with each source |
| Scale limit | Millions of documents | Hundreds of pages (with good indexes) |
| Best for | Quick Q&A on documents, rapidly changing data, enterprise scale | Deep, growing research topics over weeks/months, personal scale |
One user reported turning 383 scattered files and 100+ meeting transcripts into a wiki, dropping token usage by 95% when querying with Claude.
## When to Use Each
Use RAG when:

- Data changes daily or frequently
- Exact source traceability matters for every claim
- You need quick answers without schema design
- Bulk document ingestion is the priority
Use LLM Wiki when:

- Building expertise on a topic over weeks or months
- You want the model to reason across your knowledge base
- You value synthesis and connection-making over retrieval
- You want knowledge to compound, not evaporate between sessions
## The Tradeoff
RAG sidesteps maintenance overhead by doing all synthesis at query time. It's cheaper to build but never gets smarter. The same query on day one and day one thousand produces the same quality answer.
LLM Wiki inverts this: ingest is expensive, schema design takes thought, maintenance requires periodic linting. But a well-maintained wiki becomes qualitatively different over time — dense with cross-references, drawing on synthesized knowledge from dozens of sources.
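That periodic linting can be as simple as a script over the markdown files. A minimal sketch, assuming the hypothetical `[[wiki link]]` syntax and slug scheme used for illustration here; it flags links pointing at pages that don't exist yet.

```python
import pathlib
import re

def lint_wiki(wiki_dir="wiki"):
    # One possible lint pass: find [[wiki links]] whose target page
    # hasn't been written yet, so dangling references can be filled in.
    root = pathlib.Path(wiki_dir)
    pages = {p.stem for p in root.glob("*.md")}
    dangling = []
    for page in root.glob("*.md"):
        for target in re.findall(r"\[\[([^\]]+)\]\]", page.read_text()):
            slug = re.sub(r"[^a-z0-9]+", "-", target.lower()).strip("-")
            if slug not in pages:
                dangling.append((page.name, target))
    return dangling

# Demo: a page linking to a not-yet-written article shows up as dangling.
pathlib.Path("wiki").mkdir(exist_ok=True)
(pathlib.Path("wiki") / "rag.md").write_text("See also: [[LLM Wiki]]\n")
print(lint_wiki())  # → [('rag.md', 'LLM Wiki')]
```

Dangling links are a feature as much as a bug: each one is a prompt for the next article, which is how the wiki grows denser over time.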