RAG vs LLM Wiki

Summary

A comparison between Retrieval-Augmented Generation (RAG) and the LLM Wiki pattern, highlighting the fundamental architectural difference: stateless retrieval versus stateful, compounding knowledge.

Core Difference

RAG retrieves from raw documents at query time. A retriever finds relevant chunks, the LLM synthesizes an answer, and the result is discarded. Nothing carries over from one query to the next; every query starts from zero.
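The stateless query loop can be sketched in a few lines. This is a toy illustration, not a production RAG stack: the `embed` function below is a bag-of-words stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would use a
    # trained embedding model and a vector DB (assumptions here).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Stateless: every call re-ranks from scratch; nothing carries over.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "RAG retrieves chunks at query time.",
    "Wikis compile knowledge at ingest time.",
    "Embedding models map text to vectors.",
]
print(retrieve("what happens at query time in RAG", chunks, k=1))
```

Note that `retrieve` holds no state between calls, which is the architectural point: the same query always does the same work.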

LLM Wiki pre-compiles knowledge into structured, interlinked pages at ingest time. The LLM reads sources once, synthesizes them into wiki pages, and queries run against this compiled artifact. Knowledge accumulates and cross-references grow denser over time.
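The compile-at-ingest idea can be sketched as a function that folds each new source into a persistent page store and cross-links mentions of existing pages. The regex-based linker below is a simplistic, hypothetical stand-in for LLM-driven routing and synthesis.

```python
import re

def ingest(source_title, text, wiki):
    # wiki: dict of page title -> markdown body; it persists across calls,
    # so knowledge accumulates instead of evaporating per query.
    page = wiki.get(source_title, f"# {source_title}\n")
    page += f"\n{text}\n"
    # Cross-link any mention of an existing page title (a toy stand-in
    # for LLM synthesis; assumed linker, not a real library API).
    for title in wiki:
        if title != source_title:
            page = re.sub(rf"\b{re.escape(title)}\b", f"[[{title}]]", page)
    wiki[source_title] = page
    return wiki

wiki = {}
ingest("Embeddings", "Embeddings map text to vectors.", wiki)
ingest("RAG", "RAG relies on Embeddings for retrieval.", wiki)
print(wiki["RAG"])
```

Each `ingest` call makes later pages denser: the second page picks up a `[[Embeddings]]` link only because the first page already exists.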

Comparison Table

| Dimension | RAG (Semantic Search) | LLM Wiki |
|---|---|---|
| Discovery | Similarity search over vectors | Reads indexes, follows links |
| Understanding | Chunk similarity | Deep relationships via links |
| Knowledge persistence | None — stateless | Full — builds over time |
| Synthesis timing | Per query, from scratch | Pre-compiled at ingest |
| Multi-document answers | Retrieved chunks pieced together at query time | Pre-synthesized encyclopedia entries |
| Contradiction detection | No | Yes — flagged during compilation |
| Source traceability | High (chunk-level) | Moderate (page-level) |
| Infrastructure | Embedding model, vector DB, chunking pipeline | Just markdown files |
| Cost | Ongoing compute and storage | Basically free (tokens only) |
| Maintenance | Re-embed when things change | Lint, clean up, add articles |
| Setup complexity | Low | Low–Medium |
| Query speed | Consistent (retrieval cost each time) | Improves over time (pre-organized material) |
| Ingest cost | Low (chunk and embed) | High (routing + synthesis per page) |
| Long-term quality | Stays the same | Improves with each source |
| Scale limit | Millions of documents | Hundreds of pages (with good indexes) |
| Best for | Quick Q&A on documents, rapidly changing data, enterprise scale | Deep, growing research topics over weeks/months, personal scale |

One user reported turning 383 scattered files and 100+ meeting transcripts into a wiki, dropping token usage by 95% when querying with Claude.

When to Use Each

Use RAG when:

- Data changes daily or frequently
- Exact source traceability matters for every claim
- You need quick answers without schema design
- Bulk document ingestion is the priority

Use LLM Wiki when:

- Building expertise on a topic over weeks or months
- You want the model to reason across your knowledge base
- You value synthesis and connection-making over retrieval
- You want knowledge to compound, not evaporate between sessions

The Tradeoff

RAG sidesteps maintenance overhead by doing all synthesis at query time. It is cheaper to build but never gets smarter: the same query on day one and day one thousand produces the same quality of answer.

LLM Wiki inverts this: ingest is expensive, schema design takes thought, maintenance requires periodic linting. But a well-maintained wiki becomes qualitatively different over time — dense with cross-references, drawing on synthesized knowledge from dozens of sources.
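The "periodic linting" step mentioned above can be as simple as a script over the page store. This sketch checks one hypothetical lint rule, broken `[[wiki links]]`; the rule and the `lint` helper are assumptions for illustration.

```python
import re

def lint(wiki):
    # Maintenance pass: report [[links]] pointing at pages that
    # don't exist yet (one example lint rule, not an exhaustive set).
    broken = []
    for title, body in wiki.items():
        for target in re.findall(r"\[\[([^\]]+)\]\]", body):
            if target not in wiki:
                broken.append((title, target))
    return broken

wiki = {
    "RAG": "Retrieval at query time. See [[Embeddings]] and [[Chunking]].",
    "Embeddings": "Text to vectors.",
}
print(lint(wiki))  # the [[Chunking]] target has no page yet
```

Because the wiki is just markdown files, checks like this stay cheap relative to re-embedding a corpus.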
