Query Pipeline

Summary

The query operation runs RAG over compiled wiki pages rather than raw document chunks. Retrieval hands the LLM pre-synthesized, cross-referenced encyclopedia entries instead of arbitrary fragments, producing higher-quality answers.

Four-Step Pipeline

  1. Embed the question — using the same embedding model that built the index
  2. Cosine similarity search — over the embedding index to find the top-k most relevant pages
  3. Assemble context — load the full bodies of the retrieved pages
  4. Stream the answer — using the LLM with the assembled wiki context as its knowledge source
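The four steps above can be sketched in miniature. This is pure Python with a toy two-dimensional "embedding"; the names (`wiki_index`, `query`) and data are illustrative, not part of any real API:

```python
import math

# Toy in-memory index: slug -> (embedding, page body). In the real pattern the
# embeddings live in a flat file and the bodies are compiled wiki pages.
wiki_index = {
    "vector-search": ([0.9, 0.1], "Vector search compares embeddings by angle..."),
    "tokenization":  ([0.1, 0.9], "Tokenization splits text into units..."),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def query(question_embedding, k=1):
    # Steps 2-3: rank pages by cosine similarity, then load full bodies as context.
    ranked = sorted(wiki_index.items(),
                    key=lambda kv: cosine(question_embedding, kv[1][0]),
                    reverse=True)
    context = [(slug, body) for slug, (vec, body) in ranked[:k]]
    # Step 4 would stream an LLM completion with `context` as the knowledge source.
    return context

top = query([0.8, 0.2], k=1)
```

Step 1 (embedding the question) is stubbed out as a hand-written vector here; the essential point is that the question and the pages must share one embedding space.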

Why It's Better Than Traditional RAG

The LLM isn't reading a random chunk from page 14 of a PDF. It's reading a pre-synthesized, cross-referenced encyclopedia entry that already integrates everything the system has learned about that concept, from every source it has ingested.

The --save Flag (Compounding Loop)

When the flag is enabled, a synthesized answer that represents new, valuable knowledge is automatically filed back as a new wiki page, with the slug derived from the question itself. Future sessions benefit immediately.

This completes the compounding principle: you asked a question, the system answered it, and now the wiki knows the answer too.

Caveat: not every answer is worth preserving. The --save decision currently requires human judgment; automating it would require quality-filtering logic that the pattern does not yet address.
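One plausible shape for the save path, in stdlib Python. The pattern does not prescribe a slug scheme, so `slug_from_question` and `save_answer` are illustrative assumptions:

```python
import re

def slug_from_question(question: str) -> str:
    # Lowercase, keep alphanumeric runs, join with hyphens, cap the length --
    # one plausible filename-safe slug derivation, not a prescribed one.
    words = re.findall(r"[a-z0-9]+", question.lower())
    return "-".join(words)[:60].rstrip("-")

def save_answer(question: str, answer: str, wiki_dir: str = "wiki") -> str:
    # With --save enabled, file the synthesized answer back as a new page
    # so future sessions can retrieve it like any other wiki entry.
    slug = slug_from_question(question)
    path = f"{wiki_dir}/{slug}.md"
    # open(path, "w").write(answer)  # omitted so this sketch has no side effects
    return path

saved_path = save_answer("How does the --save flag work?", "...")
```

A real implementation would also handle slug collisions (the question may already have a page) before writing.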

Query Templates

Beyond simple questions, the pattern supports named query templates that extract specific types of insight:

| Category | Purpose | Example |
| --- | --- | --- |
| Synthesis | Integrated understanding | "Give me the single most important insight" |
| Gap-finding | Identify weaknesses | "What important topics are missing?" |
| Debate | Surface tensions | "What is the biggest disagreement between my sources?" |
| Output | Produce artifacts | Study guides, cheat sheets, slide decks |
| Health | Audit the wiki | Duplication checks, consistency audits |
| Personal application | Connect to situations | "What mistakes am I currently making?" |

These templates reframe the wiki from a Q&A system into a thinking partner — you're commissioning analysis across your entire compiled knowledge base.
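Named templates can be as simple as a registry of canned questions that feed the normal pipeline. The registry below is a hypothetical sketch; names and wording merely echo the categories in the table above:

```python
# Hypothetical registry of named query templates (illustrative, not a real API).
QUERY_TEMPLATES = {
    "synthesis": "Across all wiki pages, give me the single most important insight.",
    "gaps":      "What important topics are missing from this wiki?",
    "debate":    "What is the biggest disagreement between my sources?",
    "health":    "List pages that duplicate or contradict each other.",
}

def run_template(name: str) -> str:
    # Resolve the template to a concrete question; in the real pattern this
    # question would then be embedded and answered via the four-step pipeline.
    return QUERY_TEMPLATES[name]

run_template("gaps")
```

The benefit is repeatability: a "gaps" or "health" query asked the same way every week turns one-off questions into a standing audit.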

Scale Considerations

  • Flat-file embedding index with NumPy cosine similarity is effective up to ~500 pages
  • Beyond that, query latency degrades; an approximate nearest-neighbor index (e.g., FAISS) or a vector database (ChromaDB, Weaviate) becomes necessary
  • Hybrid BM25 + vector retrieval requires additional infrastructure
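A minimal sketch of the flat NumPy approach the first bullet describes: every query is one O(N) scan over the whole embedding matrix, which is exactly why it stops scaling past a few hundred pages. Names, dimensions, and the synthetic data are illustrative:

```python
import numpy as np

def top_k(query_vec, page_matrix, k=3):
    # Normalize both sides so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = page_matrix / np.linalg.norm(page_matrix, axis=1, keepdims=True)
    sims = m @ q                  # one O(N) scan over every page vector
    return np.argsort(-sims)[:k]  # indices of the k most similar pages

rng = np.random.default_rng(0)
pages = rng.normal(size=(500, 32))                  # flat index: ~500 pages, toy 32-dim vectors
question = pages[42] + 0.01 * rng.normal(size=32)   # near-duplicate of page 42
best = top_k(question, pages, k=3)
```

An ANN index replaces that full scan with a sublinear lookup at the cost of extra infrastructure and approximate results.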

See Also