Query Pipeline¶
Summary¶
The query operation runs RAG over compiled wiki pages rather than raw document chunks. Retrieval returns pre-synthesized, cross-referenced encyclopedia entries instead of arbitrary fragments, so the LLM answers from higher-quality context.
Four-Step Pipeline¶
- Embed the question — using the same embedding model as the index
- Cosine similarity search — over the embedding index to find the top-k most relevant pages
- Assemble context — load full bodies of retrieved pages
- Stream the answer — using the LLM with assembled wiki context as the knowledge source
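The four steps above can be sketched in a few lines. This is a minimal illustration, not the pattern's actual implementation: the embedding function, the on-disk index layout, and the final LLM call are hypothetical stand-ins.

```python
import numpy as np

# Minimal sketch of the four-step query pipeline.
# Assumptions (not from the source): the index is a NumPy matrix of page
# embeddings, one row per page, and page bodies live in files keyed by slug.

def cosine_top_k(query_vec, index, k=3):
    """Steps 1-2: rank pages by cosine similarity to the embedded question."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q                      # cosine similarity per page
    top = np.argsort(scores)[::-1][:k]  # indices of the k best pages
    return top, scores[top]

def assemble_context(slugs, top_idx):
    """Step 3: load the full bodies of the retrieved wiki pages."""
    return "\n\n".join(f"# {slugs[i]}\n(page body here)" for i in top_idx)

# Toy index: 4 pages embedded in 5 dimensions.
rng = np.random.default_rng(0)
index = rng.normal(size=(4, 5))
question_vec = index[2] + 0.01 * rng.normal(size=5)  # question near page 2

top, scores = cosine_top_k(question_vec, index, k=2)
slugs = ["alpha", "beta", "gamma", "delta"]
context = assemble_context(slugs, top)
# Step 4 would stream the LLM's answer with `context` as the knowledge source.
```

Normalizing both sides once per query keeps the search a single matrix-vector product, which is what makes the flat-file approach viable at small scale.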
Why It's Better Than Traditional RAG¶
The LLM isn't reading a random chunk from page 14 of a PDF. It's reading a pre-synthesized, cross-referenced encyclopedia entry that already integrates everything the system has learned about that concept across every ingested source.
The --save Flag (Compounding Loop)¶
When the flag is passed, the synthesized answer is filed back into the wiki as a new page, on the premise that it represents new, valuable knowledge. The slug is derived from the question itself. Future sessions benefit immediately.
This completes the compounding principle: you asked a question, the system answered it, and now the wiki knows the answer too.
Caveat: not every answer is worth preserving. The --save decision requires human judgment — automating this requires quality filtering logic the pattern doesn't yet address.
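A hedged sketch of the save step follows. The slugification rule and the `wiki/<slug>.md` path layout are illustrative assumptions; the pattern's actual naming scheme may differ.

```python
import re

# Hypothetical sketch of the --save loop: derive a slug from the question,
# then file the synthesized answer back as a new wiki page.

def slugify(question: str, max_words: int = 8) -> str:
    """Lowercase the question and join its first few words with hyphens."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return "-".join(words[:max_words])

def save_answer(question: str, answer: str, wiki_dir: str = "wiki"):
    """Return the page path and body; the caller would write it to disk.

    In the full pattern the new page would also be embedded and appended
    to the index so the very next query can retrieve it.
    """
    slug = slugify(question)
    page = f"# {question}\n\n{answer}\n"
    return f"{wiki_dir}/{slug}.md", page
```

Note that the human-judgment caveat above applies before this code runs: the sketch assumes the user has already decided the answer is worth keeping.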
Query Templates¶
Beyond simple questions, the pattern supports named query templates that extract specific types of insight:
| Category | Purpose | Example |
|---|---|---|
| Synthesis | Integrated understanding | "Give me the single most important insight" |
| Gap-finding | Identify weaknesses | "What important topics are missing?" |
| Debate | Surface tensions | "What is the biggest disagreement between my sources?" |
| Output | Produce artifacts | Study guides, cheat sheets, slide decks |
| Health | Audit the wiki | Duplication checks, consistency audits |
| Personal application | Connect to situations | "What mistakes am I currently making?" |
These templates reframe the wiki from a Q&A system into a thinking partner — you're commissioning analysis across your entire compiled knowledge base.
Scale Considerations¶
- A flat-file embedding index with NumPy cosine similarity is effective up to roughly 500 pages
- Beyond that, query latency degrades; approximate nearest-neighbor indexing (FAISS, ChromaDB, Weaviate) becomes necessary
- Hybrid BM25 + vector retrieval requires additional infrastructure
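To make the hybrid option concrete, here is an illustrative sketch of BM25 keyword scoring fused with a vector ranking via reciprocal rank fusion (RRF). The parameters (`k1=1.5`, `b=0.75`, RRF constant 60) are common defaults, not values prescribed by the pattern, and `vec_rank` stands in for a ranking the embedding index would produce.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query terms with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    dfs = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - dfs[t] + 0.5) / (dfs[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def rrf(*rankings, k=60):
    """Fuse several rankings: each doc earns 1/(k + rank + 1) per list."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] += 1 / (k + rank + 1)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = [["wiki", "page", "embedding"], ["cosine", "similarity"], ["faiss", "index"]]
kw = bm25_scores(["cosine", "similarity"], docs)
kw_rank = sorted(range(len(docs)), key=lambda i: -kw[i])
vec_rank = [1, 0, 2]  # pretend cosine ranking from the embedding index
fused = rrf(kw_rank, vec_rank)
```

The extra infrastructure cost the bullet mentions comes from maintaining two indexes (an inverted keyword index plus the vector store) and keeping them in sync as pages are rewritten.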