Query Pipeline

Summary

The query operation runs RAG over compiled wiki pages rather than raw document chunks. Retrieval hands the LLM pre-synthesized, cross-referenced encyclopedia entries instead of arbitrary fragments, producing higher-quality answers.

Four-Step Pipeline

  1. Embed the question — using the same embedding model that built the index
  2. Cosine similarity search — over the embedding index to find the top-k most relevant pages
  3. Assemble context — load the full bodies of the retrieved pages
  4. Stream the answer — using the LLM with the assembled wiki context as its knowledge source
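The four steps above can be sketched in miniature. This is pure Python with a toy two-dimensional "embedding"; the names (`wiki_index`, `query`) and data are illustrative, not part of any real API:

```python
import math

# Toy in-memory index: slug -> (embedding, page body). In the real pattern the
# embeddings live in a flat file and the bodies are compiled wiki pages.
wiki_index = {
    "vector-search": ([0.9, 0.1], "Vector search compares embeddings by angle..."),
    "tokenization":  ([0.1, 0.9], "Tokenization splits text into units..."),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def query(question_embedding, k=1):
    # Steps 2-3: rank pages by cosine similarity, then load full bodies as context.
    ranked = sorted(wiki_index.items(),
                    key=lambda kv: cosine(question_embedding, kv[1][0]),
                    reverse=True)
    context = [(slug, body) for slug, (vec, body) in ranked[:k]]
    # Step 4 would stream an LLM completion with `context` as the knowledge source.
    return context

top = query([0.8, 0.2], k=1)
```

Step 1 (embedding the question) is stubbed out as a hand-written vector here; the essential point is that the question and the pages must share one embedding space.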

Why It's Better Than Traditional RAG

The LLM isn't reading a random chunk from page 14 of a PDF. It's reading a pre-synthesized, cross-referenced encyclopedia entry that already integrates everything the system has learned about that concept, from every source it has ingested.

The --save Flag (Compounding Loop)

When the flag is enabled, a synthesized answer that represents new, valuable knowledge is automatically filed back as a new wiki page, with the slug derived from the question itself. Future sessions benefit immediately.

This completes the compounding principle: you asked a question, the system answered it, and now the wiki knows the answer too.

Caveat: not every answer is worth preserving. The --save decision currently requires human judgment; automating it would require quality-filtering logic that the pattern does not yet address.
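One plausible shape for the save path, in stdlib Python. The pattern does not prescribe a slug scheme, so `slug_from_question` and `save_answer` are illustrative assumptions:

```python
import re

def slug_from_question(question: str) -> str:
    # Lowercase, keep alphanumeric runs, join with hyphens, cap the length --
    # one plausible filename-safe slug derivation, not a prescribed one.
    words = re.findall(r"[a-z0-9]+", question.lower())
    return "-".join(words)[:60].rstrip("-")

def save_answer(question: str, answer: str, wiki_dir: str = "wiki") -> str:
    # With --save enabled, file the synthesized answer back as a new page
    # so future sessions can retrieve it like any other wiki entry.
    slug = slug_from_question(question)
    path = f"{wiki_dir}/{slug}.md"
    # open(path, "w").write(answer)  # omitted so this sketch has no side effects
    return path

saved_path = save_answer("How does the --save flag work?", "...")
```

A real implementation would also handle slug collisions (the question may already have a page) before writing.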

Query Templates

Beyond simple questions, the pattern supports named query templates that extract specific types of insight:

| Category | Purpose | Example |
| --- | --- | --- |
| Synthesis | Integrated understanding | "Give me the single most important insight" |
| Gap-finding | Identify weaknesses | "What important topics are missing?" |
| Debate | Surface tensions | "What is the biggest disagreement between my sources?" |
| Output | Produce artifacts | Study guides, cheat sheets, slide decks |
| Health | Audit the wiki | Duplication checks, consistency audits |
| Personal application | Connect to situations | "What mistakes am I currently making?" |

These templates reframe the wiki from a Q&A system into a thinking partner — you're commissioning analysis across your entire compiled knowledge base.
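Named templates can be as simple as a registry of canned questions that feed the normal pipeline. The registry below is a hypothetical sketch; names and wording merely echo the categories in the table above:

```python
# Hypothetical registry of named query templates (illustrative, not a real API).
QUERY_TEMPLATES = {
    "synthesis": "Across all wiki pages, give me the single most important insight.",
    "gaps":      "What important topics are missing from this wiki?",
    "debate":    "What is the biggest disagreement between my sources?",
    "health":    "List pages that duplicate or contradict each other.",
}

def run_template(name: str) -> str:
    # Resolve the template to a concrete question; in the real pattern this
    # question would then be embedded and answered via the four-step pipeline.
    return QUERY_TEMPLATES[name]

run_template("gaps")
```

The benefit is repeatability: a "gaps" or "health" query asked the same way every week turns one-off questions into a standing audit.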

Scale Considerations

  • Flat-file embedding index with NumPy cosine similarity is effective up to ~500 pages
  • Beyond that, query latency degrades; an approximate nearest-neighbor index (e.g., FAISS) or a vector database (ChromaDB, Weaviate) becomes necessary
  • Hybrid BM25 + vector retrieval requires additional infrastructure
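A minimal sketch of the flat NumPy approach the first bullet describes: every query is one O(N) scan over the whole embedding matrix, which is exactly why it stops scaling past a few hundred pages. Names, dimensions, and the synthetic data are illustrative:

```python
import numpy as np

def top_k(query_vec, page_matrix, k=3):
    # Normalize both sides so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = page_matrix / np.linalg.norm(page_matrix, axis=1, keepdims=True)
    sims = m @ q                  # one O(N) scan over every page vector
    return np.argsort(-sims)[:k]  # indices of the k most similar pages

rng = np.random.default_rng(0)
pages = rng.normal(size=(500, 32))                  # flat index: ~500 pages, toy 32-dim vectors
question = pages[42] + 0.01 * rng.normal(size=32)   # near-duplicate of page 42
best = top_k(question, pages, k=3)
```

An ANN index replaces that full scan with a sublinear lookup at the cost of extra infrastructure and approximate results.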

See Also