Schema / PageSpec
Summary
The schema is a JSON file (or encoded in AGENTS.md) that defines the "page universe" of the wiki — which concepts the wiki tracks, their slugs, titles, and one-line descriptions. It is the contract between human intent and LLM execution.
Purpose
The schema serves as the governance layer in the three-layer architecture:
- Humans define what knowledge should exist (via schema)
- The LLM handles how that knowledge is organized and kept current (via wiki pages)
PageSpec Structure
Each entry in the schema defines one tracked concept:
```json
{
  "slug": "attention-mechanism",
  "title": "Attention Mechanism",
  "description": "The scaled dot-product attention operation central to transformer architectures"
}
```
| Field | Purpose |
|---|---|
| `slug` | Unique identifier, maps to filename (`wiki/attention-mechanism.md`) |
| `title` | Human-readable display name |
| `description` | One-line summary used by the routing step to determine relevance |
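
As a rough illustration of the fields above, a PageSpec could be modeled and loaded as follows. This is a minimal sketch, not the project's actual loader: it assumes the schema file is a JSON array of entries, and the names `load_schema` and `page_path` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSpec:
    slug: str         # unique identifier, also the filename stem
    title: str        # human-readable display name
    description: str  # one-line summary used for routing


def load_schema(text: str) -> list[PageSpec]:
    """Parse a JSON array of PageSpec objects (assumed layout) and reject duplicate slugs."""
    specs = [PageSpec(**entry) for entry in json.loads(text)]
    seen = set()
    for spec in specs:
        if spec.slug in seen:
            raise ValueError(f"duplicate slug: {spec.slug}")
        seen.add(spec.slug)
    return specs


def page_path(spec: PageSpec, wiki_dir: str = "wiki") -> str:
    """Map a slug to its wiki file, following the wiki/<slug>.md pattern shown in the table."""
    return f"{wiki_dir}/{spec.slug}.md"


example = """[
  {"slug": "attention-mechanism",
   "title": "Attention Mechanism",
   "description": "The scaled dot-product attention operation central to transformer architectures"}
]"""
schema = load_schema(example)
print(page_path(schema[0]))  # wiki/attention-mechanism.md
```

Rejecting duplicate slugs at load time is a design choice of this sketch; it follows from the slug being the unique identifier and filename for each page.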
How It's Used
During Ingest (Routing)
The LLM reads a compact summary of the schema (one line per page: slug: title — description) alongside the source text, and returns the slugs that are genuinely relevant. The description field is critical — a vague description leads to over-routing (everything seems relevant) or under-routing (nothing does).
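
The compact summary described above might be built along these lines. This is a sketch under assumptions: the function names are hypothetical, the second page in the example is invented for illustration, and filtering the returned slugs against the schema is an extra safeguard not described in the text.

```python
SEPARATOR = " \u2014 "  # em dash, matching the one-line format described above


def schema_summary(pages: list[dict]) -> str:
    """Render one line per page: 'slug: title <dash> description'."""
    return "\n".join(
        f"{p['slug']}: {p['title']}{SEPARATOR}{p['description']}" for p in pages
    )


def keep_known_slugs(returned: list[str], pages: list[dict]) -> list[str]:
    """Drop any slug the model returned that is not defined in the schema (assumed safeguard)."""
    known = {p["slug"] for p in pages}
    return [slug for slug in returned if slug in known]


pages = [
    {
        "slug": "attention-mechanism",
        "title": "Attention Mechanism",
        "description": "The scaled dot-product attention operation central to transformer architectures",
    },
    {
        # Hypothetical second page, used only to show a multi-line summary.
        "slug": "positional-encoding",
        "title": "Positional Encoding",
        "description": "How token order is injected into transformer inputs",
    },
]

summary = schema_summary(pages)
print(summary)

# Suppose the model returned one real slug and one that is not in the schema.
routed = keep_known_slugs(["attention-mechanism", "made-up-slug"], pages)
print(routed)  # ['attention-mechanism']
```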
During Lint
The lint operation compares schema pages against actual wiki files to find:
- Orphaned pages: files not defined in the schema
- Missing pages: schema slugs with no corresponding file
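
The comparison amounts to two set differences. The sketch below assumes pages are stored flat as `<slug>.md` files in one directory; `lint_wiki` is a hypothetical name, and the demo files are created in a temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path


def lint_wiki(schema_slugs: list[str], wiki_dir: str) -> dict[str, list[str]]:
    """Compare schema slugs against the .md files present in wiki_dir."""
    on_disk = {path.stem for path in Path(wiki_dir).glob("*.md")}
    wanted = set(schema_slugs)
    return {
        "orphaned": sorted(on_disk - wanted),  # files not defined in the schema
        "missing": sorted(wanted - on_disk),   # schema slugs with no file
    }


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "attention-mechanism.md").write_text("placeholder")
    (Path(tmp) / "old-notes.md").write_text("placeholder")
    report = lint_wiki(["attention-mechanism", "positional-encoding"], tmp)

print(report)  # {'orphaned': ['old-notes'], 'missing': ['positional-encoding']}
```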
Managing the Schema
- Add a concept → add a PageSpec; next ingest will automatically create and populate the page
- Remove a concept → delete from schema; page becomes an orphan (can be cleaned up by lint)
- Refine a concept → update the description to improve routing accuracy
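
The three operations above could be expressed as small helpers over the list of schema entries. This is an illustrative sketch only: the helper names are hypothetical, the schema is assumed to be a list of dicts, and the added page is an invented example. Creating, populating, or cleaning up the wiki files themselves is left to ingest and lint, as described above.

```python
def add_page(schema: list[dict], slug: str, title: str, description: str) -> list[dict]:
    """Return a new schema with one more PageSpec; slugs must stay unique."""
    if any(page["slug"] == slug for page in schema):
        raise ValueError(f"slug already in schema: {slug}")
    return schema + [{"slug": slug, "title": title, "description": description}]


def remove_page(schema: list[dict], slug: str) -> list[dict]:
    """Return a new schema without the given slug; its wiki file becomes an orphan."""
    return [page for page in schema if page["slug"] != slug]


def refine_description(schema: list[dict], slug: str, description: str) -> list[dict]:
    """Return a new schema with an updated description for one slug."""
    return [
        {**page, "description": description} if page["slug"] == slug else page
        for page in schema
    ]


schema = [
    {
        "slug": "attention-mechanism",
        "title": "Attention Mechanism",
        "description": "The scaled dot-product attention operation central to transformer architectures",
    }
]
# Hypothetical page, added only to demonstrate the helpers.
schema = add_page(schema, "positional-encoding", "Positional Encoding",
                  "How token order is injected into transformer inputs")
schema = refine_description(schema, "positional-encoding",
                            "Sinusoidal and learned schemes for encoding token position")
schema = remove_page(schema, "attention-mechanism")
print([page["slug"] for page in schema])  # ['positional-encoding']
```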
Design Considerations
- Schema design is non-trivial work, especially in domains with complex, overlapping concepts
- Getting the schema right takes iteration
- A poorly designed schema leads to sparse or bloated pages, for example when vague descriptions cause under-routing or over-routing
- Alternative: for simpler setups, the schema can be encoded implicitly in AGENTS.md rather than as a separate JSON file