Schema / PageSpec

Summary

The schema is a JSON file (or, in simpler setups, a section of AGENTS.md) that defines the "page universe" of the wiki: which concepts the wiki tracks, along with each concept's slug, title, and one-line description. It is the contract between human intent and LLM execution.

Purpose

The schema serves as the governance layer in the three-layer architecture:

  • Humans define what knowledge should exist (via schema)
  • The LLM handles how that knowledge is organized and kept current (via wiki pages)

PageSpec Structure

Each entry in the schema defines one tracked concept:

```json
{
  "slug": "attention-mechanism",
  "title": "Attention Mechanism",
  "description": "The scaled dot-product attention operation central to transformer architectures"
}
```

| Field | Purpose |
| --- | --- |
| slug | Unique identifier; maps to the page's filename (e.g. wiki/attention-mechanism.md) |
| title | Human-readable display name |
| description | One-line summary used by the routing step to determine relevance |
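
As a rough sketch, the three fields could be modeled in Python as follows. The `PageSpec` dataclass, the `load_schema` helper, and the assumption that the schema file is a top-level JSON array of entries are all illustrative; the real file layout may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PageSpec:
    slug: str         # unique identifier, also the filename stem
    title: str        # human-readable display name
    description: str  # one-line summary used by the routing step

    def page_path(self, wiki_dir: str = "wiki") -> Path:
        # Assumption: pages live at <wiki_dir>/<slug>.md.
        return Path(wiki_dir) / f"{self.slug}.md"


def load_schema(text: str) -> list[PageSpec]:
    """Parse schema JSON (assumed to be a top-level array) and reject duplicate slugs."""
    specs = [PageSpec(**entry) for entry in json.loads(text)]
    seen = set()
    for spec in specs:
        if spec.slug in seen:
            raise ValueError(f"duplicate slug: {spec.slug}")
        seen.add(spec.slug)
    return specs
```

Rejecting duplicate slugs at load time is a design choice of this sketch: because the slug doubles as the filename, two entries with the same slug would compete for the same page.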

How It's Used

During Ingest (Routing)

The LLM reads a compact summary of the schema (one line per page: slug: title — description) alongside the source text, and returns the slugs that are genuinely relevant. The description field is critical — a vague description leads to over-routing (everything seems relevant) or under-routing (nothing does).
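
The routing step might look roughly like the sketch below. It only builds the compact summary and filters the model's answer; the actual LLM call is left out. The function names, the prompt wording, and the assumption that the model replies with one slug per line are all hypothetical.

```python
def schema_summary(pages: list[dict]) -> str:
    """One line per page, in the 'slug: title — description' form described above."""
    return "\n".join(
        f"{p['slug']}: {p['title']} — {p['description']}" for p in pages
    )


def build_routing_prompt(pages: list[dict], source_text: str) -> str:
    # Hypothetical prompt wording; a real prompt would likely be tuned further.
    return (
        "Wiki pages:\n"
        f"{schema_summary(pages)}\n\n"
        "Source text:\n"
        f"{source_text}\n\n"
        "Return the slugs of the pages this source is genuinely relevant to, "
        "one per line."
    )


def parse_routed_slugs(reply: str, pages: list[dict]) -> list[str]:
    """Keep only slugs defined in the schema, in order, without duplicates."""
    known = {p["slug"] for p in pages}
    routed: list[str] = []
    for line in reply.splitlines():
        # Tolerate list markers the model may add in front of a slug.
        slug = line.strip().lstrip("-*• ").strip()
        if slug in known and slug not in routed:
            routed.append(slug)
    return routed
```

Filtering the reply against the schema means a hallucinated or misspelled slug is dropped rather than turned into a new page.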

During Lint

The lint operation compares schema pages against actual wiki files to find:

  • Orphaned pages: files not defined in the schema
  • Missing pages: schema slugs with no corresponding file
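
A minimal sketch of that comparison, assuming every `.md` file directly inside the wiki directory is a page named `<slug>.md`. The `lint_schema` name and the returned dictionary shape are hypothetical.

```python
from pathlib import Path


def lint_schema(schema_slugs: list[str], wiki_dir: str) -> dict[str, list[str]]:
    """Compare schema slugs with the .md files found in wiki_dir."""
    # Assumption: each page is stored as <wiki_dir>/<slug>.md.
    file_slugs = {path.stem for path in Path(wiki_dir).glob("*.md")}
    wanted = set(schema_slugs)
    return {
        "orphaned": sorted(file_slugs - wanted),  # files not defined in the schema
        "missing": sorted(wanted - file_slugs),   # schema slugs with no file
    }
```

If the wiki directory also holds non-page files such as an index, they would need to be excluded before the comparison, or they will be reported as orphans.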

Managing the Schema

  • Add a concept → add a PageSpec to the schema; the next ingest will automatically create and populate the page
  • Remove a concept → delete its PageSpec from the schema; the page becomes an orphan (which lint can then clean up)
  • Refine a concept → update the description to improve routing accuracy
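
The three operations above can be sketched as pure functions over the list of schema entries. The helper names are hypothetical, and writing the result back to the schema file is left out.

```python
def add_page(pages: list[dict], slug: str, title: str, description: str) -> list[dict]:
    """Return a new schema with one more PageSpec; slugs must stay unique."""
    if any(p["slug"] == slug for p in pages):
        raise ValueError(f"slug already in schema: {slug}")
    return pages + [{"slug": slug, "title": title, "description": description}]


def remove_page(pages: list[dict], slug: str) -> list[dict]:
    """Return a new schema without the given slug; the page file itself is untouched."""
    return [p for p in pages if p["slug"] != slug]


def refine_description(pages: list[dict], slug: str, description: str) -> list[dict]:
    """Return a new schema with an updated description for one slug."""
    if not any(p["slug"] == slug for p in pages):
        raise KeyError(slug)
    return [
        {**p, "description": description} if p["slug"] == slug else p
        for p in pages
    ]
```

Note that `remove_page` edits only the schema: the wiki file stays on disk, which is exactly why it shows up as an orphan afterwards.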

Design Considerations

  • Schema design is non-trivial work, especially in domains with complex, overlapping concepts
  • Getting the schema right takes iteration
  • A poorly designed schema leads to sparse or bloated pages
  • Alternative: for simpler setups, the schema can be encoded implicitly in AGENTS.md rather than as a separate JSON file

See Also