When we shipped v0.1.0, Synthadoc did one thing well: it turned raw sources into a structured wiki that got smarter with every ingest. v0.2.0 made that wiki searchable with hybrid BM25 + vector retrieval. v0.3.0 opened up the source types - YouTube transcripts, web search fan-out, CLI provider integration so your existing Claude Code or Opencode subscription could power the whole thing.
But a pattern kept surfacing in user feedback. Once a wiki crossed a few hundred pages, three problems appeared in a cluster: queries got slower, low-confidence pages polluted search results, and there was no clean way to pipe the wiki's structured knowledge into an agent prompt without getting back a synthesised answer when you wanted the raw evidence.
v0.4.0 addresses all three. This post walks through the design decisions behind each feature, the benchmark numbers, and why we think the third one, context packs, points at something larger than a feature.
## The Scale Problem
BM25 is fast. On a 100-page wiki, a query completes in single-digit milliseconds. The problem is that BM25 is also undiscriminating: it scores every page against every query, regardless of how obviously irrelevant most of those pages are.
That's fine at 100 pages. At 1,000 pages with diverse topics - a personal research wiki covering ML, distributed systems, organisational theory, and management literature - you're running a full-corpus scan for every query. And because query decomposition splits one question into 3–5 sub-questions, you multiply that cost by 3–5 on every call.
We benchmarked the unrouted baseline across corpus sizes. The numbers aren't alarming until they are:
| Corpus size | Full-corpus P95 latency | Routed P95 latency (2 of 10 branches) |
|---|---|---|
| 100 pages | 7 ms | 7 ms |
| 500 pages | 38 ms | 12 ms |
| 1,000 pages | 74 ms | 18 ms |
| 5,000 pages | 112 ms | 21 ms |
| 10,000 pages | 191 ms | 24 ms |
Routed search stays near-flat as the corpus grows, because the per-branch page count doesn't change even as the total wiki does. Full-corpus search grows with the wiki - not catastrophically, but noticeably, and that growth compounds across decomposed sub-queries.
The other problem at scale is more subtle: a query about treatment protocols for hypertension shouldn't touch the pages about distributed consensus algorithms. Not just for performance reasons - also because irrelevant pages can drift into synthesis and dilute the answer.
## Feature 1: Routing Layer

### Why It Matters for a Self-Growing Wiki
The scale problem above gets worse in a specific way that's easy to underestimate: a well-configured Synthadoc wiki doesn't stay at its initial size. With nightly scheduled ingests pulling from web searches, PDFs, YouTube transcripts, and curated source lists, a wiki can double from 200 to 400 pages within a few weeks, then reach 1,000 within a few months without the user doing anything manually. That's exactly the point.
But without routing, query quality degrades silently as that growth happens. Every new page increases the BM25 corpus, and the search engine has no way to know that "What are the treatment protocols for hypertension?" should not touch the 300 pages about software architecture. More pages means more irrelevant candidates competing to drift into synthesis, more false positives, and more latency - and none of it is visible until queries start returning noticeably diluted answers.
Routing is what makes autonomous growth sustainable. It's the mechanism that keeps query scope bounded to what's actually relevant, regardless of how large the total wiki becomes. The branch taxonomy is defined once; IngestAgent maintains it automatically from that point forward - every new page created by ingest is auto-placed into the most relevant branch, so ROUTING.md stays accurate as the wiki grows without manual intervention.
The routing layer introduces a file called `ROUTING.md` at the wiki root. Its format is intentionally simple - the same `##` heading → `[[slug]]` structure already used in `index.md`:

```markdown
## People
- [[alan-turing]]
- [[grace-hopper]]
- [[ada-lovelace]]

## Hardware
- [[von-neumann-architecture]]
- [[eniac-computer]]
- [[transistor-and-microchip]]

## Networks
- [[internet-origins]]
- [[arpanet-history]]
```
The file is user-owned: scaffold creates it once from the current index structure and never rewrites it. The user defines the branch taxonomy; the system maintains it.
### How Routing Works at Query Time
At query time, QueryAgent reads ROUTING.md (cached per session, invalidated on write), passes the branch headings and the user's query to the LLM, and receives back a short JSON array of the 1–2 most relevant branch names. BM25 then runs only over the slugs listed under those branches.
If ROUTING.md is absent, or if no branch scores above threshold, the system falls back to full-corpus search transparently - no error, no degraded output.
IngestAgent also uses routing: when a new page is created, it's slotted into the most relevant branch automatically. ROUTING.md stays consistent without manual maintenance.
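For intuition, here's a minimal sketch of that query-time flow in Python. The names `parse_routing`, `llm.choose_branches`, and `bm25.search` are hypothetical stand-ins for the real QueryAgent internals, not Synthadoc's actual API:

```python
import json
import re

def parse_routing(text: str) -> dict[str, list[str]]:
    """Parse ROUTING.md into {branch heading: [slugs]}."""
    branches, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            branches[current] = []
        elif current and (m := re.search(r"\[\[(.+?)\]\]", line)):
            branches[current].append(m.group(1))
    return branches

def route_query(query: str, routing_text: str, llm, bm25):
    branches = parse_routing(routing_text)
    # The LLM returns a JSON array of the 1-2 most relevant branch names;
    # an empty array models "no branch scores above threshold".
    chosen = json.loads(llm.choose_branches(query, list(branches)))
    slugs = [s for b in chosen if b in branches for s in branches[b]]
    if not slugs:
        return bm25.search(query)  # transparent full-corpus fallback
    return bm25.search(query, restrict_to=slugs)
```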
### Aliases
The other half of the routing feature is alias resolution. Anyone who maintains a personal knowledge base has personal terminology that diverges from canonical names. You might always call a concept by a shorthand, an acronym, or a translation - and BM25 will miss the connection because the strings don't match.
Aliases live in each page's YAML frontmatter:
```yaml
---
title: Alan Turing
aliases: [turing, the turing paper, incomputability guy]
---
```
At query time, before BM25 runs, QueryAgent expands any alias matches in the query to their canonical slug. The user's internal vocabulary resolves to the wiki's vocabulary without re-learning what the LLM decided to call things.
ScaffoldAgent suggests initial aliases when generating a page. Users refine them in Obsidian's Properties panel.
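As a rough sketch of what that expansion might look like - a naive substring implementation, not the actual QueryAgent code:

```python
import re

def expand_aliases(query: str, alias_map: dict[str, str]) -> str:
    """Replace any known alias in the query with its canonical slug."""
    # Match longest aliases first so "the turing paper" wins over shorter
    # overlapping aliases. A real implementation must also avoid
    # re-expanding text inside already-substituted slugs.
    for alias in sorted(alias_map, key=len, reverse=True):
        query = re.sub(re.escape(alias), alias_map[alias],
                       query, flags=re.IGNORECASE)
    return query

alias_map = {"the turing paper": "alan-turing",
             "incomputability guy": "alan-turing"}
print(expand_aliases("what did the incomputability guy prove?", alias_map))
# -> "what did the alan-turing prove?" - odd to read, but BM25 now matches
```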
### Protected Scaffold Zone
One problem that appeared as wikis grew: users would hand-edit index.md to add an introduction, a personal note, a custom link - and the next scaffold run would erase it. Scaffold owned the whole file.
v0.4.0 introduces a marker line:
```markdown
# My Research Wiki

My custom introduction and notes here.
Scaffold never touches anything above this line.

<!-- synthadoc:scaffold -->

## People
- [[alan-turing]] — Theoretical foundations of computation
```

Scaffold only regenerates content below `<!-- synthadoc:scaffold -->`. Everything above is preserved verbatim across every scaffold run. The marker is inserted automatically on the first scaffold run if absent.
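The split logic is simple to picture. A sketch with a hypothetical `rewrite_index` helper - note that how the real scaffold handles a pre-existing, marker-less file is a detail this sketch guesses at:

```python
MARKER = "<!-- synthadoc:scaffold -->"

def rewrite_index(path: str, generated: str) -> None:
    """Regenerate the scaffold zone, preserving everything above MARKER."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if MARKER in text:
        user_zone = text.split(MARKER, 1)[0]  # preserved verbatim
    else:
        # First run on a marker-less file: this sketch assumes the existing
        # content becomes the protected zone (an assumption, not confirmed).
        user_zone = text.rstrip("\n") + "\n\n"
    with open(path, "w", encoding="utf-8") as f:
        f.write(user_zone + MARKER + "\n" + generated)
```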
### Routing CLI
```bash
synthadoc routing init       # generate ROUTING.md from current index (one-time)
synthadoc routing validate   # report dangling slugs - dry run, no changes
synthadoc routing clean      # auto-remove dangling entries
```
`routing validate` is worth running after bulk ingests or manual page deletions:

```text
Dangling slugs in ROUTING.md (3):
  [Hardware]   [[eniac-computer]]
  [People]     [[konrad-zuse]]
  [Networks]   [[arpanet-history]]
```
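The check itself is easy to picture: every `[[slug]]` in ROUTING.md should have a matching page file. A sketch, assuming ROUTING.md sits in the wiki directory alongside the pages (the actual layout may differ):

```python
import re
from pathlib import Path

def dangling_slugs(wiki_dir: str = "wiki") -> list[tuple[str, str]]:
    """Return (branch, slug) pairs with no matching <wiki_dir>/<slug>.md file."""
    wiki = Path(wiki_dir)
    pages = {p.stem for p in wiki.glob("*.md")}
    dangling, branch = [], None
    for line in (wiki / "ROUTING.md").read_text(encoding="utf-8").splitlines():
        if line.startswith("## "):
            branch = line[3:].strip()
            continue
        for slug in re.findall(r"\[\[(.+?)\]\]", line):
            if slug not in pages:
                dangling.append((branch, slug))
    return dangling
```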
### Scheduling for a Self-Growing Wiki
There are two distinct scheduling patterns worth setting up: nightly growth and weekly housekeeping.
Nightly growth — ingest pulls in new sources automatically:
```bash
synthadoc schedule add --op "ingest --batch raw_sources/" --cron "0 2 * * *" -w my-wiki
```
As each ingest job completes, IngestAgent writes the new page to wiki/ and appends its slug to ROUTING.md under the most relevant branch. No manual step needed — the routing index grows alongside the wiki.
Weekly housekeeping - three operations, run in sequence:
```bash
synthadoc schedule add --op "lint run" --cron "0 3 * * 0" -w my-wiki       # Sunday 3 AM
synthadoc schedule add --op "scaffold" --cron "0 4 * * 0" -w my-wiki       # Sunday 4 AM
synthadoc schedule add --op "routing clean" --cron "0 5 * * 0" -w my-wiki  # Sunday 5 AM
```
The order matters. Lint runs first and removes dead wikilinks left behind by deleted pages. Scaffold runs next and regenerates index.md to reflect the current page set - new categories get added, empty ones get removed. Routing clean runs last and prunes any dangling slug entries from ROUTING.md that no longer have a corresponding wiki page. After all three, the index and routing table are consistent with the actual state of the wiki.
One thing to be clear about: routing init is a one-time setup command, not something to schedule. Running it again would overwrite ROUTING.md and erase any branch customisations you've made since the initial setup. routing clean is the recurring maintenance command - it only removes entries for missing pages and never touches branch structure.
If you prefer to declare the schedule in config rather than via the CLI, add a `[schedule]` block to `.synthadoc/config.toml` and register everything in one step:

```toml
[schedule]
jobs = [
  { op = "ingest --batch raw_sources/", cron = "0 2 * * *" },
  { op = "lint run", cron = "0 3 * * 0" },
  { op = "scaffold", cron = "0 4 * * 0" },
  { op = "routing clean", cron = "0 5 * * 0" },
]
```

```bash
synthadoc schedule apply -w my-wiki
```
With nightly ingests and a weekly maintenance trio in place, the wiki grows, stays accurate, and self-corrects - without requiring manual intervention.
## Feature 2: Candidates Staging
The second feature addresses a different scale problem: quality at the write path.
Before v0.4.0, IngestAgent wrote new pages directly to wiki/ with no review step. A high-confidence page about a well-structured source landed right next to a speculative page inferred from a thin web article. Both entered BM25, both appeared in orphan detection and contradiction checks, and both showed up in synthesis - with no signal to distinguish them.
### The Staging Concept
Candidates staging introduces a fork in the write path. Pages go to wiki/candidates/ instead of wiki/ when they don't meet a configurable confidence threshold. They're excluded from BM25, orphan detection, and contradiction checks until explicitly promoted.
The policy is configured in `.synthadoc/config.toml`:

```toml
[ingest]
staging_policy = "threshold"       # "off" | "all" | "threshold"
staging_confidence_min = "high"    # "high" | "medium" | "low"
```
Three policies:
| Policy | Behaviour |
|---|---|
| `"off"` | All new pages go directly to `wiki/` - current behaviour, default |
| `"threshold"` | Pages meeting `staging_confidence_min` auto-promote; lower confidence goes to `wiki/candidates/` |
| `"all"` | Every new page requires explicit promotion, regardless of confidence |
The config is hot-reloaded - a policy change takes effect on the next ingest job with no server restart.
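The decision itself is a small function. A sketch of the fork, with confidence ordered low < medium < high (the names here are illustrative, not the actual IngestAgent internals):

```python
# Rank confidence levels so the threshold comparison is a simple integer check.
RANK = {"low": 0, "medium": 1, "high": 2}

def destination(confidence: str, policy: str, min_conf: str) -> str:
    """Decide where a freshly ingested page lands under the staging policy."""
    if policy == "off":
        return "wiki/"
    if policy == "all":
        return "wiki/candidates/"
    # policy == "threshold": auto-promote iff confidence meets the floor
    return "wiki/" if RANK[confidence] >= RANK[min_conf] else "wiki/candidates/"

assert destination("medium", "threshold", "high") == "wiki/candidates/"
assert destination("high", "threshold", "high") == "wiki/"
```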
### The Staging Workflow

#### Reviewing Candidates
```bash
$ synthadoc candidates list
Candidates (3):
  eniac-computer    confidence: low     ingested: 2026-05-05
  konrad-zuse       confidence: medium  ingested: 2026-05-05
  arpanet-history   confidence: high    ingested: 2026-05-04

$ synthadoc candidates promote arpanet-history
$ synthadoc candidates discard eniac-computer
$ synthadoc candidates promote --all
```
Promotion does four things atomically: moves the file from wiki/candidates/ to wiki/, appends the slug to index.md under the best-matching category, appends it to ROUTING.md under the best-matching branch, and records the promotion in the audit trail. Discard deletes the file and records the reason.
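In rough Python terms, promotion looks like the sketch below. The file paths and audit-log format are assumptions; the real command inserts entries under the best-matching heading (chosen by the LLM) rather than appending at the end, and performs all four steps atomically, which this sketch does not attempt:

```python
import datetime
import json
import shutil
from pathlib import Path

def append_line(path: Path, line: str) -> None:
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")

def promote(slug: str) -> None:
    shutil.move(Path("wiki/candidates") / f"{slug}.md",    # 1. move the page
                Path("wiki") / f"{slug}.md")
    append_line(Path("wiki/index.md"), f"- [[{slug}]]")    # 2. index entry
    append_line(Path("wiki/ROUTING.md"), f"- [[{slug}]]")  # 3. routing entry
    append_line(Path(".synthadoc/audit.log"),              # 4. audit trail
                json.dumps({"event": "promote", "slug": slug,
                            "ts": datetime.datetime.now().isoformat()}))
```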
### Scheduling and Staging
Unlike routing and lint, candidates staging doesn't have a useful scheduled action. The two available commands - `candidates promote --all` and `candidates discard --all` - are too blunt to schedule safely. Scheduling `promote --all` on a timer would auto-approve exactly the pages the quality gate held back. Scheduling `discard --all` would silently delete pages you haven't reviewed yet. Neither command has an `--older-than` or `--confidence` filter that would make scheduled execution sensible.
The right integration is manual: run `candidates list` after a large batch ingest, or once a week if the wiki is growing quickly. It takes seconds. The review step is the intentional human checkpoint in an otherwise automated pipeline: it's where you decide what enters the wiki's searchable knowledge base.
If candidates are accumulating faster than you can review them, the right response is to adjust the policy rather than schedule a cleanup:
```bash
# Widen the auto-promote threshold - fewer pages need review
synthadoc staging policy threshold --min-confidence medium

# Or turn staging off entirely - if you trust all your sources
synthadoc staging policy off
```
The staging policy is the dial; the CLI review commands are for the remainder that needs a human decision.
### Why This Matters
The practical effect of staging_policy = "threshold" on a high-volume wiki is significant. After ingesting 30 web articles from varied sources, typical results look like: 22 high-confidence pages auto-promote and enter the main wiki immediately; 8 lower-confidence pages wait in candidates. Those 8 include speculative pages, ambiguous slug assignments, and pages where the source was thin enough that the LLM wasn't confident what it was describing.
Without staging, those 8 pages would be silently diluting every query that touched their topic. In the staged approach, they're visible and actionable.
## Feature 3: Context Packs
The third feature came from a different direction. Users weren't asking "how do I get a better answer?" They were asking "how do I get the raw evidence so I can do something with it myself?"
QueryAgent synthesises. That's its job. But synthesis isn't always what you want. Sometimes you want the actual page excerpts - cited, bounded, ranked by relevance - to paste into an agent prompt, to attach to a report, to review before a meeting, or to feed into an automated pipeline.
Context packs are the answer.
### How Context Packs Work
The ContextAgent reuses QueryAgent.decompose() and HybridSearch. It doesn't synthesise - it packs. Each page excerpt is included verbatim (up to its per-page limit), with attribution: slug, relevance score, confidence level, tags, source path.
### Output Format
```markdown
# Context Pack: history of early computing pioneers
Generated: 2026-05-09T14:22:01
Token budget: 4000 | Used: 3847 | Omitted: 2 pages (budget exceeded)

---

## [[alan-turing]] — relevance: 0.92

> Alan Turing developed the theoretical foundations of computation with his 1936
> paper "On Computable Numbers." The paper introduced the abstract Turing machine
> and proved the existence of undecidable problems...

Source: `wiki/alan-turing.md` | Confidence: high | Tags: people, mathematics

## [[grace-hopper]] — relevance: 0.87

> Grace Hopper pioneered compiler development and made programming accessible to
> humans. She developed the first compiler (A-0) in 1952 and later led the team
> that created COBOL...

Source: `wiki/grace-hopper.md` | Confidence: high | Tags: people, software

---

## Omitted — token budget exceeded

- [[von-neumann-architecture]] — ~820 tokens
- [[eniac-computer]] — ~650 tokens
```
### CLI Usage
```bash
synthadoc context build "microservices patterns"
synthadoc context build "microservices patterns" --tokens 8000
synthadoc context build "microservices patterns" --output context.md
```
### Token Budget Control
This is where context packs differ from simply querying the wiki. An external agent doesn't want an unbounded blob of text - it has its own context window to manage, its own prompt already taking up tokens, and its own cost constraints. The token_budget parameter gives the caller precise control over how much of the wiki's knowledge gets included.
Synthadoc packs pages greedily by relevance score until the budget is exhausted, then lists everything that didn't fit with estimated token counts. The calling agent knows exactly what it got and what it didn't - and can decide whether to request a larger budget, run a second more focused query, or proceed with what's there. There are no surprises about how much context the call will consume.
This predictability is what makes context packs suitable for production agentic pipelines. An agent orchestrator can reserve a fixed token slice for domain knowledge, call context/build with that exact budget, and get back a response that fits - every time, regardless of how large the underlying wiki has grown.
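The packing algorithm is deliberately simple. A minimal sketch - the token estimator here is a crude characters-divided-by-four heuristic, not Synthadoc's actual counter:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough ~4 chars/token heuristic

def pack(pages: list[dict], budget: int) -> tuple[list[dict], list[dict]]:
    """Greedily include pages by descending relevance until the budget runs out."""
    included, omitted, used = [], [], 0
    for page in sorted(pages, key=lambda p: p["relevance"], reverse=True):
        cost = estimate_tokens(page["excerpt"])
        if used + cost <= budget:
            included.append(page)
            used += cost
        else:
            omitted.append({"slug": page["slug"], "estimated_tokens": cost})
    return included, omitted
```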
### The REST API - Synthadoc as a Knowledge Backend
Context packs expose a POST /context/build endpoint that returns the same data as structured JSON. This is where the pattern becomes interesting.
An agent that needs grounding context before reasoning can call Synthadoc's REST API directly, get back a bounded, cited, ranked set of page excerpts, and inject them into its own prompt - without going through a synthesis step. Synthadoc becomes the knowledge layer, the agent provides the reasoning.
```http
POST /context/build

{ "goal": "microservices patterns for high-throughput event processing", "token_budget": 4000 }
```

A `200 OK` response:

```json
{
  "goal": "...",
  "token_budget": 4000,
  "tokens_used": 3847,
  "pages": [
    {
      "slug": "event-driven-architecture",
      "relevance": 0.94,
      "excerpt": "Event-driven architecture decouples producers from consumers...",
      "source": "wiki/event-driven-architecture.md",
      "confidence": "high",
      "tags": ["architecture", "distributed-systems"]
    }
  ],
  "omitted": [
    { "slug": "kafka-internals", "estimated_tokens": 980 }
  ]
}
```
The existing MCP server exposes this as a native tool call. An agent running in any MCP-compatible host can call context/build before it reasons, get structured evidence back, and proceed with grounding it wouldn't otherwise have.
This is what we mean by the "knowledge backend pattern": Synthadoc manages accumulation, deduplication, contradiction detection, and retrieval. The calling agent manages reasoning and action. The division of labour is clean, the token envelope is caller-controlled, and the knowledge layer is persistent across agent sessions.
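A minimal client sketch of the pattern, assuming the server is reachable on localhost port 8000 (an assumption - check the `synthadoc serve` output for the actual address):

```python
import requests

resp = requests.post(
    "http://localhost:8000/context/build",
    json={"goal": "microservices patterns for high-throughput event processing",
          "token_budget": 4000},
    timeout=30,
)
pack = resp.json()

# Splice the bounded, cited evidence into the agent's own prompt.
evidence = "\n\n".join(
    f"[[{p['slug']}]] (relevance {p['relevance']:.2f}):\n{p['excerpt']}"
    for p in pack["pages"]
)
prompt = (
    "Using only the evidence below, propose an event-processing design.\n\n"
    + evidence
)
```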
## Other v0.4.0 Changes
### Plugin Install CLI
```bash
synthadoc plugin install history-of-computing
```
In earlier versions, installing the Obsidian plugin required locating the plugins directory manually, copying files, and restarting Obsidian. v0.4.0 adds a single CLI command that installs the plugin directly into the active Obsidian vault. That's the entire CLI step - the rest is done in Obsidian.
### AI Research Demo: Contradiction Detection End-to-End
The demos/ai-research/ demo now includes a PDF source (llm-benchmarks-q1-2026.pdf) that explicitly disputes a claim in the existing wiki. Running the demo shows the full contradiction detection and flagging lifecycle: source ingested, existing page status updated to contradicted, audit event recorded. The demo now covers all five IngestAgent decision paths - create, update, skip, flag, and contradiction.
### Decision Cache Prompt-Awareness
A quiet but consequential fix: the LLM decision cache key previously included only the content hash and existing slugs. This meant that changes to purpose.md - the file that scopes what belongs in the wiki - were invisible to the cache. An ingest run after a purpose change would serve stale decisions for any source whose content hadn't changed. v0.4.0 includes the decision prompt itself in the cache key, so any change to purpose.md automatically busts the cache for all affected sources.
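Conceptually, the fix widens the cache key from two ingredients to three. A sketch - the field names and hashing details are illustrative, not the actual implementation:

```python
import hashlib

def decision_cache_key(content: str, existing_slugs: list[str],
                       decision_prompt: str) -> str:
    """Cache key for an LLM ingest decision."""
    h = hashlib.sha256()
    h.update(content.encode())                             # source content hash
    h.update("\0".join(sorted(existing_slugs)).encode())   # wiki state
    h.update(decision_prompt.encode())                     # new in v0.4.0:
    return h.hexdigest()                                   # prompt embeds purpose.md
```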
## v0.1 to v0.4: What Changed
It's worth stepping back to see what the four versions have built:
| Version | Core addition |
|---|---|
| v0.1.0 | Ingest-time synthesis: sources become a structured wiki, not raw chunks |
| v0.2.0 | Hybrid BM25 + vector search; query decomposition; knowledge gap detection; full audit trail |
| v0.3.0 | YouTube and web search ingestion; CLI provider integration (Claude Code & Opencode support); CJK support; contradiction detection improvements |
| v0.4.0 | Routing at scale; candidates staging for quality control; context packs as a knowledge backend API |
The first two versions established the ingest-then-query model and made it trustworthy - every action audited, every contradiction surfaced. The third version expanded what you could ingest and who could afford to run it. The fourth version addresses what happens when the wiki grows to a size where the original flat model starts to show cracks.
The routing benchmarks are honest about where we are: 191 ms for a full-corpus query across 10,000 pages is fine for interactive use, but it compounds across sub-questions and becomes a real cost in high-throughput agentic pipelines. The routed 24 ms figure at the same corpus size is where we want the system to be. The benchmark-gated release process we introduced in v0.4.0 is the mechanism that ensures it stays there as the codebase evolves.
## Try Synthadoc v0.4.0
Synthadoc v0.4.0 is available now on GitHub under the AGPL-3.0 licence. The quickest path:
```bash
git clone https://github.com/axoviq-ai/synthadoc.git
cd synthadoc
pip3 install -e ".[dev]"

synthadoc install history-of-computing --target ~/wikis --demo
synthadoc plugin install history-of-computing
synthadoc use history-of-computing
synthadoc serve
```
Then open Obsidian, open ~/wikis/history-of-computing as a vault, install the Dataview community plugin, and enable the Synthadoc plugin. The demo wiki runs against Gemini Flash 2.0, which is free-tier eligible — no cost to run a full ingest-query-lint cycle.
- GitHub: https://github.com/axoviq-ai/synthadoc
- Quick-start guide: https://github.com/axoviq-ai/synthadoc/blob/main/docs/user-quick-start-guide.md
- Design document: https://github.com/axoviq-ai/synthadoc/blob/main/docs/design.md
- Release notes: https://github.com/axoviq-ai/synthadoc/releases/tag/v0.4.0
Feedback is welcome. The routing taxonomy and context pack output format are both early - if your use case pushes against their current shape, we want to know.