A DAG slices intent into atomic leaves, each with a compare-and-swap guarantee and a postcondition it must assert before it counts.
Anatomy of the harness
Reliability is engineered, part by part
None of this lives inside the model. It’s the system around it, and the sum is an agent you can hand real work to.
eight systems
The harness, taken apart piece by piece
Roles decoupled from models. A conductor plans the DAG, deterministic dispatch drives it, executors run leaves in parallel. Detailed below.
Work compiles to a directed graph of atomic leaves. Compare-and-swap transitions keep state honest under parallelism; concurrency is tuned per model so nothing thrashes.
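As a rough illustration of the idea above, the sketch below groups leaves into topological layers and guards every state change with a compare-and-swap. It is not the harness's own code: `LeafStates`, `topological_layers`, `run_graph`, the state names, and the `max_workers` default are all assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LeafStates:
    """Hypothetical state table with an atomic compare-and-swap transition."""

    def __init__(self, leaf_ids):
        self._lock = threading.Lock()
        self._state = {leaf: "pending" for leaf in leaf_ids}

    def compare_and_swap(self, leaf, expected, new):
        # The write lands only if the state is still what the caller last saw.
        with self._lock:
            if self._state[leaf] != expected:
                return False
            self._state[leaf] = new
            return True

    def get(self, leaf):
        with self._lock:
            return self._state[leaf]


def topological_layers(deps):
    """Group leaves into layers; deps maps each leaf to its set of prerequisites."""
    remaining = {leaf: set(pre) for leaf, pre in deps.items()}
    layers = []
    while remaining:
        ready = sorted(leaf for leaf, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("cycle detected: the graph is not a DAG")
        layers.append(ready)
        for leaf in ready:
            del remaining[leaf]
        for pre in remaining.values():
            pre.difference_update(ready)
    return layers


def run_graph(deps, work, max_workers=4):
    """Run each layer in parallel; a leaf runs only if its claim CAS succeeds."""
    states = LeafStates(deps)

    def attempt(leaf):
        if not states.compare_and_swap(leaf, "pending", "running"):
            return  # another worker already claimed this leaf
        try:
            work(leaf)
        except Exception:
            states.compare_and_swap(leaf, "running", "failed")
            raise
        states.compare_and_swap(leaf, "running", "done")

    for layer in topological_layers(deps):
        # The pool size is the concurrency bound; each layer finishes before the next starts.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            list(pool.map(attempt, layer))
    return states
```

Because a claim only succeeds from the `pending` state, two workers racing for the same leaf cannot both run it; the loser's swap returns `False` and it moves on.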
Raw events are distilled into layered memory by a “dream” pass. Every fact write passes restraint gates and a validator before it is allowed to persist.
One substrate unifies capability, behavioural fragments, and learned facts: searchable, migratable, and able to feed each other. Detailed below.
All Model-Context-Protocol tools converge on a single router. Schemas load on demand instead of bloating every prompt, for large, sustained token savings.
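One way to picture on-demand schema loading is a router that keeps only a one-line summary per tool in the prompt and fetches the full schema the first time a tool is actually used. The sketch below is illustrative only; `ToolRouter` and its methods are hypothetical names, not the harness's gateway API.

```python
class ToolRouter:
    """Hypothetical single router that defers schema loading until first use."""

    def __init__(self):
        self._loaders = {}    # tool name -> zero-argument function returning the schema
        self._summaries = {}  # tool name -> one-line description
        self._cache = {}      # tool name -> schema, filled lazily
        self.loads = 0        # how many schemas were actually loaded

    def register(self, name, summary, load_schema):
        self._loaders[name] = load_schema
        self._summaries[name] = summary

    def catalog(self):
        # Short listing that goes into the prompt instead of every full schema.
        return [f"{name}: {text}" for name, text in sorted(self._summaries.items())]

    def search(self, query):
        # Naive substring match over names and summaries.
        q = query.lower()
        return sorted(
            name for name, text in self._summaries.items()
            if q in name.lower() or q in text.lower()
        )

    def schema(self, name):
        # Load once, then serve from the cache.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
            self.loads += 1
        return self._cache[name]
```

The token saving comes from `catalog()` staying small no matter how large the individual schemas are; only tools that get called pay the cost of their full definition.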
A pre-flight mode that can read and reason but not write: source-grounded planning, a tool-call gate, and a ledger re-examined each turn before any change is made.
A rotating pool of search providers plus content extraction that strips pages to signal. Research and retrieval without hand-feeding URLs.
Invariants are pinned as falsifiable contracts. Implementation that conflicts with a contract loses; a contract proven wrong by the code yields, on the record.
Pain → solution
Six ways agents break, engineered shut
Every weakness of a naive agent loop has a structural answer in the harness — not a prompt politely asking the model to behave, but a mechanism that makes misbehaviour impossible.
Every named field, table, or symbol is checked against a live LSP and a code graph before it is written down. Evidence routing maps structure, then confirms ground truth — identifiers are never guessed.
A grounding gate watches claims that need authority and binds them to a source; a domain lexicon makes statutory or field facts impossible to invent.
Hashline edits verify each patch at the line level; an LSP feeds real-time symbols and types. A misplaced edit is caught and corrected, never committed.
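The text does not spell out the hashline format, so the following is only a guess at the general shape: each edit names a line number plus a short hash of the content it expects to find there, and the edit is refused if the file has moved on. `line_hash`, `apply_edit`, and the 8-character SHA-256 prefix are assumptions for illustration.

```python
import hashlib


def line_hash(text):
    """Short content tag for one line (assumed format: first 8 hex chars of SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


def apply_edit(lines, line_no, expected_hash, new_text):
    """Replace the 1-indexed line only if its current content matches expected_hash."""
    if not 1 <= line_no <= len(lines):
        raise IndexError(f"line {line_no} is outside the file")
    if line_hash(lines[line_no - 1]) != expected_hash:
        # The line changed or the edit targets the wrong place: refuse to write.
        raise ValueError(f"hash mismatch at line {line_no}; edit rejected")
    updated = list(lines)  # leave the caller's copy untouched
    updated[line_no - 1] = new_text
    return updated
```

A rejected edit gives the caller a chance to re-read the file and retarget the patch instead of writing to the wrong line.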
A drift detector watches for repetition, classifies the wander, and pulls the run back to the last good anchor before tokens burn on a loop.
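A minimal sketch of repetition-based drift detection, under the assumption that drift shows up as the same action recurring inside a short sliding window. `DriftDetector`, its window and threshold defaults, and the anchor bookkeeping are hypothetical; the classification step mentioned above is left out.

```python
from collections import Counter, deque


class DriftDetector:
    """Hypothetical detector: flags an action that repeats too often in a recent window."""

    def __init__(self, window=6, threshold=3):
        self.window = deque(maxlen=window)  # most recent actions only
        self.threshold = threshold
        self.anchor = None                  # last step marked as known-good

    def mark_anchor(self, step):
        self.anchor = step

    def observe(self, action):
        """Record an action; return True once it has occurred `threshold` times in the window."""
        self.window.append(action)
        return Counter(self.window)[action] >= self.threshold

    def rewind(self):
        """Forget the recent actions and report the anchor to resume from."""
        self.window.clear()
        return self.anchor
```

A caller would check `observe()` after every step and, on `True`, call `rewind()` and restart the run from the returned anchor.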
Every leaf must assert its postcondition before it counts; high-risk seams face adversarial verifiers whose job is to refute the result. A failed check loops back, never silently past.
Four powers, one task
Roles decoupled from models
Each of the four powers has its own charge. None of them is bound to a single model; that binding is a runtime choice, made fresh per task.
Breaks intent into an atomic DAG of leaves and decides what runs, in what order, and what each leaf must prove. Reasoning-heavy — the design brain. Swappable to whatever plans best: Opus, Codex, GLM.
Opus · Codex · GLM
Pulls ready leaves off the DAG, layers them topologically, and fans them out under bounded concurrency, with compare-and-swap keeping state honest the whole way. Deterministic harness code, no model in the loop. The runtime that refuses to let drift slip past.
deterministic · no model
Runs a single leaf, the atom of work: one slice of code, edits, or tool calls, with its own fault boundary. Cheap and massively parallel: DeepSeek 256-wide, MiMo 8-wide for multimodal, concurrency tuned per model so nothing thrashes.
MiMo · DeepSeek
A cross-model skeptic (any provider you point it at, never bound to one) whose job is to attack the result, not bless it. It fires where green is not enough: high-risk seams, contract boundaries, phase ends. If it finds a real flaw, the conductor silently escalates to a stronger model and re-plans; otherwise the leaf counts. Off by default, one round, capped: a lens, not a ritual.
model-agnostic · cross-model
One executor, three kinds of leaf
The conductor tags every node with a kind; dispatch fans a layer out side by side, and each leaf runs its own way — the cheapest reliable call is the one no model has to make.
A single model call — pure text generation, no tools.
In-process through the parallel primitive; never touches the database.
A tool-bearing sub-agent — multi-turn, can call tools.
The configured agent runner drives a full sub-loop.
A deterministic CLI — no model at all.
The command runner executes the node’s command; exit 0 means done.
Safe by construction: an agent leaf with no runner degrades to inproc — no tools, no writes — and warns; a command leaf with nothing to run fails outright, never silently.
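The three kinds and the degradation rule above could be dispatched roughly as follows. This is a sketch under assumptions, not the executor itself: the leaf is modelled as a plain dict, and `run_leaf`, `model_call`, and `agent_runner` are hypothetical names.

```python
import subprocess
import sys
import warnings


def run_leaf(leaf, model_call, agent_runner=None):
    """Dispatch one leaf by kind: 'inproc', 'agent', or 'command'."""
    kind = leaf["kind"]
    if kind == "agent" and agent_runner is None:
        # No runner configured: fall back to a tool-less model call and say so.
        warnings.warn("agent leaf has no runner; degrading to inproc")
        kind = "inproc"
    if kind == "inproc":
        return model_call(leaf["prompt"])    # single model call, no tools
    if kind == "agent":
        return agent_runner(leaf["prompt"])  # full multi-turn sub-loop
    if kind == "command":
        command = leaf.get("command")
        if not command:
            raise ValueError("command leaf has nothing to run")
        completed = subprocess.run(command, capture_output=True, text=True)
        if completed.returncode != 0:        # only exit code 0 counts as done
            raise RuntimeError(f"command exited with {completed.returncode}")
        return completed.stdout
    raise ValueError(f"unknown leaf kind: {kind}")


# Usage: a command leaf that runs the current Python interpreter, no model involved.
demo = run_leaf(
    {"kind": "command", "command": [sys.executable, "-c", "print('ok')"]},
    model_call=str.upper,
)
```

The degradation path is deliberately the least capable one: a leaf that loses its runner can still produce text, but it cannot call tools or write anything.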
↩ A failed check loops back to the conductor, never silently past it.
Hooks & seams
Built on pi events — injectable, configurable
Reliability you can extend. Three hooks ride pi’s event stream; five more are seams woven through the harness — each a place your own code, config, or domain pack can plug in.
pi-event hooks
A tool-execution gate: a dangerous-command guard plus an allowed-tools whitelist. Nothing runs that you didn’t sanction.
A metacognitive safety net. Detects spinning-in-place → onSpinning; escapes a stuck loop → onRecovered. Tunable threshold and re-injection.
Guards irreversible operations — push --force, rm -rf — before they ever reach the shell.
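To make the gate and the guard concrete, here is a small sketch of a tool allowlist combined with a check for the two irreversible commands named above. The names `ALLOWED_TOOLS`, `is_irreversible`, and `gate` are invented for the example, and the matching is intentionally simple: it does not look inside pipelines, `sudo`, or chained commands.

```python
import shlex

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_TOOLS = {"read", "grep", "bash"}


def is_irreversible(command):
    """Flag `rm` with both recursive and force flags, and `git push` with force."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if tokens[0] == "rm":
        # Collect short flags such as -rf, -fr, or separate -r -f.
        short = "".join(
            t[1:] for t in tokens[1:] if t.startswith("-") and not t.startswith("--")
        )
        recursive = "r" in short or "R" in short or "--recursive" in tokens
        force = "f" in short or "--force" in tokens
        return recursive and force
    if tokens[:2] == ["git", "push"]:
        return any(t in ("--force", "-f") for t in tokens[2:])
    return False


def gate(tool, command=None):
    """Return (allowed, reason) before anything reaches the shell."""
    if tool not in ALLOWED_TOOLS:
        return (False, "tool not on allowlist")
    if command is not None and is_irreversible(command):
        return (False, "irreversible command blocked")
    return (True, "ok")
```

For example, `gate("bash", "rm -rf build")` is refused, while `gate("bash", "git push origin main")` passes.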
wright injectable seams
validateFactWrite + namespaces, reject-by-default. A domain pack overrides what is allowed to persist.
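A reject-by-default validator with per-namespace rules might look like the sketch below. The real signature of validateFactWrite is not given here, so `validate_fact_write`, `make_validator`, `BASE_RULES`, and `with_domain_pack` are stand-ins, and the example rules are invented.

```python
def make_validator(rules):
    """Build a validator; rules maps a namespace to a predicate over the fact value."""

    def validate_fact_write(namespace, key, value):
        check = rules.get(namespace)
        if check is None:
            return False  # unknown namespace: reject by default
        return bool(check(value))

    return validate_fact_write


# Hypothetical base rule: project facts are short, non-empty strings.
BASE_RULES = {
    "project": lambda v: isinstance(v, str) and 0 < len(v) <= 200,
}


def with_domain_pack(base, pack):
    """Return a new rule set in which the domain pack adds or overrides namespaces."""
    merged = dict(base)
    merged.update(pack)
    return merged
```

A domain pack could then allow a `statute` namespace only for facts that carry a source, for example `{"statute": lambda v: isinstance(v, dict) and bool(v.get("source"))}`, without touching the base rules.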
Observes message output; an injectable lexicon + action. A domain injects its statutory wordlist so answers carry sources.
onFailure / onMiss / onCorrection / onSpinning / onRecovered / onGrounded — wire runtime signals into your own sink.
resolveRoleModel + .wright/config.json: conductor, executor, leaf, and dream each bind to a model you can swap.
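The layout of `.wright/config.json` is not described here, so the sketch below assumes a simple `{"models": {role: model}}` shape purely for illustration. `resolve_role_model` mirrors the name resolveRoleModel but its signature, the `override` parameter, and the model identifiers are hypothetical.

```python
import json

# The four roles named in the text.
ROLES = ("conductor", "executor", "leaf", "dream")


def resolve_role_model(role, config_text, override=None):
    """Pick the model for a role: a per-task override wins, then the config binding."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if override:
        return override
    # Assumed config shape: {"models": {"conductor": "...", "executor": "..."}}
    models = json.loads(config_text).get("models", {})
    if role not in models:
        raise KeyError(f"no model bound to role {role}")
    return models[role]
```

With a config of `{"models": {"conductor": "opus"}}`, the conductor resolves to `opus` unless a task passes its own `override`, which matches the idea that the binding is a runtime choice.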
/mcp add registers a server at runtime; mcp_search routes to it. New tools without touching the gateway.
Concept here · file-level detail lives in the docs.