A small collection of real multi-agent orchestration workflows I designed and ran while building a Rust library for private, end-to-end-encrypted communication (Nym mixnet transport, libsignal PQXDH crypto). The project itself is private; what is shared here is the orchestration: how I split hard problems across many AI subagents and then put the results through adversarial verification before trusting them.
Each workflow is a deterministic script that spawns subagents, runs them in parallel or in a pipeline, and collects structured results. The scripts are the actual ones I authored, with the project's product framing and absolute paths removed. The agent instructions, stage structure, and output schemas are intact, because those are the interesting part.
Most "let an AI review my code" or "let an AI research this" setups share one failure mode: the model produces confident, plausible output that is subtly wrong, and nothing catches it. The fix is structural, not a better prompt.
Fan-out: decompose the problem into independent lanes and give each lane its own agent with a narrow brief. Narrow briefs dig deeper than one agent told to "look at everything," and independent lanes do not contaminate each other's reasoning.
Verify: never trust a fanned-out finding directly. Hand each one to a separate agent whose job is to refute it, defaulting to "not real" under uncertainty. A finding only survives if a skeptic can concretely confirm it. This inversion is what filters out the plausible-but-wrong output that ordinary single-pass setups ship.
```mermaid
flowchart TD
    P[Problem] --> A1[Lane 1 agent]
    P --> A2[Lane 2 agent]
    P --> A3[Lane 3 agent]
    P --> A4[Lane N agent]
    A1 --> S[Collect / synthesize<br/>structured output]
    A2 --> S
    A3 --> S
    A4 --> S
    S --> V1[Skeptic: refute claim 1]
    S --> V2[Skeptic: refute claim 2]
    S --> V3[Skeptic: refute claim N]
    V1 --> R[Confirmed result<br/>survivors only]
    V2 --> R
    V3 --> R
```
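The same shape can be sketched in a few lines of TypeScript, with the subagents stubbed out so the control flow is visible. Nothing here is the host API these workflows run on: `runLane`, `skeptic`, the `Verdict` values, and the canned data are hypothetical. The sketch encodes one rule: a claim survives only on an explicit `confirmed`, and both an uncertain verdict and a failed skeptic call count as refuted.

```typescript
// Hypothetical verdict vocabulary; "uncertain" is listed so the default is explicit.
type Verdict = "confirmed" | "refuted" | "uncertain";

interface Claim {
  id: string;
  lane: string;
  text: string;
}

// Hypothetical stand-in for a narrow-brief lane agent. A real version would
// spawn a subagent; this stub returns one placeholder claim per lane.
async function runLane(lane: string): Promise<Claim[]> {
  return [{ id: `${lane}-0`, lane, text: `placeholder claim from ${lane}` }];
}

// Hypothetical stand-in for a skeptic told to refute a claim. Canned verdicts
// per lane; a lane with no entry simulates a failed verifier call.
const cannedVerdicts: Partial<Record<string, Verdict>> = {
  lane1: "confirmed",
  lane2: "uncertain",
};

async function skeptic(claim: Claim): Promise<Verdict> {
  const verdict = cannedVerdicts[claim.lane];
  if (verdict === undefined) throw new Error(`skeptic failed on ${claim.id}`);
  return verdict;
}

// Default-to-refuted: anything short of an explicit "confirmed" is dropped.
async function survives(claim: Claim): Promise<boolean> {
  try {
    return (await skeptic(claim)) === "confirmed";
  } catch {
    return false; // a failed verification is not a confirmation
  }
}

async function fanOutAndVerify(lanes: string[]): Promise<Claim[]> {
  // Fan-out: lanes run concurrently and never see each other's output.
  const perLane = await Promise.all(lanes.map((lane) => runLane(lane)));
  const claims = ([] as Claim[]).concat(...perLane); // collect
  // Verify: one skeptic per claim, also concurrent.
  const kept = await Promise.all(claims.map((c) => survives(c)));
  return claims.filter((_, i) => kept[i]);
}

fanOutAndVerify(["lane1", "lane2", "lane3"]).then((survivors) =>
  console.log(survivors.map((c) => c.id)),
);
```

Only the `lane1` claim survives: `lane2` is dropped for uncertainty and `lane3` for a failed check. The diagram's collect/synthesize step is reduced here to a plain concatenation.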
- Adversarial verification, default-to-refuted. Verifier agents are told to try hard to prove a finding false and to return "not real" when uncertain. A skeptic that wrongly refutes a real finding costs one dropped finding; a skeptic that wrongly confirms a bogus one ships a wrong conclusion. The asymmetry sets the default.
- Structured output schemas, not prose. Every agent returns JSON validated against a schema (findings, verdicts, conflict matrices). Results become data I can route, sort, filter, and gate on, instead of text I have to re-parse and hope I read correctly.
- Narrow lenses over one generalist. Five reviewers each owning one failure mode (crypto, invariants, spec, Rust safety, tests) out-find one reviewer asked to cover all five, because each goes deep instead of skimming.
- Pipeline when you can, barrier when you must. Where verifying one lane's findings does not depend on the other lanes, I verify each lane's findings as soon as that lane finishes instead of waiting for all lanes. Wall-clock time then becomes the slowest single lane-plus-verification chain, rather than the slowest lane plus the slowest verification.
- Right model per stage. Cheaper/faster models for mechanical lanes; the strongest model for synthesis and adversarial review where the reasoning is load-bearing.
- Know when to skip verify. Verification earns its cost only when a wrong conclusion is expensive to act on. A survey that informs a human decision does not need it. One of the three examples here is deliberately fan-out only, for that contrast.
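The structured-output principle can be sketched without a schema library: parse each reply, reject it unless it has the expected shape, then gate with ordinary filters and sorts. The `Finding` type, its severity levels, `parseFindings`, and `mustFix` are hypothetical illustrations rather than the schemas these workflows use (the writeups show those), and a real setup would more likely validate against a JSON Schema.

```typescript
// Hypothetical finding shape; the real workflows define their own schemas.
type Severity = "low" | "medium" | "high";

interface Finding {
  lens: string;
  severity: Severity;
  summary: string;
}

const SEVERITIES: readonly string[] = ["low", "medium", "high"];

// Runtime shape check: every field present, severity from the allowed set.
function isFinding(x: unknown): x is Finding {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  const sev = o["severity"];
  return (
    typeof o["lens"] === "string" &&
    typeof o["summary"] === "string" &&
    typeof sev === "string" &&
    SEVERITIES.includes(sev)
  );
}

// Parse one agent's raw reply. Malformed JSON or a wrong shape is an error,
// not something to re-read and guess at.
function parseFindings(raw: string): Finding[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data) || !data.every(isFinding)) {
    throw new Error("agent output does not match the Finding schema");
  }
  return data;
}

// Once findings are data, gating is a filter and routing is a sort.
function mustFix(findings: Finding[]): Finding[] {
  const rank: Record<Severity, number> = { high: 0, medium: 1, low: 2 };
  return findings
    .filter((f) => f.severity !== "low")
    .sort((a, b) => rank[a.severity] - rank[b.severity]);
}

const reply =
  '[{"lens":"tests","severity":"medium","summary":"placeholder A"},' +
  '{"lens":"crypto","severity":"high","summary":"placeholder B"},' +
  '{"lens":"spec","severity":"low","summary":"placeholder C"}]';

console.log(mustFix(parseFindings(reply)).map((f) => f.lens)); // [ 'crypto', 'tests' ]
```

A reply with an unknown severity or a missing field throws instead of being half-read, so a malformed agent output fails loudly rather than quietly shaping the result.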
The scripts use a small host API. These are the verbs you will see when reading the workflows:
| Primitive | What it does |
|---|---|
| `agent(prompt, opts)` | Spawn one subagent. With a `schema` in `opts`, it must return JSON matching that schema. `opts` also carries `label`, `phase`, `model`, and `agentType`. |
| `parallel([thunks])` | Run tasks concurrently and wait for all of them (a barrier). |
| `pipeline(items, stage1, stage2, ...)` | Run each item through every stage independently, with no barrier between stages. |
| `phase(title)` | Group the agents that follow under a named stage for progress display. |
| `log(message)` | Emit a progress line. |
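To make the barrier distinction concrete, here is a hedged TypeScript sketch of how `parallel` and `pipeline` could behave, built on plain promises. It is not the host's implementation: `agent`, `phase`, and `log` are not modeled, the stage signature (each stage receives the previous stage's output) is an assumption, and `review`/`verify` are hypothetical timed stand-ins for subagents.

```typescript
// Barrier vs no-barrier semantics only; agent, phase, and log are not modeled.
// Assumption: each pipeline stage receives the previous stage's output.
type Stage = (input: any) => Promise<any>;

// parallel: start every thunk, resolve once all have finished (a barrier).
function parallel<T>(thunks: Array<() => Promise<T>>): Promise<T[]> {
  return Promise.all(thunks.map((run) => run()));
}

// pipeline: each item runs stage1 -> stage2 -> ... on its own promise chain,
// so a fast item can reach stage 2 while a slow item is still in stage 1.
function pipeline(items: unknown[], ...stages: Stage[]): Promise<unknown[]> {
  return Promise.all(
    items.map((item) =>
      stages.reduce<Promise<unknown>>(
        (prev, stage) => prev.then(stage),
        Promise.resolve(item),
      ),
    ),
  );
}

// Hypothetical timed stand-ins for a review lane and its skeptic.
const events: string[] = [];
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const reviewMs: Record<string, number> = { fast: 10, slow: 100 };

async function review(lane: string): Promise<string> {
  await sleep(reviewMs[lane]);
  events.push(`reviewed ${lane}`);
  return lane;
}

async function verify(lane: string): Promise<string> {
  await sleep(5);
  events.push(`verified ${lane}`);
  return lane;
}

async function demo(): Promise<string[]> {
  // No barrier: the fast lane is verified before the slow lane finishes review.
  await pipeline(["fast", "slow"], review, verify);
  // Barrier: nothing is verified until every lane has been reviewed.
  const reviewed = await parallel(["fast", "slow"].map((lane) => () => review(lane)));
  await parallel(reviewed.map((lane) => () => verify(lane)));
  return events.slice();
}

const finished = demo();
finished.then((log) => console.log(log.join("\n")));
```

In the `pipeline` half, the fast lane is verified while the slow lane is still under review; in the `parallel` half, both verifications wait for the slower review. That gap is where the wall-clock saving comes from.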
| Workflow | Shape | What it did |
|---|---|---|
| Dependency conflict analysis | 4 research lanes -> synthesize -> 3 adversarial refuters | Determined whether and how nym-sdk 1.21.0 and libsignal 0.94.1 could coexist. Recommended the split-workspace architecture, which shipped, and confirmed that an upstream compile bug was intrinsic. Writeup |
| Adversarial code review | 5 review lenses -> per-finding skeptic verify (pipeline) | Reviewed a security-sensitive codec diff before merge. ~26 agents surfaced 14 findings; the diff was hardened against all of them and shipped with the test suite green and zero lint warnings. Writeup |
| Parallel survey | 3 readers in parallel (fan-out only, no verify) | Mapped project state after a milestone merge: next-phase plans, existing CI/security gates, and security-tooling integration options. Included as the deliberate "when not to verify" contrast. Writeup |
Each writeup in docs/ breaks the workflow into its stages, lists every agent's role and the exact instructions it was given, shows the output schema, and explains the design choices behind it.
These ran on a private codebase. The technical substance is real and unchanged (the crate names, the dependency versions, the crypto design under review, the verification structure). Removed for sharing: the product's name and purpose framing, and machine-specific file paths.
MIT (c) 2026 Liz Rojas