Two CLI agents in conversation. One Python file. Stdlib only.
duet.py runs two command-line coding agents, usually Claude and Codex, in
alternating turns. You can also pair two agents from the same backend, such as
Codex planner + Codex coder or Claude coder + Claude reviewer. One agent can
plan or review while the other implements. The loop stops when they agree, the
turn limit is reached, a timeout happens, or you stop them.
Use duet when you want:
- A planner/reviewer agent to keep pressure on an implementation agent.
- A second agent to inspect test failures, issue text, or review output.
- A transcript and run directory you can inspect after the agents finish.
- Isolation through an optional git worktree while the partner agent edits.
Pair-programming pattern: plan with codex in its own session first, then hand the session id to duet — codex implements with the plan in context while claude reviews each turn.
cd ~/code/myrepo
# Find the codex session you just planned in:
# ls -lt ~/.codex/sessions/ | head
# or look for `session id: <uuid>` on `codex exec`'s stderr.
./duet.py \
--resume-codex <codex-session-id> \
--worktree \
--reasoning max \
--task "Implement the plan from your codex planning session."Four flags carry their weight; everything else is a default. Codex (resumed,
with the plan in context) speaks first as the coder. Claude reviews each
turn as the planner. The worktree keeps the host checkout clean until you
merge. Sentinel + rationale convergence rules are baked into both role
prompts — you do not need to restate them in --task.
Resume flags attach to the matching backend even when you override
--lead/--partner. A resumed Claude agent is normalized to lead so duet can
extract its latest message as the seed; a resumed Codex agent is normalized to
partner/coder so it speaks first with its existing plan in context.
The symmetric --resume-claude <session-id> does the inverse — plan in
claude, hand off to codex — and is duet's founding workflow, documented in
docs/USAGE.md.
Have a task in words but no prior planning session? Let codex plan inside the loop while claude implements:
./duet.py \
--recap \
--task "Add Codex fast mode for duet-managed Codex runs, don't miss any doc files" \
--lead claude:coder \
--partner codex:planner \
--worktree --worktree-for lead \
--turns 4--recap keeps the live output compact and the worktree keeps the host
checkout clean until you merge.
# Run a fresh task in a target project.
./duet.py --task "Implement fizzbuzz in Go with tests" \
--lead claude:coder --partner codex:planner \
--cwd ~/code/scratch
# Seed duet from Claude Code's real /review output.
./duet.py --recap --task-from-cmd 'claude -p /review' \
--lead claude:reviewer --partner codex:coder \
--worktree \
--cwd .In the review recipe, Claude's /review runs once to produce the kickoff
critique. Duet then hands that critique to Codex, preserves both agent
sessions, and manages the back-and-forth until convergence or the turn limit.
Install the duet command:
make install # symlinks duet.py to ~/.local/bin/duet
make test # unit tests (tests/test_duet.py) + scripts/smoke.sh dry-run checks
make unit-test # only the stdlib unittest suite under tests/
make smoke-test # only scripts/smoke.sh dry-run regression checks
make loop-test # slow real Claude/Codex loop checks; writes runs/test-loop/Each agent keeps its own conversation memory:
- Claude resumes with
claude -p --resume <session_id>. - Codex resumes with
codex exec resume <session_id>when duet captured one from Codex's stderr, orcodex exec resume --lastin the working directory as a fallback for older builds that don't print a session id.
On each turn, duet sends the latest reply from one agent to the other. It
continues until both agents accept convergence in back-to-back turns, --turns
is reached, a timeout happens, or you press Ctrl-C. A convergence proposal must
include an LGTM rationale: explaining why the work is done, followed by the
sentinel <<<LGTM>>> on its own line; a bare sentinel is ignored.
Finished runs record the reason in state.json: user interruption stays
force_stop, per-turn agent timeouts are timeout, and non-timeout agent
command failures or malformed required output are agent_error.
If you pass --verify-cmd, duet runs that shell command before counting a
valid convergence proposal. Exit code 0 allows the proposal to count; any
non-zero exit, timeout, or execution error feeds a capped failure block to the
next agent turn. --dry-run records and prints the configured command but
does not execute it.
After normal loop endings, duet opens a force> prompt. Press Enter to
finish, or type feedback to force another round; duet sends the next agent the
previous reply plus your feedback, including any appended worktree handoff
block and diff.
Call Claude Code's real /review skill through duet:
./duet.py --recap --task-from-cmd 'claude -p /review' \
--lead claude:reviewer --partner codex:coder \
--worktree \
--cwd ~/workspace/project \
--turns 6The /review skill supplies the initial findings; duet handles the subsequent
Codex fix turn, Claude verification turn, worktree diff handoff, and any extra
rounds.
With the optional Claude Code skill from docs/USAGE.md,
plain /duet runs that same /review kickoff recipe.
Let duet run the upstream command inside the target project:
./duet.py --task-from-cmd 'npm test 2>&1' \
--lead claude:coder --partner codex:planner \
--cwd ~/workspace/project \
--worktree --worktree-for leadUse a repeatable config:
./duet.py --config duet.example.yamlUseful packaged configs:
examples/pr-review.yaml- deep review of the latest commit.examples/codex-test-fix.yaml- Codex planner diagnoses failing checks; Codex coder fixes them in a worktree.
Same-backend peering:
# Codex planner + Codex coder. The worktree gives one Codex peer a separate cwd.
./duet.py --task "Fix the issue" \
--lead codex:planner --partner codex:coder \
--worktree --turns 6
# Claude coder + Claude reviewer.
./duet.py --task "Review and fix the current change" \
--lead claude:coder --partner claude:reviewer \
--turns 6For codex/codex runs in one cwd, duet requires both peers to produce Codex
session UUIDs on their first turns. If either peer would fall back to
codex exec resume --last, duet aborts rather than risk resuming the other
peer's session. Use --worktree when in doubt.
Require a mechanical check before convergence:
./duet.py \
--task "Fix the issue" \
--lead claude:coder \
--partner codex:reviewer \
--worktree --worktree-for lead \
--verify-cmd 'make test'Check an in-progress run from another terminal:
./duet.py --status .duet/runs/<id>/Gate convergence on P0/P1 review findings:
./duet.py --task "Fix the issue" \
--lead claude:coder --partner codex:triage-reviewer \
--cwd ~/workspace/projectReview a recent implementation - Codex reviews at max effort, Claude applies only requested fixes:
./duet.py --recap \
--task "Review the current main branch changes. Codex should act as reviewer: identify any blocking issues in the latest commit. Claude should act as coder: implement only the fixes Codex explicitly requests. Preserve project constraints and run make test before convergence." \
--lead claude:coder \
--partner codex:reviewer \
--reasoning max \
--worktree --worktree-for lead \
--turns 6The partner speaks first, so Codex (reviewer) opens turn 1 with its critique
and Claude (coder) responds in turn 2 with the fixes. --worktree-for lead
keeps the editable checkout under the coder. Keep --codex-fast off in this
recipe: Codex is the reviewer, so max effort is the point.
That same recipe is also packaged as a YAML config you can drop into any
repo — examples/pr-review.yaml reviews HEAD's diff with the same
agent/effort/worktree pairing, with comments calling out which keys to swap
for variants (review uncommitted changes, review a specific PR by number,
faster iteration once review is mostly done).
Review the latest commit plus an untracked notes file by seeding both into the task:
./duet.py --recap \
--task-from-cmd 'git show --stat --patch --no-ext-diff HEAD && printf "\n\n--- TODO.md ---\n" && cat TODO.md' \
--lead claude:coder \
--partner codex:reviewer \
--reasoning max \
--worktree --worktree-for lead \
--turns 6Fresh worktrees start from committed HEAD; commit the notes first if the coder
must edit them as a normal tracked file.
Deep planner, fast coder — Claude plans at high effort, Codex coder turns drop to low for latency (uses the default claude:planner + codex:coder pairing):
./duet.py --reasoning high --codex-fast \
--task "Fix the issue" \
--cwd ~/workspace/projectCompact live debug view — see only what each turn produced, in real time:
./duet.py --recap --task "Fix the issue" \
--lead claude:coder --partner codex:planner \
--cwd ~/workspace/projectEvery run writes a directory containing:
transcript.md- the full conversation.recap.md- compact per-turn debug view when--recapis enabled;--statusshows this path when present.state.json- run state, agent roles, session ids, finish reason, worktree metadata, andrecap_pathfor recap runs.turn-*.stderr.log- live stderr from each agent invocation.turn-*-verify.log- verify command metadata, stdout, and stderr when--verify-cmdruns.turn-*.pid- present only while an agent or verify command is running.wt/- the git worktree, when--worktreeis enabled.
When a worktree agent replies, duet appends a handoff block to that reply before
the diff. The block names the exact worktree path and branch, warns that the
receiving agent's cwd may be a clean checkout, and includes git -C <wt> review
commands so verification happens against the edited tree.
When --cwd points outside the invocation directory and --runs-dir is not
set, artifacts go under the target project at .duet/runs/<run_id>/.
Read docs/USAGE.md for the full reference: flags, sandbox and
network rules, worktree mode, output layout, --status / --continue, force
prompt behavior, session memory, the post-run "apply / iterate / discard"
checklist, and the optional /duet Claude Code skill.
For contributor guidance, read CLAUDE.md. Codex-specific entry notes live in AGENTS.md.
duet --continue <run>starts a fresh run from a priorstate.json, restores saved session ids, and reuses the previous worktree when available. It does not append to the old transcript.- Parallel Codex sessions in the same cwd are safe when duet captured a
session id from Codex's stderr — that turn pins to the UUID, not to recency.
When the UUID was not captured (old Codex builds, or continuing a pre-UUID
run), duet falls back to
codex exec resume --last, which is cwd-based and unsafe to share.--worktreeisolates duet's Codex cwd from the host repo; in--lastfallback mode, do not start another Codex session inside that same worktree while the run is active. - Transcripts capture full agent text. Convergence detection only counts rationale-backed sentinels outside fenced markdown code blocks.