feat: Phase 1 of typed on_error routing (#227) by PolyphonyRequiem · Pull Request #229 · microsoft/conductor

PolyphonyRequiem · 2026-05-21T23:37:14Z

Phase 1 of the typed on_error routing design from #227 (RFC).

Companion PR to #227 (brainstorm spec). Spec excerpts inline below;
full design and open questions live in the RFC.

What ships in this PR (Phase 1 scope from the RFC)

Script env-var-file contract — CONDUCTOR_ERROR_OUT points at a temp
file the script writes a typed envelope into; engine reads on exit. Works
uniformly for pwsh, bash, python, node, dotnet (dotnet run --
style).
Agent JSON discriminator — {conductor_error: true, kind, message, details}
in agent output is treated as a raise rather than a regular response.
RouteDef.on_error: bool | str | list[str] — routes can match on
the failing node''s typed error kind.
{{ failing_node.error }} template scope — handler nodes get the
envelope of the node that raised, alongside its (possibly partial) output.
Halt-on-unhandled at root — if no error route matches, the workflow
halts; engine writes errors.jsonl (TMPDIR pattern, alongside the event
log) and emits a typed workflow_failed event. CLI maps the exception to
new exit code 3.
Optional AgentDef.raises: list[str] — declared kinds are linted
against route on_error declarations at validation time and checked at
runtime; undeclared kinds are wrapped as internal.undeclared_kind.
Engine-agnostic helpers under src/conductor/helpers/error/ for
pwsh / bash / python / node / dotnet (ship in the wheel).

What''s reserved for Phase 2/3 (out of scope here)

Sub-workflow envelope propagation across the boundary.
retry / halt / propagate route actions.
Provider-exhaustion synthetic kind.
on_error on routes from type: workflow, human_gate, notification,
and parallel/for_each groups is currently a hard validation error
(avoids silent "handler that never fires" footguns). These will become
valid in Phase 2.

Reserved kinds emitted in Phase 1

internal.script_error — script exited non-zero AND wrote no envelope
(opt-in: only synthesized when the node has raises or any on_error
route, so legacy exit_code-routing workflows are unaffected).
internal.schema_violation — agent output failed its output: schema.
internal.undeclared_kind — node with raises: raised a kind not in
its list; original kind preserved under details.original_kind.

Reserved kind prefixes (validator forbids users declaring these):
internal., provider., subworkflow., retry..

How to read this PR

The 14 commits map 1:1 to the implementation steps from the plan and are
designed to be reviewed in order. Each is independently testable and
green:

#	Commit	Touch points
1	`feat(schema)`	`config/schema.py`
2	`feat(engine)` ErrorEnvelope types	`engine/errors.py`, `exceptions.py`
3	`feat(router)` success vs. error buckets	`engine/router.py`
4	`feat(context)` store_error + .error access	`engine/context.py`
5	`feat(validator)` cross-check	`config/validator.py`
6	`feat(agent-exec)` envelope path	`executor/agent.py`
7	`feat(script-exec)` `CONDUCTOR_ERROR_OUT`	`executor/script.py`
8	`feat(engine-wire)` leaf-error path	`engine/workflow.py`
9	`feat(halt-jsonl)` errors.jsonl + event	`engine/workflow.py`
10	`feat(cli-exit)` exit code 3	`cli/app.py`
11	`feat(helpers)` 5 language helpers	`helpers/error/*`
12	`feat(examples)`	`examples/error-routing*.yaml`
13	`test(xeng)` cross-engine envelope contract	`tests/test_integration/`
14	`phase-1(checks)` final ty/lint/test sweep	minor casts

Examples

examples/error-routing.yaml — script-based; uses the
CONDUCTOR_ERROR_OUT contract directly with no helper. Workflow input
simulated_failure toggles ok / drift / rate_limited.
examples/error-routing-helpers.yaml — same shape using the shipped
Python helper.

Both validate, both run on Windows and POSIX, both render
{{ failing_node.error.kind }} / .message / .details in the
handler''s prompt.

Test posture

2943 tests pass; 12 baseline failures (perf flake, event_log
non-serializable, registry/integration ×10) are pre-existing and
unchanged.
New tests: 8 engine error-routing tests, 6 helper tests, 2 CLI
exit-code tests, 3 cross-engine integration tests (Python + pwsh run
on Windows; bash skipped on Windows by design — WSL relay shim
unreliable in CI envs).
Lint clean (ruff check), format clean (ruff format --check).
ty check src back to the 12-diagnostic baseline (all pre-existing
Windows termios/tty noise).
make validate-examples equivalent green across all 17 bundled
examples including the two new ones.

Open Phase 1 micro-decisions (resolved in this PR)

Exit code value: 3 (1 = generic, 2 = reserved misuse, 3 = workflow
halted on unhandled error).
errors.jsonl path: same $TMPDIR/conductor/ convention as the event
log; printed at end-of-run.
Frame trail in errors.jsonl: single-element today, structured to
accept multi-frame in Phase 2 without a shape change.
Convenience {{ workflow.last_error }} not added (RFC says require
{{ failing_node.error }}); can revisit in Phase 2.

cc @jasonrobertfox — companion to the RFC at #227.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…ting Phase 1, step 1 of on_error routing (per design brief in docs/projects/error-routing/on-error-routing.brainstorm.md). Adds two opt-in schema fields and a small shared module for the constants both schema validation and the engine error path will use: - src/conductor/error_kinds.py - KIND_PATTERN: dotted lowercase identifier (at least one dot). - RESERVED_KIND_PREFIXES: internal., provider., subworkflow., retry. (the runtime owns these namespaces; workflow authors cannot declare kinds under them). - RESERVED_ON_ERROR_ALLOWLIST: the closed set of runtime-synthesized kinds that ARE legal to match in on_error even though they're not legal to declare in raises (internal.script_error, internal.schema_violation, internal.undeclared_kind). - is_reserved_prefix(kind) helper. - RouteDef.on_error: bool | str | list[str] | None - None = success route (existing behavior). - True = catch-all error route. - str = single-kind error route. - list[str] = multi-kind error route. - False is rejected (no semantic meaning). - Kind format enforced via KIND_PATTERN. - 'before'-mode validator so Pydantic's bool/str coercion doesn't swallow the discriminator. - AgentDef.raises: list[str] | None - Optional declaration of kinds the node may raise. - Powers a load-time lint (cross-checked against routes' on_error in the validator, landing in a follow-up commit) and a runtime undeclared-kind check (will land with the engine-wiring commit). - Reserved prefixes rejected so authors can't claim runtime namespaces; duplicates rejected; format enforced via KIND_PATTERN. Tests: - tests/test_error_kinds.py — 24 cases covering pattern + prefix + allowlist invariants (allowlist entries must themselves be reserved). - tests/test_config/test_schema.py::TestRouteDefOnError — 14 cases. - tests/test_config/test_schema.py::TestAgentDefRaises — 10 cases. No semantics change for existing workflows: both fields default to None and the engine doesn't observe them yet (wiring lands in subsequent commits). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…exceptions Phase 1, step 2 of on_error routing. Adds src/conductor/engine/errors.py: - ErrorEnvelope TypedDict — the internal {kind, message, details} shape. Strips the on-the-wire conductor_error: true discriminator so callers don't see it in {{ failing_node.error.* }} templates. - EnvelopeValidationError — distinct from ValidationError so the engine can catch and translate malformed envelopes into synthetic internal.* kinds rather than halting with a generic config error. - coerce_envelope(raw) — validates on-the-wire input, normalizes details to {} when absent. - make_script_error(exit_code, stderr_tail, command) — synthesizes the internal.script_error envelope. - make_schema_violation(node_name, source, original_message, failed_field?) — synthesizes the internal.schema_violation envelope with rich details for the swallowed-by-catch-all diagnostics case. - wrap_undeclared_kind(original, declared) — wraps an envelope whose kind isn't in the node's declared raises list. Preserves the original kind/message/details under details.original_* so an author handling internal.undeclared_kind can still recover the intent. Adds two exceptions to src/conductor/exceptions.py: - UnhandledNodeError — internal signal raised by the router when an error envelope reaches no on_error route at the current level. The engine catches this at the per-node dispatch site and re-raises as UnhandledWorkflowError. Not intended to surface to end users. - UnhandledWorkflowError — workflow halted on a typed error envelope. Carries the envelope and a frame trail (single frame in Phase 1; Phase 2 will accumulate frames across sub-workflow boundaries). CLI maps this to a distinct exit code so callers can distinguish 'workflow ran and halted on typed error' from generic failures. Tests: tests/test_engine/test_errors.py — 18 cases covering envelope coercion (including discriminator stripping and details normalization), the three synthetic-kind constructors, and the exception classes including the empty-frames defensive path. Nothing yet emits these envelopes or exceptions; the next commits wire them through the router, executors, and engine dispatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Route evaluation now partitions by RouteDef.on_error: - on_error is None -> success bucket, evaluated when error=None - on_error is set -> error bucket, evaluated when an envelope is passed; the route's on_error matcher (True | str | list[str]) must match envelope[kind] Behavior preserved on the success path: first matching when: wins, no catch-all raises the existing ValueError. New: error-path exhaustion raises UnhandledNodeError carrying the envelope so the engine can translate it into UnhandledWorkflowError at the call site. Error-route eval context exposes the envelope as `error` for both Jinja2 ({{ error.kind }}) and simpleeval (kind == 'x.y' via flatten). Adds 12 tests in TestRouterErrorBucket covering bucket isolation, all three on_error matcher shapes, when: combined with on_error, output: transforms, ordering within the bucket, the new UnhandledNodeError path, and the legacy ValueError path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds WorkflowContext.store_error(agent_name, envelope) that co-locates error envelopes with their producing node's slot. The rendered context shape for a node is now `{node: {output?, error?}}`. All three context modes surface errors: - accumulate: each failing node gets `{agent: {error: envelope}}` - last_only: failing last agent surfaces with `{error: ...}` - explicit: declarations of the form `agent.error[.field]` copy the whole envelope into ctx[agent]['error']. Envelopes are bounded and templates commonly need `error.details.*`, so the runtime never field-slices them. Validator updates so existing semantic checks cover the new path: - INPUT_REF_PATTERN gains an `error_agent`/`error_field` branch matching `<agent>.error[.field]`. - _OUTPUT_ATTRS includes the singular `error` so Jinja AST analysis treats `{{ failing.error.kind }}` as a real output-class ref. - TemplateRefs gains `agent_error_refs: set[str]` and the AST walker populates it. - The per-agent template walk emits the same explicit-mode `undeclared input` warning for `.error` refs that `.output` and group `.errors` already get. - Unknown-agent checks cover the `.error` ref path. - Parallel-group internal-dependency check rejects intra-group `.error` refs too. Checkpoint round-trip via to_dict/from_dict serializes `agent_errors`; older checkpoints without the key restore as empty (backwards-compat). Adds 14 tests in TestWorkflowContextStoreError, 5 INPUT_REF_PATTERN shape tests, and 3 TemplateRefs error-extraction tests. Fixes the test_empty_context dict-equality fixture to include the new field. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…type Adds two new helpers in conductor.config.validator: - _validate_on_error_routes(agent): hard-errors on_error routes on node types that don't raise envelopes in Phase 1 (human_gate, workflow); validates each kind matches KIND_PATTERN; if agent.raises is declared, every concrete kind in on_error must be in raises or RESERVED_ON_ERROR_ALLOWLIST (catch-all rue always legal). - _validate_group_routes_no_on_error(): rejects on_error routes on parallel and for_each groups (group-level envelopes are Phase 2). Both helpers are wired into validate_workflow_config(). 11 new tests cover plain agent + script (legal), human_gate + workflow + parallel + for_each (rejected), bad kind format, undeclared-kind cross-check, catch-all, reserved allowlist, and no-raises = no constraint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ions AgentOutput grows an optional `error: dict | None` field that carries an ErrorEnvelope when the agent failed. AgentExecutor.execute() now, after the provider call returns: 1. If the response is a dict with `conductor_error: true`, coerces the envelope (or synthesizes `internal.schema_violation` when the envelope itself is malformed) and attaches it to output.error WITHOUT running validate_output (the declared output schema doesn't apply to error envelopes). 2. Otherwise runs validate_output, and on ValidationError synthesizes an `internal.schema_violation` envelope on output.error instead of raising. Partial outputs (from mid-agent interrupts) bypass both checks. The error module is imported lazily inside execute() to avoid a circular import via conductor.engine.__init__. Updated two existing tests to assert the new envelope contract instead of expecting ValidationError. Added three new tests covering well-formed envelopes, malformed envelopes, and the partial-output bypass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nthesize internal.script_error ScriptOutput grows an optional `error: dict | None` field. Each script run now allocates a tempfile via tempfile.mkstemp(), closes the fd immediately, and exposes the path in the env as CONDUCTOR_ERROR_OUT — set AFTER the agent.env merge so users cannot accidentally redirect or override it. After process.communicate() the executor reads the file: - empty / missing → no envelope - valid JSON envelope → coerce_envelope, attach to output.error - valid JSON but malformed envelope → internal.schema_violation - invalid JSON → internal.schema_violation If no envelope was written AND exit_code != 0 AND the node has opted into error routing (raises declared OR any on_error route present), the executor synthesizes an `internal.script_error` envelope. Legacy workflows that route on exit_code (no opt-in) keep their existing behavior. Temp file is always removed in finally — even on timeout/command-not-found. Added 7 tests covering: no envelope on success, well-formed envelope surfaces, user env cannot override, synthesized internal.script_error on opt-in, legacy non-zero with no opt-in keeps error=None, malformed envelope downgrades to schema_violation, temp file cleanup. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ary, group failure carry envelope Wires the on_error contract through the workflow engine end-to-end: - _evaluate_routes() now accepts an optional error envelope; empty routes plus an unhandled envelope raises UnhandledNodeError instead of silently routing to \. - New _normalize_envelope_for_node() applies undeclared-kind wrapping (raises declared + kind not in raises + not in allowlist → internal.undeclared_kind with original kind preserved under details.original_kind). - New _handle_leaf_error() centralizes the leaf path: normalize, store_error, evaluate error routes, raise UnhandledWorkflowError with a single-leaf frame trail on no match. - Agent call site (~2583) and script call site (~2359) both branch on output.error BEFORE storage and BEFORE schema validation. Script's success path runs full output-schema validation as before; error path skips it. - Sub-workflow call sites catch UnhandledWorkflowError from child engines and re-raise as ExecutionError. Phase 1 invariant: envelopes do not propagate across the sub-workflow boundary. - ParallelAgentError and ForEachError grow an optional envelope field. The parallel/for_each child execution helpers detect output.error, normalize it, raise an ExecutionError tagged with ._envelope, and the existing failure_mode machinery records it. Downstream group consumers can inspect the typed envelope. - Tests: 6 new tests in tests/test_engine/test_error_routing.py covering agent envelope routing, unhandled-envelope halt, undeclared-kind normalization (with the rescue agent reading the original kind from context), success-path regression, script envelope routing, and legacy exit_code routing regression. Full suite (2887 tests) green; no regressions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…on UnhandledWorkflowError Phase 1 Step 9. When the engine raises UnhandledWorkflowError (a leaf node returned an error envelope that no on_error route matched), the new `except UnhandledWorkflowError` arm in `_execute_loop`: * writes a single-line `errors.jsonl` record under \\\/conductor/\\ using the same naming convention as the event log (\\conductor-<workflow>-<ts>-<run_id>.errors.jsonl\\), carrying the envelope, frame trail, and leaf node name; * emits a typed \\workflow_failed\\ event with \\�rror_type='UnhandledWorkflowError'\\ plus the envelope, frames and errors-jsonl path so dashboard/log subscribers can render the typed halt distinctly from a generic ConductorError; * runs the \\on_error\\ lifecycle hook and saves a checkpoint, for parity with the other failure paths; * re-raises so the CLI (Step 10) can map it to its distinct exit code. The new arm is placed *before* the existing \\�xcept ConductorError\\ so the typed halt is caught first. Generic ConductorError handling is unchanged. Adds two integration tests covering the jsonl artefact and the typed event. Also tightens a docstring that was 101 chars (pre-existing from Step 8 — caught now by ruff format check). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…stderr summary Phase 1 Step 10. Both \\ un\\ and \\ esume\\ now catch \\UnhandledWorkflowError\\ specifically before the generic \\�xcept Exception\\, render a typed panel to stderr via the new \\print_unhandled_workflow_error\\ helper, and exit with code 3 so callers (CI, polyphony, shell scripts) can distinguish 'workflow ran to completion and halted on a typed error' from a generic failure (code 1). The summary panel surfaces the leaf node, kind, message, optional \\details\\, and the path to the \\�rrors.jsonl\\ artefact. To make the path reachable from the CLI without holding a reference to the engine, \\_execute_loop\\ now also attaches the path to the exception instance (\\�xc.errors_jsonl_path\\) before re-raising. Adds two CLI tests: * unhandled typed halt exits 3, * generic \\RuntimeError\\ still exits 1 (regression guard). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…n/node/dotnet Phase 1 Step 11. New \\src/conductor/helpers/error/\\ directory with one-file convenience modules for raising typed Conductor error envelopes from script-type nodes. All helpers implement the same contract: read \\CONDUCTOR_ERROR_OUT\\, write the \\{conductor_error: true, kind, message, details?}\\ JSON envelope to that path, and return — leaving exit-code management to the caller. * \\Conductor.Error.psm1\\ — PowerShell, \\Write-ConductorError\\ cmdlet * \\conductor-error.sh\\ — Bash/sh, sourced \\conductor_error\\ function * \\conductor_error.py\\ — Python, \\ aise_kind\\ function * \\conductor-error.mjs\\ — Node, exported \\ aiseError\\ * \\ConductorError.cs\\ — .NET, static \\ConductorError.Raise\\ * \\README.md\\ — quick reference + usage examples per engine Helpers ship under \\src/conductor/\\ so hatchling rolls them into the wheel; verified by inspecting the built artefact. Nothing is auto-loaded, on PATH, or on PYTHONPATH — script authors must explicitly Import-Module / source / import to use them, and authors who don't want them write the JSON themselves (it's three lines per engine). Adds \\ ests/test_helpers/test_error_helpers.py\\ with 6 cases covering the Python helper's envelope shape, env-var contract, no- sys.exit guarantee, and round-trip through \\coerce_envelope\\. The non-Python helpers are exercised by the cross-engine integration test landing in Step 13. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Phase 1 Step 12. Two new example workflows under \\�xamples/\\ plus an Error Routing section in \\�xamples/README.md\\. \\�rror-routing.yaml\\: * A \\ ype: script\\ probe that writes \\{conductor_error: true, kind, message, details}\\ to \\\\\ and exits 0 via the language-neutral contract (no helper required). * The probe declares \\ aises: [external.git.drift, external.api.rate_limited]\\ so undeclared kinds get normalised to \\internal.undeclared_kind\\. * Routes select by \\on_error: <kind>\\ to demonstrate that an envelope picks the typed arm over a generic exit_code fallback. * Handlers read \\{{ probe.error.kind }}\\, \\{{ probe.error.message }}\\, and \\{{ probe.error.details.* }}\\. * A \\simulated_failure\\ workflow input toggles between \\ok\\ / \\drift\\ / \\ ate_limited\\ so the same YAML exercises all three arms. \\�rror-routing-helpers.yaml\\: * Same flow, but raises via the shipped Python helper (\\conductor.helpers.error.conductor_error.raise_kind\\) instead of hand-rolled JSON, so authors see the ergonomic version. Both examples validate with \\conductor validate\\ and the full \\make validate-examples\\ sweep (17/17 pass). Caught one writing issue along the way: the input schema field is \\input:\\ (singular), not \\inputs:\\ — fixed before committing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Phase 1 Step 13 (acceptance #1). New \\ ests/test_integration/test_error_routing_cross_engine.py\\ exercises the \\CONDUCTOR_ERROR_OUT\\ contract end-to-end through the real \\WorkflowEngine\\ (no executor mocking) with three writer scripts in different languages: Python, PowerShell (\\pwsh\\), and bash. All three share the same workflow YAML shape and the same expected outcome: the probe writes a typed envelope, the engine routes by \\on_error: external.git.drift\\, and a rescue agent reads the kind back from the routed-to scope. Each test is skipped when the corresponding interpreter is missing from PATH (and bash is also skipped on Windows, where \\shutil.which\\ typically returns the WSL relay shim that fails with an opaque \\�xecvpe(/bin/bash) failed: No such file or directory\\ — outside the scope of this contract test; the brief calls for bash-on-Linux specifically). Locally on Windows: 2 pass (python + pwsh), 1 skipped (bash). On a Linux CI runner with pwsh installed, all three execute. The contract is the same string of bytes in every engine, so identical assertions hold across them — which is the whole point. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ty's TypedDict to dict() conversion picks the wrong overload for our ErrorEnvelope, surfacing 8 spurious diagnostics. Use cast() at the leaf sites where TypedDict envelopes cross into APIs typed as dict[str, Any] (script/agent executor returns; workflow.py store_error and router.evaluate call sites). Brings ty count back to the 12-diagnostic baseline (all pre-existing Windows termios/tty noise). Lint, format, examples-validation, and full test suite (2943 pass, 12 baseline failures) all green. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

codecov-commenter · 2026-05-21T23:46:43Z

Codecov Report

❌ Patch coverage is 86.85832% with 64 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@efa520f). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/conductor/engine/workflow.py	75.47%	26 Missing ⚠️
src/conductor/config/validator.py	83.82%	11 Missing ⚠️
src/conductor/engine/context.py	80.00%	9 Missing ⚠️
src/conductor/executor/script.py	87.87%	8 Missing ⚠️
src/conductor/cli/app.py	81.81%	6 Missing ⚠️
src/conductor/config/schema.py	93.87%	3 Missing ⚠️
src/conductor/engine/router.py	96.42%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #229   +/-   ##
=======================================
  Coverage        ?   88.20%           
=======================================
  Files           ?       65           
  Lines           ?    10115           
  Branches        ?        0           
=======================================
  Hits            ?     8922           
  Misses          ?     1193           
  Partials        ?        0

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

PolyphonyRequiem · 2026-05-28T22:58:38Z

Attempted to rebase feature/error-routing onto v0.1.18 for local polyphony dogfood. Two mechanical conflicts at commit 1/14 (schema.py: import block + adjacent validator) resolved cleanly. Hit a semantic conflict at commit 3/14 (context.py / _add_agent_input) and aborted — not pushed.

Conflict summary: v0.1.18's set-step PR initializes output context as {"output": {} if is_dict_output else None} — the None seed is required by TestWorkflowContextNonDictOutputs::test_explicit_mode_scalar_field_optional_skips, which asserts rendered["compute"]["output"] is None. This branch's commit 17c7cc7 changes the same method to agent_outputs.get(agent_name) + simplified {"output": {}} init, which conflates 'agent never ran' with 'agent produced None output'. The right fix is likely a _MISSING sentinel to distinguish the two cases — but that requires a judgment call from this PR's author.

Built the dogfood from efa520f (pre-v0.1.18 base) where the conflict doesn't exist. All 61 error-routing tests pass in the dogfood.

PolyphonyRequiem · 2026-05-28T23:45:50Z

Tried rebasing this onto v0.1.18 base for polyphony dogfood integration, and hit a design conflict in src/conductor/runtime/context.py.

The clash: PR #229 commit 17c7cc7 uses agent_outputs.get() with a {"output": {}} default to treat missing agent output uniformly with None output. But v0.1.18 (set-step PR) introduces is_dict_output flag tracking + None seed init for non-dict outputs—required by TestWorkflowContextNonDictOutputs. These two patterns conflate "agent never ran" with "agent produced None output."

The _MISSING sentinel pattern would resolve it cleanly, but I'm curious how you intended this to reconcile with the v0.1.18 init semantics. Happy to follow whichever direction you'd prefer.

Interim: Polyphony dogfood is staying pinned at v0.1.17 base (efa520f) in the meantime—not a blocker for you, just a heads-up. We'll rebase once the upstream design call lands.

PolyphonyRequiem · 2026-05-28T23:47:15Z

Tried rebasing this onto v0.1.18 base for polyphony dogfood integration, and hit a design conflict in src/conductor/runtime/context.py.

The clash: PR #229 commit 17c7cc7 uses agent_outputs.get() with a {"output": {}} default to treat missing agent output uniformly with None output. But v0.1.18 (set-step PR) introduces is_dict_output flag tracking + None seed init for non-dict outputs—required by TestWorkflowContextNonDictOutputs. These two patterns conflate "agent never ran" with "agent produced None output."

The _MISSING sentinel pattern would resolve it cleanly, but I'm curious how you intended this to reconcile with the v0.1.18 init semantics. Happy to follow whichever direction you'd prefer.

Interim: Polyphony dogfood is staying pinned at v0.1.17 base (efa520f) in the meantime—not a blocker for you, just a heads-up. We'll rebase once the upstream design call lands.

jrob5756

Took a pass on the overall approach. I like the direction: success/error route buckets fit conductor's existing model cleanly, the opt-in synthesis keeps legacy exit_code routing untouched, and the phasing discipline (deferring subworkflow/group propagation as hard validation errors rather than silent no-ops) is the right call. Two pieces of feedback before this locks in the authoring contract:

1. The script transport is good — let's just document the asymmetry.

To be clear for others reading: CONDUCTOR_ERROR_OUT is a file path, not JSON-in-an-env-var, so escaping/size concerns don't apply — this is the $GITHUB_OUTPUT pattern and I think it's the right choice. The reason it's a file (and not a stdout discriminator like agents use) is that type: script already parses stdout as the output: contract, so errors need an out-of-band channel. That justification is sound, but it means the raise mechanism differs by node type (agents: in-band stdout conductor_error: true; scripts: out-of-band file). Please call that split out explicitly in docs/workflow-syntax.md so authors aren't surprised.

The parse-failure handling is solid — malformed JSON and malformed envelopes both downgrade to internal.schema_violation rather than dropping the signal, and reading only after communicate() returns means there's no write/read race on the non-atomic write. 👍 One thing to confirm in the correctness pass: malformed→schema_violation fires regardless of opt-in, whereas synthesized internal.script_error only fires when the node opted in. I think that's intended, but let's make sure a node that writes garbage with no on_error route halting on exit 3 (vs. legacy exit-code routing) is the behavior we want.

2. Trim the bundled multi-language helpers.

The contract itself is great precisely because it's language-neutral — "write a 3-field JSON file" — and you've kept it fully optional (the raw example uses no helper, plain echo > "$CONDUCTOR_ERROR_OUT" works). That part I'm fully on board with.

What I'd push back on is shipping conductor-error.sh, conductor-error.mjs, Conductor.Error.psm1, and ConductorError.cs inside the Python wheel. A pwsh/.NET/node user can't naturally consume a file buried in site-packages/conductor/helpers/error/ — there's no Import-Module/NuGet/npm story and the path is venv- and version-specific, so in practice these are copy-paste snippets dressed up as libraries. It's also five implementations of a trivial contract to keep in sync, and it undercuts the "it's just a JSON file" message.

Proposal: keep the Python helper (conductor is Python; inline python -c and script nodes make a real importable module genuinely useful) and demote the other four to documented copy-paste snippets in docs/ rather than shipping source in the wheel. Same ergonomic win where it's natural, without the maintenance/consumption awkwardness.

Neither of these is a structural objection — the file side-channel and the fail-safe parse handling are the right foundation. Mostly packaging + docs. Will follow up with correctness notes on the router/engine wiring.

jrob5756 · 2026-06-01T18:27:26Z

+    try:
+        with open(path, encoding="utf-8") as f:
+            raw = f.read()
+    except OSError:


Silent envelope drop on read failure when exit_code == 0.

Swallowing OSError → None here is fine when the script never wrote a typed error (the file legitimately doesn't exist / is empty), but it conflates that benign case with a real read failure on a path the engine itself created. Consider this path:

Script writes a valid envelope to $CONDUCTOR_ERROR_OUT (raising a typed error) and exits 0, relying on the file as the failure channel.

The read here raises OSError (permissions, transient FS error, racy temp cleanup) → _read_error_envelope returns None.

Back at execute() (~line 191), synthesis is gated on exit_code != 0 and _node_uses_error_routing(agent), so nothing fires.

ScriptOutput.error stays None, the engine sees a success, and the node is routed down the success path — no envelope, no log, no event.

This is the exact silent-failure class typed error routing exists to prevent. An unexpected OSError on an engine-owned path should not be conflated with "script chose not to raise" — at minimum emit a diagnostic, and ideally synthesize a read-failure variant of internal.script_error / internal.schema_violation so the typed-failure signal is preserved regardless of exit code.

Side note: open(..., encoding="utf-8") defers decoding to f.read(), so a UnicodeDecodeError is not caught by except OSError here — that crashes rather than silently dropping, which is arguably the lesser evil but inconsistent.

jrob5756 · 2026-06-01T18:27:26Z

+_LEAF_TYPES_THAT_RAISE: frozenset[str | None] = frozenset({"agent", "script", None})
+
+
+def _validate_on_error_routes(agent: Any) -> list[str]:


Validator gap + two related correctness items it would be natural to address together here.

a) Validator gap — node with only on_error routes crashes generically on the happy path. This validator accepts a node with only error routes and no success fallback. On a successful run, Router._evaluate_success (router.py:103-121) skips every route whose on_error is not None, then raises a plain ValueError("No matching route found..."). That bubbles up as a generic exception → CLI exit 1 → no errors.jsonl, no typed halt panel, despite the workflow being clearly authored for the typed-error world. Verified locally. Recommend rejecting at validation here with a teaching message like "Agent 'X' has on_error routes but no success route. Add a route without on_error to handle the happy path."

b) Related — _evaluate_error silently ignores output_for_route (engine/router.py:138-176, called from _handle_leaf_error at engine/workflow.py:4452). The plumbing accepts output_for_route but _evaluate_error only flattens error into the eval context and drops current_output. Any error-route when: clause referencing the failing node's output (e.g. attempt_count > 3) will silently see the previous agent's output. Either merge output_for_route into the error eval context (more useful) or drop the plumbing (more honest), but the current silent inconsistency is a latent footgun.

c) Related — agent schema-violation auto-upgrades existing workflows from exit 1 → exit 3 (executor/agent.py:282-302). Pre-PR: schema violation → ValidationError → exit 1. Post-PR: synthesizes internal.schema_violation envelope → unhandled → exit 3 + errors.jsonl. This is asymmetric with the script executor's _node_uses_error_routing opt-in (script.py:53-67), which preserves legacy behavior for unopted scripts. Either gate the agent upgrade on the same opt-in (so the asymmetry is principled, not accidental), or document the exit-code change in release notes.

jrob5756 · 2026-06-01T18:27:26Z

+            # path treats it like any other sub-workflow failure. Phase 2
+            # will introduce envelope propagation with parent frames.
+            raise ExecutionError(
+                f"sub-workflow '{agent.name}' halted on unhandled error envelope "


Missing integration tests guard documented Phase-1 invariants.

Four integration boundaries have documented contracts but no tests; a regression on any of them would silently change semantics in ways unit tests won't catch.

Sub-workflow downgrade (this site and the mirror at line 1294). The module docstring at tests/test_engine/test_error_routing.py:15-16 claims "Phase 1 does NOT propagate envelopes across sub-workflow boundaries" — but no test exercises either branch. Add a workflow with a type: workflow step whose child halts on an unhandled envelope; assert the parent sees ExecutionError, not UnhandledWorkflowError, and that the parent's own on_error routes do NOT fire.

Parallel / for_each envelope plumbing. ParallelAgentError.envelope / ForEachError.envelope are populated in production at workflow.py:3853, 3909, 4310, 4343 from the _envelope attribute attached at :3756 and :4213. The only tests of these dataclasses (tests/test_engine/test_parallel.py:504-536) pre-date the PR and construct them in isolation. Add a test that runs a real parallel/for_each group with a typed-envelope child and asserts the envelope is preserved on the group's error.

step.error input ref without producer's raises. The validator (config/validator.py:1351 walking output/outputs refs) doesn't cross-check a leaf-node producer.error reference against producer.raises. A consumer that declares inputs: ["fetch.error"] against a fetch node with no raises: passes validation and only KeyErrors at runtime via _add_agent_input in engine/context.py.

resume exit-code-3 path. cli/app.py:930-933 mirrors the run exit-code handling for UnhandledWorkflowError, but only run is tested (test_cli/test_run.py:988-1026). Diverging exit codes between run and resume would break CI scripts that key on exit 3 for typed halts.

jrob5756 · 2026-06-01T18:27:26Z

+            self._errors_jsonl_path = errors_path
+            # Attach the path to the exception itself so the CLI handler
+            # can render it without needing a reference to the engine.
+            e.errors_jsonl_path = errors_path  # type: ignore[attr-defined]


setattr side-channels on exceptions defeat the type system.

Two patterns here that are mechanically the same problem:

e.errors_jsonl_path = errors_path # type: ignore[attr-defined] on UnhandledWorkflowError (this site), read back via getattr(error, "errors_jsonl_path", None) in cli/app.py:190.

exc._envelope = normalized # type: ignore[attr-defined] on ExecutionError at workflow.py:3758 and :4214, read back via getattr(result, "_envelope", None) at workflow.py:3853, 3909, 4310, 4343.

Both are undeclared attributes that the type checker can't see. A rename or typo in any of the 4 read sites silently degrades to "no envelope" / "no log path" with no test failure or type error.

Mechanical fix: add typed constructor params:

UnhandledWorkflowError(..., errors_jsonl_path: Path | None = None)

ExecutionError(..., envelope: ErrorEnvelope | None = None) (importable under TYPE_CHECKING to avoid the cycle, since errors.py is a leaf module)

This removes both # type: ignore[attr-defined] writes and all six defensive getattr(..., None) reads, and makes the carriers checkable end-to-end.

Related (same theme but lower priority): the executor output fields ScriptOutput.error and AgentOutput.error are typed dict[str, Any] instead of ErrorEnvelope | None, which is what forces the cast("dict[str, Any]", envelope) calls in executor/script.py:242,244,251 and executor/agent.py:279,301. Tightening those two fields under TYPE_CHECKING would eliminate every cast introduced by this PR.

jrob5756 · 2026-06-01T18:27:26Z

+                f.write(json.dumps(record, default=str))
+                f.write("\n")
+        except OSError as e:
+            logger.warning("Failed to write errors.jsonl at %s: %s", path, e)


logger.warning here is effectively invisible — route through the event system.

Per project convention, conductor never calls logging.basicConfig/addHandler; every module just does logging.getLogger(__name__). A logger.warning with no configured handler falls through to Python's lastResort handler, which writes to raw stderr at WARNING level and bypasses the Rich console, the event stream, and the --verbose flag entirely. In --web-bg it only lands in the captured .bg.stderr.log, not the dashboard or the .events.jsonl.

This is the worst possible site for that pattern: _write_errors_jsonl is the forensic artifact that pairs with .events.jsonl for post-mortem of an unhandled halt. When the write fails, the operator gets errors_jsonl_path = None rendered as Halt log: None in the CLI panel and no indication that writing actually failed (vs. simply not being attempted).

Fix: emit a WorkflowEvent (e.g. errors_jsonl_write_failed), or attach the failure reason to the existing workflow_failed event as a sibling field — either way it then renders through the same channel as the rest of the engine's diagnostics. The same concern applies to the other logger.warning sites in this file (:2026, :3334, :3376), but this one is on the error-routing critical path.

Daniel Green and others added 14 commits May 21, 2026 14:13

This was referenced May 28, 2026

Phase 2: on_error: retrofit — restore the 19 gates removed in #535 PolyphonyRequiem/polyphony#536

Open

feat(#528): remove 19 trivial error gates across 6 workflows PolyphonyRequiem/polyphony#535

Draft

PolyphonyRequiem mentioned this pull request May 28, 2026

feat: notify_then_route — error observability for all on_error: abort paths PolyphonyRequiem/polyphony#543

Open

PolyphonyRequiem mentioned this pull request May 31, 2026

[RFC] retry: route action — Phase 2 of error-routing (companion to #229) #236

Draft

jrob5756 reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Phase 1 of typed on_error routing (#227)#229

feat: Phase 1 of typed on_error routing (#227)#229
PolyphonyRequiem wants to merge 14 commits into
microsoft:mainfrom
PolyphonyRequiem:feature/error-routing

PolyphonyRequiem commented May 21, 2026

Uh oh!

codecov-commenter commented May 21, 2026

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

jrob5756 left a comment

Uh oh!

jrob5756 Jun 1, 2026

Uh oh!

jrob5756 Jun 1, 2026

Uh oh!

jrob5756 Jun 1, 2026

Uh oh!

jrob5756 Jun 1, 2026

Uh oh!

jrob5756 Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		_LEAF_TYPES_THAT_RAISE: frozenset[str \| None] = frozenset({"agent", "script", None})


		def _validate_on_error_routes(agent: Any) -> list[str]:

Uh oh!

Conversation

PolyphonyRequiem commented May 21, 2026

What ships in this PR (Phase 1 scope from the RFC)

What''s reserved for Phase 2/3 (out of scope here)

Reserved kinds emitted in Phase 1

How to read this PR

Examples

Test posture

Open Phase 1 micro-decisions (resolved in this PR)

Uh oh!

codecov-commenter commented May 21, 2026

Codecov Report

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

PolyphonyRequiem commented May 28, 2026

Uh oh!

jrob5756 left a comment

Choose a reason for hiding this comment

1. The script transport is good — let's just document the asymmetry.

2. Trim the bundled multi-language helpers.

Uh oh!

jrob5756 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

jrob5756 Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants