Skip to content

feat: Phase 1 of typed on_error routing (#227)#229

Draft
PolyphonyRequiem wants to merge 14 commits into
microsoft:mainfrom
PolyphonyRequiem:feature/error-routing
Draft

feat: Phase 1 of typed on_error routing (#227)#229
PolyphonyRequiem wants to merge 14 commits into
microsoft:mainfrom
PolyphonyRequiem:feature/error-routing

Conversation

@PolyphonyRequiem

Copy link
Copy Markdown
Member

Phase 1 of the typed on_error routing design from #227 (RFC).

Companion PR to #227 (brainstorm spec). Spec excerpts inline below;
full design and open questions live in the RFC.

What ships in this PR (Phase 1 scope from the RFC)

  1. Script env-var-file contractCONDUCTOR_ERROR_OUT points at a temp
    file the script writes a typed envelope into; engine reads on exit. Works
    uniformly for pwsh, bash, python, node, dotnet (dotnet run --
    style).
  2. Agent JSON discriminator{conductor_error: true, kind, message, details}
    in agent output is treated as a raise rather than a regular response.
  3. RouteDef.on_error: bool | str | list[str] — routes can match on
    the failing node''s typed error kind.
  4. {{ failing_node.error }} template scope — handler nodes get the
    envelope of the node that raised, alongside its (possibly partial) output.
  5. Halt-on-unhandled at root — if no error route matches, the workflow
    halts; engine writes errors.jsonl (TMPDIR pattern, alongside the event
    log) and emits a typed workflow_failed event. CLI maps the exception to
    new exit code 3.
  6. Optional AgentDef.raises: list[str] — declared kinds are linted
    against route on_error declarations at validation time and checked at
    runtime; undeclared kinds are wrapped as internal.undeclared_kind.
  7. Engine-agnostic helpers under src/conductor/helpers/error/ for
    pwsh / bash / python / node / dotnet (ship in the wheel).

What''s reserved for Phase 2/3 (out of scope here)

  • Sub-workflow envelope propagation across the boundary.
  • retry / halt / propagate route actions.
  • Provider-exhaustion synthetic kind.
  • on_error on routes from type: workflow, human_gate, notification,
    and parallel/for_each groups is currently a hard validation error
    (avoids silent "handler that never fires" footguns). These will become
    valid in Phase 2.

Reserved kinds emitted in Phase 1

  • internal.script_error — script exited non-zero AND wrote no envelope
    (opt-in: only synthesized when the node has raises or any on_error
    route, so legacy exit_code-routing workflows are unaffected).
  • internal.schema_violation — agent output failed its output: schema.
  • internal.undeclared_kind — node with raises: raised a kind not in
    its list; original kind preserved under details.original_kind.

Reserved kind prefixes (validator forbids users declaring these):
internal., provider., subworkflow., retry..

How to read this PR

The 14 commits map 1:1 to the implementation steps from the plan and are
designed to be reviewed in order. Each is independently testable and
green:

# Commit Touch points
1 feat(schema) config/schema.py
2 feat(engine) ErrorEnvelope types engine/errors.py, exceptions.py
3 feat(router) success vs. error buckets engine/router.py
4 feat(context) store_error + .error access engine/context.py
5 feat(validator) cross-check config/validator.py
6 feat(agent-exec) envelope path executor/agent.py
7 feat(script-exec) CONDUCTOR_ERROR_OUT executor/script.py
8 feat(engine-wire) leaf-error path engine/workflow.py
9 feat(halt-jsonl) errors.jsonl + event engine/workflow.py
10 feat(cli-exit) exit code 3 cli/app.py
11 feat(helpers) 5 language helpers helpers/error/*
12 feat(examples) examples/error-routing*.yaml
13 test(xeng) cross-engine envelope contract tests/test_integration/
14 phase-1(checks) final ty/lint/test sweep minor casts

Examples

  • examples/error-routing.yaml — script-based; uses the
    CONDUCTOR_ERROR_OUT contract directly with no helper. Workflow input
    simulated_failure toggles ok / drift / rate_limited.
  • examples/error-routing-helpers.yaml — same shape using the shipped
    Python helper.

Both validate, both run on Windows and POSIX, both render
{{ failing_node.error.kind }} / .message / .details in the
handler''s prompt.

Test posture

  • 2943 tests pass; 12 baseline failures (perf flake, event_log
    non-serializable, registry/integration ×10) are pre-existing and
    unchanged.
  • New tests: 8 engine error-routing tests, 6 helper tests, 2 CLI
    exit-code tests, 3 cross-engine integration tests (Python + pwsh run
    on Windows; bash skipped on Windows by design — WSL relay shim
    unreliable in CI envs).
  • Lint clean (ruff check), format clean (ruff format --check).
  • ty check src back to the 12-diagnostic baseline (all pre-existing
    Windows termios/tty noise).
  • make validate-examples equivalent green across all 17 bundled
    examples including the two new ones.

Open Phase 1 micro-decisions (resolved in this PR)

  • Exit code value: 3 (1 = generic, 2 = reserved misuse, 3 = workflow
    halted on unhandled error).
  • errors.jsonl path: same $TMPDIR/conductor/ convention as the event
    log; printed at end-of-run.
  • Frame trail in errors.jsonl: single-element today, structured to
    accept multi-frame in Phase 2 without a shape change.
  • Convenience {{ workflow.last_error }} not added (RFC says require
    {{ failing_node.error }}); can revisit in Phase 2.

cc @jasonrobertfox — companion to the RFC at #227.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Daniel Green and others added 14 commits May 21, 2026 14:13
…ting

Phase 1, step 1 of on_error routing (per design brief in
docs/projects/error-routing/on-error-routing.brainstorm.md).

Adds two opt-in schema fields and a small shared module for the
constants both schema validation and the engine error path will use:

- src/conductor/error_kinds.py
  - KIND_PATTERN: dotted lowercase identifier (at least one dot).
  - RESERVED_KIND_PREFIXES: internal., provider., subworkflow., retry.
    (the runtime owns these namespaces; workflow authors cannot declare
    kinds under them).
  - RESERVED_ON_ERROR_ALLOWLIST: the closed set of runtime-synthesized
    kinds that ARE legal to match in on_error even though they're not
    legal to declare in raises (internal.script_error,
    internal.schema_violation, internal.undeclared_kind).
  - is_reserved_prefix(kind) helper.

- RouteDef.on_error: bool | str | list[str] | None
  - None = success route (existing behavior).
  - True = catch-all error route.
  - str  = single-kind error route.
  - list[str] = multi-kind error route.
  - False is rejected (no semantic meaning).
  - Kind format enforced via KIND_PATTERN.
  - 'before'-mode validator so Pydantic's bool/str coercion doesn't
    swallow the discriminator.

- AgentDef.raises: list[str] | None
  - Optional declaration of kinds the node may raise.
  - Powers a load-time lint (cross-checked against routes' on_error in
    the validator, landing in a follow-up commit) and a runtime
    undeclared-kind check (will land with the engine-wiring commit).
  - Reserved prefixes rejected so authors can't claim runtime
    namespaces; duplicates rejected; format enforced via KIND_PATTERN.

Tests:
- tests/test_error_kinds.py — 24 cases covering pattern + prefix +
  allowlist invariants (allowlist entries must themselves be reserved).
- tests/test_config/test_schema.py::TestRouteDefOnError — 14 cases.
- tests/test_config/test_schema.py::TestAgentDefRaises — 10 cases.

No semantics change for existing workflows: both fields default to None
and the engine doesn't observe them yet (wiring lands in subsequent
commits).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…exceptions

Phase 1, step 2 of on_error routing.

Adds src/conductor/engine/errors.py:

- ErrorEnvelope TypedDict — the internal {kind, message, details}
  shape. Strips the on-the-wire conductor_error: true discriminator so
  callers don't see it in {{ failing_node.error.* }} templates.
- EnvelopeValidationError — distinct from ValidationError so the
  engine can catch and translate malformed envelopes into synthetic
  internal.* kinds rather than halting with a generic config error.
- coerce_envelope(raw) — validates on-the-wire input, normalizes
  details to {} when absent.
- make_script_error(exit_code, stderr_tail, command) — synthesizes
  the internal.script_error envelope.
- make_schema_violation(node_name, source, original_message,
  failed_field?) — synthesizes the internal.schema_violation envelope
  with rich details for the swallowed-by-catch-all diagnostics case.
- wrap_undeclared_kind(original, declared) — wraps an envelope whose
  kind isn't in the node's declared raises list. Preserves the
  original kind/message/details under details.original_* so an author
  handling internal.undeclared_kind can still recover the intent.

Adds two exceptions to src/conductor/exceptions.py:

- UnhandledNodeError — internal signal raised by the router when an
  error envelope reaches no on_error route at the current level. The
  engine catches this at the per-node dispatch site and re-raises as
  UnhandledWorkflowError. Not intended to surface to end users.
- UnhandledWorkflowError — workflow halted on a typed error envelope.
  Carries the envelope and a frame trail (single frame in Phase 1;
  Phase 2 will accumulate frames across sub-workflow boundaries).
  CLI maps this to a distinct exit code so callers can distinguish
  'workflow ran and halted on typed error' from generic failures.

Tests: tests/test_engine/test_errors.py — 18 cases covering envelope
coercion (including discriminator stripping and details normalization),
the three synthetic-kind constructors, and the exception classes
including the empty-frames defensive path.

Nothing yet emits these envelopes or exceptions; the next commits wire
them through the router, executors, and engine dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route evaluation now partitions by RouteDef.on_error:

- on_error is None  -> success bucket, evaluated when error=None

- on_error is set   -> error bucket, evaluated when an envelope is

  passed; the route's on_error matcher (True | str | list[str])

  must match envelope[kind]

Behavior preserved on the success path: first matching when: wins, no

catch-all raises the existing ValueError. New: error-path exhaustion

raises UnhandledNodeError carrying the envelope so the engine can

translate it into UnhandledWorkflowError at the call site.

Error-route eval context exposes the envelope as `error` for both

Jinja2 ({{ error.kind }}) and simpleeval (kind == 'x.y' via flatten).

Adds 12 tests in TestRouterErrorBucket covering bucket isolation,

all three on_error matcher shapes, when: combined with on_error,

output: transforms, ordering within the bucket, the new

UnhandledNodeError path, and the legacy ValueError path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds WorkflowContext.store_error(agent_name, envelope) that co-locates

error envelopes with their producing node's slot. The rendered context

shape for a node is now `{node: {output?, error?}}`.

All three context modes surface errors:

- accumulate: each failing node gets `{agent: {error: envelope}}`

- last_only:  failing last agent surfaces with `{error: ...}`

- explicit:   declarations of the form `agent.error[.field]` copy

  the whole envelope into ctx[agent]['error']. Envelopes are bounded

  and templates commonly need `error.details.*`, so the runtime

  never field-slices them.

Validator updates so existing semantic checks cover the new path:

- INPUT_REF_PATTERN gains an `error_agent`/`error_field` branch

  matching `<agent>.error[.field]`.

- _OUTPUT_ATTRS includes the singular `error` so Jinja AST analysis

  treats `{{ failing.error.kind }}` as a real output-class ref.

- TemplateRefs gains `agent_error_refs: set[str]` and the AST

  walker populates it.

- The per-agent template walk emits the same explicit-mode

  `undeclared input` warning for `.error` refs that `.output`

  and group `.errors` already get.

- Unknown-agent checks cover the `.error` ref path.

- Parallel-group internal-dependency check rejects intra-group

  `.error` refs too.

Checkpoint round-trip via to_dict/from_dict serializes `agent_errors`;

older checkpoints without the key restore as empty (backwards-compat).

Adds 14 tests in TestWorkflowContextStoreError, 5 INPUT_REF_PATTERN

shape tests, and 3 TemplateRefs error-extraction tests. Fixes the

test_empty_context dict-equality fixture to include the new field.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…type

Adds two new helpers in conductor.config.validator:

- _validate_on_error_routes(agent): hard-errors on_error routes on node types that don't raise envelopes in Phase 1 (human_gate, workflow); validates each kind matches KIND_PATTERN; if agent.raises is declared, every concrete kind in on_error must be in raises or RESERVED_ON_ERROR_ALLOWLIST (catch-all 	rue always legal).

- _validate_group_routes_no_on_error(): rejects on_error routes on parallel and for_each groups (group-level envelopes are Phase 2).

Both helpers are wired into validate_workflow_config(). 11 new tests cover plain agent + script (legal), human_gate + workflow + parallel + for_each (rejected), bad kind format, undeclared-kind cross-check, catch-all, reserved allowlist, and no-raises = no constraint.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ions

AgentOutput grows an optional `error: dict | None` field that carries an ErrorEnvelope when the agent failed.

AgentExecutor.execute() now, after the provider call returns:

1. If the response is a dict with `conductor_error: true`, coerces the envelope (or synthesizes `internal.schema_violation` when the envelope itself is malformed) and attaches it to output.error WITHOUT running validate_output (the declared output schema doesn't apply to error envelopes).

2. Otherwise runs validate_output, and on ValidationError synthesizes an `internal.schema_violation` envelope on output.error instead of raising.

Partial outputs (from mid-agent interrupts) bypass both checks.

The error module is imported lazily inside execute() to avoid a circular import via conductor.engine.__init__.

Updated two existing tests to assert the new envelope contract instead of expecting ValidationError. Added three new tests covering well-formed envelopes, malformed envelopes, and the partial-output bypass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nthesize internal.script_error

ScriptOutput grows an optional `error: dict | None` field.

Each script run now allocates a tempfile via tempfile.mkstemp(), closes the fd immediately, and exposes the path in the env as CONDUCTOR_ERROR_OUT — set AFTER the agent.env merge so users cannot accidentally redirect or override it.

After process.communicate() the executor reads the file:

- empty / missing → no envelope

- valid JSON envelope → coerce_envelope, attach to output.error

- valid JSON but malformed envelope → internal.schema_violation

- invalid JSON → internal.schema_violation

If no envelope was written AND exit_code != 0 AND the node has opted into error routing (raises declared OR any on_error route present), the executor synthesizes an `internal.script_error` envelope. Legacy workflows that route on exit_code (no opt-in) keep their existing behavior.

Temp file is always removed in finally — even on timeout/command-not-found.

Added 7 tests covering: no envelope on success, well-formed envelope surfaces, user env cannot override, synthesized internal.script_error on opt-in, legacy non-zero with no opt-in keeps error=None, malformed envelope downgrades to schema_violation, temp file cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ary, group failure carry envelope

Wires the on_error contract through the workflow engine end-to-end:

- _evaluate_routes() now accepts an optional error envelope; empty routes plus an unhandled envelope raises UnhandledNodeError instead of silently routing to \.

- New _normalize_envelope_for_node() applies undeclared-kind wrapping (raises declared + kind not in raises + not in allowlist → internal.undeclared_kind with original kind preserved under details.original_kind).

- New _handle_leaf_error() centralizes the leaf path: normalize, store_error, evaluate error routes, raise UnhandledWorkflowError with a single-leaf frame trail on no match.

- Agent call site (~2583) and script call site (~2359) both branch on output.error BEFORE storage and BEFORE schema validation. Script's success path runs full output-schema validation as before; error path skips it.

- Sub-workflow call sites catch UnhandledWorkflowError from child engines and re-raise as ExecutionError. Phase 1 invariant: envelopes do not propagate across the sub-workflow boundary.

- ParallelAgentError and ForEachError grow an optional envelope field. The parallel/for_each child execution helpers detect output.error, normalize it, raise an ExecutionError tagged with ._envelope, and the existing failure_mode machinery records it. Downstream group consumers can inspect the typed envelope.

- Tests: 6 new tests in tests/test_engine/test_error_routing.py covering agent envelope routing, unhandled-envelope halt, undeclared-kind normalization (with the rescue agent reading the original kind from context), success-path regression, script envelope routing, and legacy exit_code routing regression. Full suite (2887 tests) green; no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…on UnhandledWorkflowError

Phase 1 Step 9. When the engine raises UnhandledWorkflowError (a leaf
node returned an error envelope that no on_error route matched), the
new `except UnhandledWorkflowError` arm in `_execute_loop`:

* writes a single-line `errors.jsonl` record under \\\/conductor/\\
  using the same naming convention as the event log
  (\\conductor-<workflow>-<ts>-<run_id>.errors.jsonl\\), carrying the
  envelope, frame trail, and leaf node name;
* emits a typed \\workflow_failed\\ event with
  \\�rror_type='UnhandledWorkflowError'\\ plus the envelope, frames
  and errors-jsonl path so dashboard/log subscribers can render the
  typed halt distinctly from a generic ConductorError;
* runs the \\on_error\\ lifecycle hook and saves a checkpoint, for
  parity with the other failure paths;
* re-raises so the CLI (Step 10) can map it to its distinct exit code.

The new arm is placed *before* the existing \\�xcept ConductorError\\
so the typed halt is caught first. Generic ConductorError handling is
unchanged.

Adds two integration tests covering the jsonl artefact and the typed
event. Also tightens a docstring that was 101 chars (pre-existing from
Step 8 — caught now by ruff format check).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stderr summary

Phase 1 Step 10. Both \\
un\\ and \\
esume\\ now catch
\\UnhandledWorkflowError\\ specifically before the generic
\\�xcept Exception\\, render a typed panel to stderr via the new
\\print_unhandled_workflow_error\\ helper, and exit with code 3 so
callers (CI, polyphony, shell scripts) can distinguish 'workflow ran
to completion and halted on a typed error' from a generic failure
(code 1).

The summary panel surfaces the leaf node, kind, message, optional
\\details\\, and the path to the \\�rrors.jsonl\\ artefact. To make
the path reachable from the CLI without holding a reference to the
engine, \\_execute_loop\\ now also attaches the path to the exception
instance (\\�xc.errors_jsonl_path\\) before re-raising.

Adds two CLI tests:

* unhandled typed halt exits 3,
* generic \\RuntimeError\\ still exits 1 (regression guard).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n/node/dotnet

Phase 1 Step 11. New \\src/conductor/helpers/error/\\ directory with
one-file convenience modules for raising typed Conductor error
envelopes from script-type nodes. All helpers implement the same
contract: read \\CONDUCTOR_ERROR_OUT\\, write the
\\{conductor_error: true, kind, message, details?}\\ JSON envelope to
that path, and return — leaving exit-code management to the caller.

* \\Conductor.Error.psm1\\ — PowerShell, \\Write-ConductorError\\ cmdlet
* \\conductor-error.sh\\ — Bash/sh, sourced \\conductor_error\\ function
* \\conductor_error.py\\ — Python, \\
aise_kind\\ function
* \\conductor-error.mjs\\ — Node, exported \\
aiseError\\
* \\ConductorError.cs\\ — .NET, static \\ConductorError.Raise\\
* \\README.md\\ — quick reference + usage examples per engine

Helpers ship under \\src/conductor/\\ so hatchling rolls them into
the wheel; verified by inspecting the built artefact. Nothing is
auto-loaded, on PATH, or on PYTHONPATH — script authors must
explicitly Import-Module / source / import to use them, and authors
who don't want them write the JSON themselves (it's three lines per
engine).

Adds \\	ests/test_helpers/test_error_helpers.py\\ with 6 cases
covering the Python helper's envelope shape, env-var contract, no-
sys.exit guarantee, and round-trip through \\coerce_envelope\\. The
non-Python helpers are exercised by the cross-engine integration test
landing in Step 13.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 1 Step 12. Two new example workflows under \\�xamples/\\ plus
an Error Routing section in \\�xamples/README.md\\.

\\�rror-routing.yaml\\:

* A \\	ype: script\\ probe that writes \\{conductor_error: true,
  kind, message, details}\\ to \\\\\ and exits 0
  via the language-neutral contract (no helper required).
* The probe declares \\
aises: [external.git.drift,
  external.api.rate_limited]\\ so undeclared kinds get normalised to
  \\internal.undeclared_kind\\.
* Routes select by \\on_error: <kind>\\ to demonstrate that an
  envelope picks the typed arm over a generic exit_code fallback.
* Handlers read \\{{ probe.error.kind }}\\,
  \\{{ probe.error.message }}\\, and \\{{ probe.error.details.* }}\\.
* A \\simulated_failure\\ workflow input toggles between
  \\ok\\ / \\drift\\ / \\
ate_limited\\ so the same YAML exercises
  all three arms.

\\�rror-routing-helpers.yaml\\:

* Same flow, but raises via the shipped Python helper
  (\\conductor.helpers.error.conductor_error.raise_kind\\) instead of
  hand-rolled JSON, so authors see the ergonomic version.

Both examples validate with \\conductor validate\\ and the full
\\make validate-examples\\ sweep (17/17 pass). Caught one writing
issue along the way: the input schema field is \\input:\\ (singular),
not \\inputs:\\ — fixed before committing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 1 Step 13 (acceptance #1). New
\\	ests/test_integration/test_error_routing_cross_engine.py\\
exercises the \\CONDUCTOR_ERROR_OUT\\ contract end-to-end through the
real \\WorkflowEngine\\ (no executor mocking) with three writer
scripts in different languages: Python, PowerShell (\\pwsh\\), and
bash. All three share the same workflow YAML shape and the same
expected outcome: the probe writes a typed envelope, the engine routes
by \\on_error: external.git.drift\\, and a rescue agent reads the
kind back from the routed-to scope.

Each test is skipped when the corresponding interpreter is missing
from PATH (and bash is also skipped on Windows, where \\shutil.which\\
typically returns the WSL relay shim that fails with an opaque
\\�xecvpe(/bin/bash) failed: No such file or directory\\ — outside
the scope of this contract test; the brief calls for bash-on-Linux
specifically).

Locally on Windows: 2 pass (python + pwsh), 1 skipped (bash). On a
Linux CI runner with pwsh installed, all three execute. The contract
is the same string of bytes in every engine, so identical assertions
hold across them — which is the whole point.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ty's TypedDict to dict() conversion picks the wrong overload

for our ErrorEnvelope, surfacing 8 spurious diagnostics.

Use cast() at the leaf sites where TypedDict envelopes cross

into APIs typed as dict[str, Any] (script/agent executor returns;

workflow.py store_error and router.evaluate call sites).

Brings ty count back to the 12-diagnostic baseline (all pre-existing

Windows termios/tty noise). Lint, format, examples-validation, and

full test suite (2943 pass, 12 baseline failures) all green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.85832% with 64 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@efa520f). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/conductor/engine/workflow.py 75.47% 26 Missing ⚠️
src/conductor/config/validator.py 83.82% 11 Missing ⚠️
src/conductor/engine/context.py 80.00% 9 Missing ⚠️
src/conductor/executor/script.py 87.87% 8 Missing ⚠️
src/conductor/cli/app.py 81.81% 6 Missing ⚠️
src/conductor/config/schema.py 93.87% 3 Missing ⚠️
src/conductor/engine/router.py 96.42% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #229   +/-   ##
=======================================
  Coverage        ?   88.20%           
=======================================
  Files           ?       65           
  Lines           ?    10115           
  Branches        ?        0           
=======================================
  Hits            ?     8922           
  Misses          ?     1193           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@PolyphonyRequiem

Copy link
Copy Markdown
Member Author

Attempted to rebase feature/error-routing onto v0.1.18 for local polyphony dogfood. Two mechanical conflicts at commit 1/14 (schema.py: import block + adjacent validator) resolved cleanly. Hit a semantic conflict at commit 3/14 (context.py / _add_agent_input) and aborted — not pushed.

Conflict summary: v0.1.18's set-step PR initializes output context as {"output": {} if is_dict_output else None} — the None seed is required by TestWorkflowContextNonDictOutputs::test_explicit_mode_scalar_field_optional_skips, which asserts rendered["compute"]["output"] is None. This branch's commit 17c7cc7 changes the same method to agent_outputs.get(agent_name) + simplified {"output": {}} init, which conflates 'agent never ran' with 'agent produced None output'. The right fix is likely a _MISSING sentinel to distinguish the two cases — but that requires a judgment call from this PR's author.

Built the dogfood from efa520f (pre-v0.1.18 base) where the conflict doesn't exist. All 61 error-routing tests pass in the dogfood.

@PolyphonyRequiem

Copy link
Copy Markdown
Member Author

Tried rebasing this onto v0.1.18 base for polyphony dogfood integration, and hit a design conflict in src/conductor/runtime/context.py.

The clash: PR #229 commit 17c7cc7 uses agent_outputs.get() with a {"output": {}} default to treat missing agent output uniformly with None output. But v0.1.18 (set-step PR) introduces is_dict_output flag tracking + None seed init for non-dict outputs—required by TestWorkflowContextNonDictOutputs. These two patterns conflate "agent never ran" with "agent produced None output."

The _MISSING sentinel pattern would resolve it cleanly, but I'm curious how you intended this to reconcile with the v0.1.18 init semantics. Happy to follow whichever direction you'd prefer.

Interim: Polyphony dogfood is staying pinned at v0.1.17 base (efa520f) in the meantime—not a blocker for you, just a heads-up. We'll rebase once the upstream design call lands.

1 similar comment
@PolyphonyRequiem

Copy link
Copy Markdown
Member Author

Tried rebasing this onto v0.1.18 base for polyphony dogfood integration, and hit a design conflict in src/conductor/runtime/context.py.

The clash: PR #229 commit 17c7cc7 uses agent_outputs.get() with a {"output": {}} default to treat missing agent output uniformly with None output. But v0.1.18 (set-step PR) introduces is_dict_output flag tracking + None seed init for non-dict outputs—required by TestWorkflowContextNonDictOutputs. These two patterns conflate "agent never ran" with "agent produced None output."

The _MISSING sentinel pattern would resolve it cleanly, but I'm curious how you intended this to reconcile with the v0.1.18 init semantics. Happy to follow whichever direction you'd prefer.

Interim: Polyphony dogfood is staying pinned at v0.1.17 base (efa520f) in the meantime—not a blocker for you, just a heads-up. We'll rebase once the upstream design call lands.

@jrob5756 jrob5756 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a pass on the overall approach. I like the direction: success/error route buckets fit conductor's existing model cleanly, the opt-in synthesis keeps legacy exit_code routing untouched, and the phasing discipline (deferring subworkflow/group propagation as hard validation errors rather than silent no-ops) is the right call. Two pieces of feedback before this locks in the authoring contract:

1. The script transport is good — let's just document the asymmetry.

To be clear for others reading: CONDUCTOR_ERROR_OUT is a file path, not JSON-in-an-env-var, so escaping/size concerns don't apply — this is the $GITHUB_OUTPUT pattern and I think it's the right choice. The reason it's a file (and not a stdout discriminator like agents use) is that type: script already parses stdout as the output: contract, so errors need an out-of-band channel. That justification is sound, but it means the raise mechanism differs by node type (agents: in-band stdout conductor_error: true; scripts: out-of-band file). Please call that split out explicitly in docs/workflow-syntax.md so authors aren't surprised.

The parse-failure handling is solid — malformed JSON and malformed envelopes both downgrade to internal.schema_violation rather than dropping the signal, and reading only after communicate() returns means there's no write/read race on the non-atomic write. 👍 One thing to confirm in the correctness pass: malformed→schema_violation fires regardless of opt-in, whereas synthesized internal.script_error only fires when the node opted in. I think that's intended, but let's make sure a node that writes garbage with no on_error route halting on exit 3 (vs. legacy exit-code routing) is the behavior we want.

2. Trim the bundled multi-language helpers.

The contract itself is great precisely because it's language-neutral — "write a 3-field JSON file" — and you've kept it fully optional (the raw example uses no helper, plain echo > "$CONDUCTOR_ERROR_OUT" works). That part I'm fully on board with.

What I'd push back on is shipping conductor-error.sh, conductor-error.mjs, Conductor.Error.psm1, and ConductorError.cs inside the Python wheel. A pwsh/.NET/node user can't naturally consume a file buried in site-packages/conductor/helpers/error/ — there's no Import-Module/NuGet/npm story and the path is venv- and version-specific, so in practice these are copy-paste snippets dressed up as libraries. It's also five implementations of a trivial contract to keep in sync, and it undercuts the "it's just a JSON file" message.

Proposal: keep the Python helper (conductor is Python; inline python -c and script nodes make a real importable module genuinely useful) and demote the other four to documented copy-paste snippets in docs/ rather than shipping source in the wheel. Same ergonomic win where it's natural, without the maintenance/consumption awkwardness.

Neither of these is a structural objection — the file side-channel and the fail-safe parse handling are the right foundation. Mostly packaging + docs. Will follow up with correctness notes on the router/engine wiring.

try:
with open(path, encoding="utf-8") as f:
raw = f.read()
except OSError:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent envelope drop on read failure when exit_code == 0.

Swallowing OSError → None here is fine when the script never wrote a typed error (the file legitimately doesn't exist / is empty), but it conflates that benign case with a real read failure on a path the engine itself created. Consider this path:

  1. Script writes a valid envelope to $CONDUCTOR_ERROR_OUT (raising a typed error) and exits 0, relying on the file as the failure channel.
  2. The read here raises OSError (permissions, transient FS error, racy temp cleanup) → _read_error_envelope returns None.
  3. Back at execute() (~line 191), synthesis is gated on exit_code != 0 and _node_uses_error_routing(agent), so nothing fires.
  4. ScriptOutput.error stays None, the engine sees a success, and the node is routed down the success path — no envelope, no log, no event.

This is the exact silent-failure class typed error routing exists to prevent. An unexpected OSError on an engine-owned path should not be conflated with "script chose not to raise" — at minimum emit a diagnostic, and ideally synthesize a read-failure variant of internal.script_error / internal.schema_violation so the typed-failure signal is preserved regardless of exit code.

Side note: open(..., encoding="utf-8") defers decoding to f.read(), so a UnicodeDecodeError is not caught by except OSError here — that crashes rather than silently dropping, which is arguably the lesser evil but inconsistent.

_LEAF_TYPES_THAT_RAISE: frozenset[str | None] = frozenset({"agent", "script", None})


def _validate_on_error_routes(agent: Any) -> list[str]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validator gap + two related correctness items it would be natural to address together here.

a) Validator gap — node with only on_error routes crashes generically on the happy path. This validator accepts a node with only error routes and no success fallback. On a successful run, Router._evaluate_success (router.py:103-121) skips every route whose on_error is not None, then raises a plain ValueError("No matching route found..."). That bubbles up as a generic exception → CLI exit 1 → no errors.jsonl, no typed halt panel, despite the workflow being clearly authored for the typed-error world. Verified locally. Recommend rejecting at validation here with a teaching message like "Agent 'X' has on_error routes but no success route. Add a route without on_error to handle the happy path."

b) Related — _evaluate_error silently ignores output_for_route (engine/router.py:138-176, called from _handle_leaf_error at engine/workflow.py:4452). The plumbing accepts output_for_route but _evaluate_error only flattens error into the eval context and drops current_output. Any error-route when: clause referencing the failing node's output (e.g. attempt_count > 3) will silently see the previous agent's output. Either merge output_for_route into the error eval context (more useful) or drop the plumbing (more honest), but the current silent inconsistency is a latent footgun.

c) Related — agent schema-violation auto-upgrades existing workflows from exit 1 → exit 3 (executor/agent.py:282-302). Pre-PR: schema violation → ValidationError → exit 1. Post-PR: synthesizes internal.schema_violation envelope → unhandled → exit 3 + errors.jsonl. This is asymmetric with the script executor's _node_uses_error_routing opt-in (script.py:53-67), which preserves legacy behavior for unopted scripts. Either gate the agent upgrade on the same opt-in (so the asymmetry is principled, not accidental), or document the exit-code change in release notes.

# path treats it like any other sub-workflow failure. Phase 2
# will introduce envelope propagation with parent frames.
raise ExecutionError(
f"sub-workflow '{agent.name}' halted on unhandled error envelope "

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing integration tests guard documented Phase-1 invariants.

Four integration boundaries have documented contracts but no tests; a regression on any of them would silently change semantics in ways unit tests won't catch.

  1. Sub-workflow downgrade (this site and the mirror at line 1294). The module docstring at tests/test_engine/test_error_routing.py:15-16 claims "Phase 1 does NOT propagate envelopes across sub-workflow boundaries" — but no test exercises either branch. Add a workflow with a type: workflow step whose child halts on an unhandled envelope; assert the parent sees ExecutionError, not UnhandledWorkflowError, and that the parent's own on_error routes do NOT fire.

  2. Parallel / for_each envelope plumbing. ParallelAgentError.envelope / ForEachError.envelope are populated in production at workflow.py:3853, 3909, 4310, 4343 from the _envelope attribute attached at :3756 and :4213. The only tests of these dataclasses (tests/test_engine/test_parallel.py:504-536) pre-date the PR and construct them in isolation. Add a test that runs a real parallel/for_each group with a typed-envelope child and asserts the envelope is preserved on the group's error.

  3. step.error input ref without producer's raises. The validator (config/validator.py:1351 walking output/outputs refs) doesn't cross-check a leaf-node producer.error reference against producer.raises. A consumer that declares inputs: ["fetch.error"] against a fetch node with no raises: passes validation and only KeyErrors at runtime via _add_agent_input in engine/context.py.

  4. resume exit-code-3 path. cli/app.py:930-933 mirrors the run exit-code handling for UnhandledWorkflowError, but only run is tested (test_cli/test_run.py:988-1026). Diverging exit codes between run and resume would break CI scripts that key on exit 3 for typed halts.

self._errors_jsonl_path = errors_path
# Attach the path to the exception itself so the CLI handler
# can render it without needing a reference to the engine.
e.errors_jsonl_path = errors_path # type: ignore[attr-defined]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setattr side-channels on exceptions defeat the type system.

Two patterns here that are mechanically the same problem:

  • e.errors_jsonl_path = errors_path # type: ignore[attr-defined] on UnhandledWorkflowError (this site), read back via getattr(error, "errors_jsonl_path", None) in cli/app.py:190.
  • exc._envelope = normalized # type: ignore[attr-defined] on ExecutionError at workflow.py:3758 and :4214, read back via getattr(result, "_envelope", None) at workflow.py:3853, 3909, 4310, 4343.

Both are undeclared attributes that the type checker can't see. A rename or typo in any of the 4 read sites silently degrades to "no envelope" / "no log path" with no test failure or type error.

Mechanical fix: add typed constructor params:

  • UnhandledWorkflowError(..., errors_jsonl_path: Path | None = None)
  • ExecutionError(..., envelope: ErrorEnvelope | None = None) (importable under TYPE_CHECKING to avoid the cycle, since errors.py is a leaf module)

This removes both # type: ignore[attr-defined] writes and all six defensive getattr(..., None) reads, and makes the carriers checkable end-to-end.

Related (same theme but lower priority): the executor output fields ScriptOutput.error and AgentOutput.error are typed dict[str, Any] instead of ErrorEnvelope | None, which is what forces the cast("dict[str, Any]", envelope) calls in executor/script.py:242,244,251 and executor/agent.py:279,301. Tightening those two fields under TYPE_CHECKING would eliminate every cast introduced by this PR.

f.write(json.dumps(record, default=str))
f.write("\n")
except OSError as e:
logger.warning("Failed to write errors.jsonl at %s: %s", path, e)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.warning here is effectively invisible — route through the event system.

Per project convention, conductor never calls logging.basicConfig/addHandler; every module just does logging.getLogger(__name__). A logger.warning with no configured handler falls through to Python's lastResort handler, which writes to raw stderr at WARNING level and bypasses the Rich console, the event stream, and the --verbose flag entirely. In --web-bg it only lands in the captured .bg.stderr.log, not the dashboard or the .events.jsonl.

This is the worst possible site for that pattern: _write_errors_jsonl is the forensic artifact that pairs with .events.jsonl for post-mortem of an unhandled halt. When the write fails, the operator gets errors_jsonl_path = None rendered as Halt log: None in the CLI panel and no indication that writing actually failed (vs. simply not being attempted).

Fix: emit a WorkflowEvent (e.g. errors_jsonl_write_failed), or attach the failure reason to the existing workflow_failed event as a sibling field — either way it then renders through the same channel as the rest of the engine's diagnostics. The same concern applies to the other logger.warning sites in this file (:2026, :3334, :3376), but this one is on the error-routing critical path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants