refactor(agent): runAgent snapshot harness + 3 helper extractions by kelsonpw · Pull Request #845 · amplitude/wizard

kelsonpw · 2026-05-17T19:57:42Z

Summary

Phase 1 — snapshot harness (new file). Adds src/lib/__tests__/run-agent-harness.test.ts with 8 scenario-driven inline-snapshot tests pinning the observable shape of an end-to-end runAgent call (happy path, multi-turn, [STATUS] markers, compaction recovery, auth retry storm, mid-stream error, rate-limit, post-success SDK cleanup). The harness captures the structured result, spinner sequence, UI methods touched, SDK iterator construction count, and analytics event count — making any extraction's perturbation visible as a snapshot diff.
Phase 2 — three safe extractions. With the harness as a gate, three inline closures inside runAgent are hoisted to module scope so they can be unit-tested in isolation:
- createStreamPillEmitter(spinner) — throttled stream-delta → status pill emitter (was 5 closures + 4 local vars inline)
- drainPriorResponse(prior) — best-effort SDK Query iterator cleanup (issue Stream closed race: outer-loop retry torn down while prior query's hook bridge is still in flight #297)
- logSuppressedHookBridgeNoise(count, attempt) — single-line stderr-suppression summary
Output stability. All 323 tests in agent-interface.test.ts + the new run-agent-harness.test.ts remain byte-identical green. No SDK call shape changed; no outer-function variable removed; no observable side effect reordered.

Numbers

runAgent body: 2580 → 2507 lines (-73)
agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted helpers — the inline closures lacked equivalent docs).
Tests: 315 → 323 (+8) in src/lib/__tests__/.

Discipline

Followed the brief literally:

Phase 1 first. The harness landed as its own commit (a175e701) and was verified green against pristine runAgent.
One extraction at a time. After each, the harness ran. No snapshot ever diffed; if one had, that extraction would have been reverted.
Stopped at 3. The brief says "Stop at 2-3 successful extractions. Don't drain the function." Marginal value drops past 3; many remaining inline closures bind to multiple pieces of outer mutable state (startTime, collectedText, lastResultMessage, attemptCount, upstreamGatewayFailures, authErrorSubkind, lastActivityAt, recentStatuses, …) and would require a state-holder pattern that's out of scope here.

Patterns identified but deferred

completeWithSuccess — closes over 7+ outer vars (startTime, collectedText, lastResultMessage, lastParsedEventPlan, middleware, spinner, …). Would need a state-holder object.
recordTerminal / terminalState — mutated from 5 sites, read from 7. Same problem.
heartbeatInterval callback — references 4 outer vars (recentStatuses, startTime, attemptCount, lastActivityAt). Tractable via getter functions but observable side effects make it risky without expanding the harness.
onActivity — small but observable; uses getUI().resetStallStatus.

These are good candidates for a follow-up PR once the harness has caught real regressions on this baseline and the surrounding code has settled.

Test plan

pnpm exec vitest run src/lib/__tests__/run-agent-harness.test.ts — 8/8 pass
pnpm exec vitest run src/lib/__tests__/agent-interface.test.ts — 315/315 pass
pnpm test — 4481/4481 across 299 files pass
pnpm exec tsc --noEmit — clean
pnpm lint — clean
src/utils/wizard-abort.ts untouched
src/lib/agent-runner.ts untouched
Other helpers in src/lib/agent-interface.ts outside runAgent untouched

🤖 Generated with Claude Code

Note

Medium Risk
Refactors core runAgent control-flow by hoisting streaming and retry/cleanup helpers to module scope; behavior should be unchanged but regressions could affect agent execution, UI status updates, or retry cleanup.

Overview
Adds a new run-agent-harness.test.ts snapshot-style integration harness that scripts SDK message streams and asserts the observable shape of runAgent runs (returned result normalization, spinner/status side effects, UI methods touched, SDK query retry count, and analytics event count) across 8 canonical scenarios.

Refactors runAgent by extracting three previously inline closures into exported, testable helpers: createStreamPillEmitter (throttled stream-delta→status pill with JSON-leak guard), drainPriorResponse (best-effort .return() cleanup of prior SDK iterators to avoid hook-bridge race noise), and logSuppressedHookBridgeNoise (single-line summary of suppressed stderr noise).

^{Reviewed by Cursor Bugbot for commit 2e26678. Bugbot is set up for automated code reviews on this repo. Configure here.}

Adds src/lib/__tests__/run-agent-harness.test.ts — 8 scenario-driven inline-snapshot tests that pin the observable shape of an end-to-end runAgent call: 1. happy_path_single_turn — init + success result 2. happy_path_multi_turn — assistant text + tool use + result 3. legacy_status_marker — [STATUS] markers flow to spinner 4. compaction_recovery — compact_boundary + post-compact success 5. auth_retry_storm_aborts_early — AUTH_RETRY_LIMIT 401s → early abort 6. mid_stream_error_no_success — SDK throws before any success result 7. error_result_rate_limit — is_error result with 429 → RATE_LIMIT 8. sdk_cleanup_after_success — success result then SDK throws → success Captures the structured result, spinner sequence, UI methods invoked, SDK iterator construction count, and analytics event count per scenario. Intent is not to add behavioural coverage (that lives in agent-interface.test.ts — 315 tests) but to make future refactors safe: any extraction that perturbs observable output will diff the inline snapshot and fail loudly. The harness pattern mirrors the runAgentWizardBody extraction series that landed previously (snapshot harness + extractions). This commit ships the harness alone so the extractions in the follow-up commit can be verified against a stable baseline.

Extracts three self-contained closures from runAgent (src/lib/agent-interface.ts) to module scope so they can be unit-tested in isolation: 1. createStreamPillEmitter(spinner) — wraps the throttled stream-delta → status pill emitter previously defined inline (streamPillBuffer, streamPillTimer, flushStreamPill, enqueueStreamDelta, resetStreamPill). The closures touched no outer-function mutable state besides 'spinner', which is now a parameter. 2. drainPriorResponse(prior) — best-effort cleanup of an SDK Query iterator before discarding it. Pure function with one parameter; references only logToFile from module scope. 3. logSuppressedHookBridgeNoise(count, attempt) — emits a single-line summary if the stderr filter swallowed any hook-bridge-race lines during an attempt. Pure function with two parameters; references only logToFile. runAgent body: 2580 → 2507 lines (-73). agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted helpers — the inline closures lacked equivalent docs). All 323 tests in agent-interface.test.ts + run-agent-harness.test.ts remain byte-identical green, including the inline-snapshot scenarios. No SDK call shape changed. No outer-function variable removed. The snapshot harness committed in the prior commit was the gate: each extraction was followed by a harness run, and any snapshot diff would have triggered a revert.

kelsonpw requested a review from a team as a code owner May 17, 2026 19:57

kelsonpw removed the request for review from a team May 18, 2026 18:03

kelsonpw added 2 commits May 22, 2026 11:04

kelsonpw force-pushed the refactor/run-agent-pass-1 branch from a6104a6 to 2e26678 Compare May 22, 2026 18:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(agent): runAgent snapshot harness + 3 helper extractions#845

refactor(agent): runAgent snapshot harness + 3 helper extractions#845
kelsonpw wants to merge 2 commits into
mainfrom
refactor/run-agent-pass-1

kelsonpw commented May 17, 2026 •

edited by cursor Bot

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

kelsonpw commented May 17, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Numbers

Discipline

Patterns identified but deferred

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kelsonpw commented May 17, 2026 •

edited by cursor Bot

Loading