Skip to content

refactor(agent): runAgent snapshot harness + 3 helper extractions#845

Open
kelsonpw wants to merge 2 commits into
mainfrom
refactor/run-agent-pass-1
Open

refactor(agent): runAgent snapshot harness + 3 helper extractions#845
kelsonpw wants to merge 2 commits into
mainfrom
refactor/run-agent-pass-1

Conversation

@kelsonpw

@kelsonpw kelsonpw commented May 17, 2026

Copy link
Copy Markdown
Member

Summary

  • Phase 1 — snapshot harness (new file). Adds src/lib/__tests__/run-agent-harness.test.ts with 8 scenario-driven inline-snapshot tests pinning the observable shape of an end-to-end runAgent call (happy path, multi-turn, [STATUS] markers, compaction recovery, auth retry storm, mid-stream error, rate-limit, post-success SDK cleanup). The harness captures the structured result, spinner sequence, UI methods touched, SDK iterator construction count, and analytics event count — making any extraction's perturbation visible as a snapshot diff.
  • Phase 2 — three safe extractions. With the harness as a gate, three inline closures inside runAgent are hoisted to module scope so they can be unit-tested in isolation:
  • Output stability. All 323 tests in agent-interface.test.ts + the new run-agent-harness.test.ts remain byte-identical green. No SDK call shape changed; no outer-function variable removed; no observable side effect reordered.

Numbers

  • runAgent body: 2580 → 2507 lines (-73)
  • agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted helpers — the inline closures lacked equivalent docs).
  • Tests: 315 → 323 (+8) in src/lib/__tests__/.

Discipline

Followed the brief literally:

  1. Phase 1 first. The harness landed as its own commit (a175e701) and was verified green against pristine runAgent.
  2. One extraction at a time. After each, the harness ran. No snapshot ever diffed; if one had, that extraction would have been reverted.
  3. Stopped at 3. The brief says "Stop at 2-3 successful extractions. Don't drain the function." Marginal value drops past 3; many remaining inline closures bind to multiple pieces of outer mutable state (startTime, collectedText, lastResultMessage, attemptCount, upstreamGatewayFailures, authErrorSubkind, lastActivityAt, recentStatuses, …) and would require a state-holder pattern that's out of scope here.

Patterns identified but deferred

  • completeWithSuccess — closes over 7+ outer vars (startTime, collectedText, lastResultMessage, lastParsedEventPlan, middleware, spinner, …). Would need a state-holder object.
  • recordTerminal / terminalState — mutated from 5 sites, read from 7. Same problem.
  • heartbeatInterval callback — references 4 outer vars (recentStatuses, startTime, attemptCount, lastActivityAt). Tractable via getter functions but observable side effects make it risky without expanding the harness.
  • onActivity — small but observable; uses getUI().resetStallStatus.

These are good candidates for a follow-up PR once the harness has caught real regressions on this baseline and the surrounding code has settled.

Test plan

  • pnpm exec vitest run src/lib/__tests__/run-agent-harness.test.ts — 8/8 pass
  • pnpm exec vitest run src/lib/__tests__/agent-interface.test.ts — 315/315 pass
  • pnpm test — 4481/4481 across 299 files pass
  • pnpm exec tsc --noEmit — clean
  • pnpm lint — clean
  • src/utils/wizard-abort.ts untouched
  • src/lib/agent-runner.ts untouched
  • Other helpers in src/lib/agent-interface.ts outside runAgent untouched

🤖 Generated with Claude Code


Note

Medium Risk
Refactors core runAgent control-flow by hoisting streaming and retry/cleanup helpers to module scope; behavior should be unchanged but regressions could affect agent execution, UI status updates, or retry cleanup.

Overview
Adds a new run-agent-harness.test.ts snapshot-style integration harness that scripts SDK message streams and asserts the observable shape of runAgent runs (returned result normalization, spinner/status side effects, UI methods touched, SDK query retry count, and analytics event count) across 8 canonical scenarios.

Refactors runAgent by extracting three previously inline closures into exported, testable helpers: createStreamPillEmitter (throttled stream-delta→status pill with JSON-leak guard), drainPriorResponse (best-effort .return() cleanup of prior SDK iterators to avoid hook-bridge race noise), and logSuppressedHookBridgeNoise (single-line summary of suppressed stderr noise).

Reviewed by Cursor Bugbot for commit 2e26678. Bugbot is set up for automated code reviews on this repo. Configure here.

@kelsonpw kelsonpw requested a review from a team as a code owner May 17, 2026 19:57
@kelsonpw kelsonpw removed the request for review from a team May 18, 2026 18:03
kelsonpw added 2 commits May 22, 2026 11:04
Adds src/lib/__tests__/run-agent-harness.test.ts — 8 scenario-driven
inline-snapshot tests that pin the observable shape of an end-to-end
runAgent call:

  1. happy_path_single_turn          — init + success result
  2. happy_path_multi_turn           — assistant text + tool use + result
  3. legacy_status_marker            — [STATUS] markers flow to spinner
  4. compaction_recovery             — compact_boundary + post-compact success
  5. auth_retry_storm_aborts_early   — AUTH_RETRY_LIMIT 401s → early abort
  6. mid_stream_error_no_success     — SDK throws before any success result
  7. error_result_rate_limit         — is_error result with 429 → RATE_LIMIT
  8. sdk_cleanup_after_success       — success result then SDK throws → success

Captures the structured result, spinner sequence, UI methods invoked,
SDK iterator construction count, and analytics event count per
scenario. Intent is not to add behavioural coverage (that lives in
agent-interface.test.ts — 315 tests) but to make future refactors
safe: any extraction that perturbs observable output will diff the
inline snapshot and fail loudly.

The harness pattern mirrors the runAgentWizardBody extraction series
that landed previously (snapshot harness + extractions). This commit
ships the harness alone so the extractions in the follow-up commit
can be verified against a stable baseline.
Extracts three self-contained closures from runAgent (src/lib/agent-interface.ts)
to module scope so they can be unit-tested in isolation:

  1. createStreamPillEmitter(spinner)
     — wraps the throttled stream-delta → status pill emitter previously
       defined inline (streamPillBuffer, streamPillTimer, flushStreamPill,
       enqueueStreamDelta, resetStreamPill). The closures touched no
       outer-function mutable state besides 'spinner', which is now a
       parameter.

  2. drainPriorResponse(prior)
     — best-effort cleanup of an SDK Query iterator before discarding it.
       Pure function with one parameter; references only logToFile from
       module scope.

  3. logSuppressedHookBridgeNoise(count, attempt)
     — emits a single-line summary if the stderr filter swallowed any
       hook-bridge-race lines during an attempt. Pure function with two
       parameters; references only logToFile.

runAgent body: 2580 → 2507 lines (-73).
agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted
helpers — the inline closures lacked equivalent docs).

All 323 tests in agent-interface.test.ts + run-agent-harness.test.ts
remain byte-identical green, including the inline-snapshot scenarios.
No SDK call shape changed. No outer-function variable removed.

The snapshot harness committed in the prior commit was the gate: each
extraction was followed by a harness run, and any snapshot diff would
have triggered a revert.
@kelsonpw kelsonpw force-pushed the refactor/run-agent-pass-1 branch from a6104a6 to 2e26678 Compare May 22, 2026 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant