refactor(agent): runAgent snapshot harness + 3 helper extractions#845
Open
kelsonpw wants to merge 2 commits into
Adds src/lib/__tests__/run-agent-harness.test.ts: 8 scenario-driven
inline-snapshot tests that pin the observable shape of an end-to-end
runAgent call:
1. happy_path_single_turn: init + success result
2. happy_path_multi_turn: assistant text + tool use + result
3. legacy_status_marker: [STATUS] markers flow to spinner
4. compaction_recovery: compact_boundary + post-compact success
5. auth_retry_storm_aborts_early: AUTH_RETRY_LIMIT 401s → early abort
6. mid_stream_error_no_success: SDK throws before any success result
7. error_result_rate_limit: is_error result with 429 → RATE_LIMIT
8. sdk_cleanup_after_success: success result then SDK throws → success
Captures the structured result, spinner sequence, UI methods invoked,
SDK iterator construction count, and analytics event count per scenario.
Intent is not to add behavioural coverage (that lives in
agent-interface.test.ts, 315 tests) but to make future refactors safe:
any extraction that perturbs observable output will diff the inline
snapshot and fail loudly.
The harness pattern mirrors the runAgentWizardBody extraction series
that landed previously (snapshot harness + extractions). This commit
ships the harness alone so the extractions in the follow-up commit can
be verified against a stable baseline.
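As a rough illustration of the harness idea described above, the sketch below replays a scripted message stream through a toy stand-in for runAgent and records what an observer could see. Everything here is an assumption made for illustration: the message shapes, toyRunAgent, runScenario, and the Observed record are hypothetical and much simpler than the real SDK types and the real harness.

```typescript
// Hypothetical message shapes; the real SDK message types are richer.
type ScriptedMessage =
  | { type: "system"; subtype: "init" }
  | { type: "assistant"; text: string }
  | { type: "result"; isError: boolean; text: string };

// Hypothetical record of everything a scenario pins.
interface Observed {
  result: { ok: boolean; text: string };
  spinner: string[];
  queryCalls: number;
}

// Replays a fixed script as an async stream, standing in for the SDK iterator.
async function* scriptedStream(
  script: ScriptedMessage[],
): AsyncGenerator<ScriptedMessage> {
  for (const message of script) yield message;
}

// Toy stand-in for runAgent: consumes the stream and reports to a spinner.
async function toyRunAgent(
  query: () => AsyncIterable<ScriptedMessage>,
  spinner: { message(text: string): void },
): Promise<{ ok: boolean; text: string }> {
  let last: { ok: boolean; text: string } = { ok: false, text: "no result" };
  for await (const message of query()) {
    if (message.type === "assistant") spinner.message(message.text);
    if (message.type === "result") {
      last = { ok: !message.isError, text: message.text };
    }
  }
  return last;
}

// Runs one scenario and collects the observable shape in a single object.
async function runScenario(script: ScriptedMessage[]): Promise<Observed> {
  const spinner: string[] = [];
  let queryCalls = 0;
  const result = await toyRunAgent(
    () => {
      queryCalls += 1; // counts how many times an iterator was constructed
      return scriptedStream(script);
    },
    { message: (text) => { spinner.push(text); } },
  );
  return { result, spinner, queryCalls };
}
```

In a vitest suite, the object returned by runScenario could be passed to expect(...).toMatchInlineSnapshot(), so that any change in result, spinner sequence, or construction count shows up as a snapshot diff.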
Extracts three self-contained closures from runAgent (src/lib/agent-interface.ts)
to module scope so they can be unit-tested in isolation:
1. createStreamPillEmitter(spinner)
— wraps the throttled stream-delta → status pill emitter previously
defined inline (streamPillBuffer, streamPillTimer, flushStreamPill,
enqueueStreamDelta, resetStreamPill). The closures touched no
outer-function mutable state besides 'spinner', which is now a
parameter.
2. drainPriorResponse(prior)
— best-effort cleanup of an SDK Query iterator before discarding it.
Pure function with one parameter; references only logToFile from
module scope.
3. logSuppressedHookBridgeNoise(count, attempt)
— emits a single-line summary if the stderr filter swallowed any
hook-bridge-race lines during an attempt. Pure function with two
parameters; references only logToFile.
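To make the first extraction more concrete, here is a minimal sketch of a throttled delta-to-spinner emitter with the spinner passed in as a parameter. It is not the code from agent-interface.ts: the SpinnerLike interface, the 250 ms window, the injectable Scheduler, and the function name createStreamPillEmitterSketch are all assumptions chosen so the throttle can be exercised without real timers.

```typescript
// Hypothetical minimal spinner surface; the real spinner has more methods.
interface SpinnerLike {
  message(text: string): void;
}

// Injectable scheduler (an assumption) so tests do not need real timers.
interface Scheduler {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Scheduler = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createStreamPillEmitterSketch(
  spinner: SpinnerLike,
  intervalMs = 250, // assumed throttle window
  timers: Scheduler = realTimers,
) {
  let buffer = "";
  let pending: { handle: unknown } | null = null;

  // Sends the accumulated text to the spinner, at most once per window.
  const flushStreamPill = (): void => {
    pending = null;
    const text = buffer.trim();
    buffer = "";
    if (text.length > 0) spinner.message(text);
  };

  // Buffers a delta and schedules a flush if none is already scheduled.
  const enqueueStreamDelta = (delta: string): void => {
    buffer += delta;
    if (pending === null) {
      pending = { handle: timers.set(flushStreamPill, intervalMs) };
    }
  };

  // Drops buffered text and cancels any scheduled flush.
  const resetStreamPill = (): void => {
    if (pending !== null) timers.clear(pending.handle);
    pending = null;
    buffer = "";
  };

  return { enqueueStreamDelta, resetStreamPill };
}
```

Because the only outer dependency is the spinner argument, a unit test can drive the emitter with a recording spinner and a fake scheduler, which is the kind of isolation the extraction is meant to allow.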
runAgent body: 2580 → 2507 lines (-73).
agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted
helpers — the inline closures lacked equivalent docs).
All 323 tests in agent-interface.test.ts + run-agent-harness.test.ts
remain byte-identical green, including the inline-snapshot scenarios.
No SDK call shape changed. No outer-function variable removed.
The snapshot harness committed in the prior commit was the gate: each
extraction was followed by a harness run, and any snapshot diff would
have triggered a revert.
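For the second helper listed above, drainPriorResponse, a best-effort cleanup of a stream before it is discarded might look roughly like the sketch below. This is an assumption-laden illustration, not the extracted function: ClosableStream is a hypothetical minimal type, and logToFile is replaced by an in-memory stand-in so the example is self-contained.

```typescript
// Hypothetical stand-in for the module-scope logToFile helper.
const logLines: string[] = [];
function logToFile(message: string): void {
  logLines.push(message);
}

// Hypothetical minimal view of an iterator that may expose return().
interface ClosableStream {
  return?(value?: unknown): Promise<unknown>;
}

// Asks the prior stream to close and never lets a failure escape,
// since the caller is about to discard the stream anyway.
async function drainPriorResponseSketch(
  prior: ClosableStream | undefined,
): Promise<void> {
  const close = prior?.return;
  if (typeof close !== "function") return; // nothing to clean up
  try {
    await close.call(prior);
  } catch (error) {
    logToFile(`drain of prior response failed (ignored): ${String(error)}`);
  }
}
```

Swallowing the error is the point of "best-effort" here: a failure while closing the old stream is logged but does not interrupt the retry that follows.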
a6104a6 to 2e26678
Summary
- Adds src/lib/__tests__/run-agent-harness.test.ts with 8 scenario-driven inline-snapshot tests pinning the observable shape of an end-to-end runAgent call (happy path, multi-turn, [STATUS] markers, compaction recovery, auth retry storm, mid-stream error, rate-limit, post-success SDK cleanup). The harness captures the structured result, spinner sequence, UI methods touched, SDK iterator construction count, and analytics event count, making any extraction's perturbation visible as a snapshot diff.
- Three closures from runAgent are hoisted to module scope so they can be unit-tested in isolation:
  - createStreamPillEmitter(spinner): throttled stream-delta → status pill emitter (was 5 closures + 4 local vars inline)
  - drainPriorResponse(prior): best-effort SDK Query iterator cleanup (issue #297, "Stream closed race: outer-loop retry torn down while prior query's hook bridge is still in flight")
  - logSuppressedHookBridgeNoise(count, attempt): single-line stderr-suppression summary
- All tests in agent-interface.test.ts + the new run-agent-harness.test.ts remain byte-identical green. No SDK call shape changed; no outer-function variable removed; no observable side effect reordered.

Numbers
- runAgent body: 2580 → 2507 lines (-73)
- agent-interface.ts: 5011 → 5066 lines (+55, due to JSDoc on extracted helpers; the inline closures lacked equivalent docs)
- [...] src/lib/__tests__/.

Discipline
Followed the brief literally:
- [...] a175e701) and was verified green against pristine runAgent.
- [...] startTime, collectedText, lastResultMessage, attemptCount, upstreamGatewayFailures, authErrorSubkind, lastActivityAt, recentStatuses, …) and would require a state-holder pattern that's out of scope here.

Patterns identified but deferred
- completeWithSuccess: closes over 7+ outer vars (startTime, collectedText, lastResultMessage, lastParsedEventPlan, middleware, spinner, …). Would need a state-holder object.
- recordTerminal / terminalState: mutated from 5 sites, read from 7. Same problem.
- heartbeatInterval callback: references 4 outer vars (recentStatuses, startTime, attemptCount, lastActivityAt). Tractable via getter functions but observable side effects make it risky without expanding the harness.
- onActivity: small but observable; uses getUI().resetStallStatus.

These are good candidates for a follow-up PR once the harness has caught real regressions on this baseline and the surrounding code has settled.
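The state-holder pattern mentioned for the deferred closures can be sketched as follows. This only illustrates the general technique under stated assumptions: RunState holds a hypothetical subset of the outer variables, and completeWithSuccessSketch is an invented stand-in, not the real completeWithSuccess.

```typescript
// Hypothetical subset of the mutable state the deferred closures share.
interface RunState {
  startTime: number;
  collectedText: string[];
  attemptCount: number;
}

function createRunState(now: number): RunState {
  return { startTime: now, collectedText: [], attemptCount: 0 };
}

// Module-scope version of a closure that previously read outer variables
// directly; every dependency now arrives through the state object.
function completeWithSuccessSketch(state: RunState, now: number) {
  return {
    ok: true as const,
    durationMs: now - state.startTime,
    text: state.collectedText.join(""),
    attempts: state.attemptCount,
  };
}

// The outer function would keep mutating the same object, so helpers
// observe updates without closing over individual local variables.
const state = createRunState(1_000);
state.collectedText.push("Hello, ", "world");
state.attemptCount = 2;
const outcome = completeWithSuccessSketch(state, 1_750);
```

The trade-off is that every read and write site in the outer function has to move from a local variable to a field on the shared object, which is a larger change than the three extractions in this PR.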
Test plan
- pnpm exec vitest run src/lib/__tests__/run-agent-harness.test.ts: 8/8 pass
- pnpm exec vitest run src/lib/__tests__/agent-interface.test.ts: 315/315 pass
- pnpm test: 4481/4481 across 299 files pass
- pnpm exec tsc --noEmit: clean
- pnpm lint: clean
- src/utils/wizard-abort.ts untouched
- src/lib/agent-runner.ts untouched
- src/lib/agent-interface.ts outside runAgent untouched

🤖 Generated with Claude Code
Note
Medium Risk
Refactors core runAgent control-flow by hoisting streaming and retry/cleanup helpers to module scope; behavior should be unchanged but regressions could affect agent execution, UI status updates, or retry cleanup.

Overview
Adds a new run-agent-harness.test.ts snapshot-style integration harness that scripts SDK message streams and asserts the observable shape of runAgent runs (returned result normalization, spinner/status side effects, UI methods touched, SDK query retry count, and analytics event count) across 8 canonical scenarios.

Refactors runAgent by extracting three previously inline closures into exported, testable helpers: createStreamPillEmitter (throttled stream-delta → status pill with JSON-leak guard), drainPriorResponse (best-effort .return() cleanup of prior SDK iterators to avoid hook-bridge race noise), and logSuppressedHookBridgeNoise (single-line summary of suppressed stderr noise).

Reviewed by Cursor Bugbot for commit 2e26678. Bugbot is set up for automated code reviews on this repo.