PR #67 follow-ups: run-lifecycle unification, heartbeat recovery, sys.exit deep fix, green CI by FlyM1ss · Pull Request #71 · Open-Finance-Lab/AgenticTrading

FlyM1ss · 2026-07-05T03:28:21Z

Closes out every open follow-up recorded on docs/reviews/PR67-FIX-CHECKLIST.md from the PR #67 review cycle. PR #67 merged while this pass was in flight, so these land as their own PR (upstream main, including the #69/#70 landing refreshes, is merged in — the shipped bundle was rebuilt from the merged source so the brand changes and the dead-component prune ship together).

What's here

Run-lifecycle unification (H4 follow-ups) — v2 runs now write protocol_runs rows through their whole lifecycle, which yields three fixes with one mechanism:

Unified per-agent active-run cap: both surfaces count run_store.count_active_runs() under the shared v1 create lock — an agent can no longer hold 2× MAX_ACTIVE_RUNS_PER_AGENT by splitting across v1+v2.
Reaper coverage for v2: each reaper pass drives a registered v2 sweep (register_reaper_sweep, wired in the composition root so domain/ never imports api/) that drains abandoned runs through elapsed deadlines and swaps terminal backends for DB-backed ArchivedBacktestBackend tombstones — market-data buffers freed while status/result/decisions/idempotent-replay stay serviceable.
Restart visibility: startup orphan-recovery covers v2 rows for free, and the v2 API rehydrates terminal rows (owners see failed instead of a 404; the anti-oracle 404 shape for foreign probes is preserved and pinned).

Multi-worker recovery hardening — protocol_runs gains owner_instance/heartbeat_at (migrated in place, ALTER race-tolerant); the reaper heartbeats every run whose session is live in this process; fail_stale_runs() (RUN_HEARTBEAT_STALE_SECONDS, default 300s) only fails runs nobody heartbeats — a sibling worker's live runs are safe, unlike the blanket startup UPDATE (which stays, for single-process).

sys.exit() deep fix (B0 class) — AlpacaDataLoader/BaselineGenerator/HourlyBacktester.load_data raise MarketDataUnavailableError (plain exception) instead of exiting; the backtest CLIs translate it to exit-code 1 at the process boundary. The (Exception, SystemExit) guards remain as defense-in-depth.

SQLite WAL on the shared DB file (both connection layers, best-effort with fallback).

agent_runs.metadata JSON column — records the effective LLM_MAX_OUTPUT_TOKENS per LLM run (init + migrate paths, parsed consistently by every reader).

Green CI end to end — the 5 pre-existing failures were stale expectations (public /runs listing is a deliberate 2026-05-25 product decision; /runs/latest/metrics filters agent_name == 'Agent'; the safe prompt was rewritten). Tests re-pinned to the real contracts; the active prompt regained a compact no-tools constraint line (tightens, never loosens — enforcement stays response-side in validate_llm_response). Suite: 837 passed / 0 failed; packaging 41.

Landing — dead pages/not-found.tsx + ui/card.tsx deleted; shipped bundle rebuilt from the merged source (tsc + vite green, headless smoke: renders, auth modal, 0 console errors).

Docs/CI — _owner_context session-credential caveat documented; CI DATABASE_PATH backstop; checklist updated with the full follow-up record.

Review

A 50-agent adversarial workflow (5 lens finders + 3-vote refute panels over 15 findings) reviewed the full diff; 12 confirmed findings were all resolved before this PR — including a critical cancel-clobber (a cleanup cancel racing the final decision downgraded a completed run's ledger row to closed, hiding its metrics). 30+ new red-green tests across test_run_lifecycle_unification.py, test_market_data_errors.py, test_agent_runs_metadata.py, test_sqlite_wal.py.

🤖 Generated with Claude Code

https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…ipped bundle pages/not-found.tsx was fully dead (App.tsx renders its own inline 404) and ui/card.tsx survived the LOW sweep only as its import — both deleted; tsc + vite build verified green. The committed frontend bundle was still the PRE-prune build: its CSS (107.76 kB) predates the 65-file component deletion (3618b6a). Swapped in the fresh assets (CSS 31.24 kB, JS byte-identical) and updated index.html's references, dropping the now-redundant ?v= cache-busters (filenames are content-hashed and vercel.json serves /assets/* immutable, / no-cache). Headless smoke: root renders, auth modal present, no console errors. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…TH backstop in CI B0/B1 follow-ups: _owner_context's X-Session-Id fallback makes session_id an ownership credential for import-created agents — document that 'session_id is never a credential' only holds for built-in agents. CI gets an explicit DATABASE_PATH env so a conftest isolation regression can never write into the committed seed DB. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

H4 follow-up. RunStore (protocol steps) and BacktestDatabase (equity/trades) share one file; under the rollback journal a finalize's heavy writes block every reader. WAL lets readers proceed during commits — the actual concurrency shape here. Best-effort with fallback for filesystems without shm support; journal_mode persists in the file so one switch at construction covers all later connections from either layer (RunStore's own _enable_wal lands with the lifecycle-unification commit that reworks that module). Verified on WSL drvfs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

… sweep, heartbeat recovery Closes the three H4 reconciliation follow-ups in one mechanism: v2 runs now write protocol_runs rows through their whole lifecycle (create=loading, load-failure/finalize/cancel update the row), so * the per-agent active-run cap counts run_store.count_active_runs() on BOTH surfaces under the SHARED v1 create lock — an agent can no longer hold 2× MAX_ACTIVE_RUNS_PER_AGENT by splitting across v1+v2. The v2 create path first reconciles its own finished backends (passive is_active() peek only; live runs are never touched under the lock); * the reaper reaches v2: reap_runs() drives sweeps registered via register_reaper_sweep() (composition root registers reap_v2_runs — domain never imports api/*). The sweep drains abandoned v2 runs through elapsed deadlines and swaps terminal backends for a DB-backed ArchivedBacktestBackend tombstone: market-data buffers freed, while status/result/decisions/idempotent-replay stay serviceable — and v2's _require_run rehydrates terminal rows post-restart (owners see 'failed' instead of a 404; anti-oracle 404 shape preserved for foreign probes); * startup orphan recovery now covers v2 rows for free, and multi-worker deployments get heartbeat-based recovery: rows carry owner_instance/heartbeat_at (schema migrated in place), every reaper pass heartbeats runs whose session is live in this process, and fail_stale_runs() only fails runs nobody heartbeats (RUN_HEARTBEAT_STALE_SECONDS, default 300s; blanket RUN_RECOVERY_ON_STARTUP stays for single-process). 16 new tests (tests/test_run_lifecycle_unification.py) pin the cross-surface cap both directions, sweep archival + deadline draining, restart rehydration, oracle shape, heartbeat stamping/refresh, stale recovery, and the legacy schema migration. FakeBackend gained an honest is_active(); test_active_run_count_is_side_effect_free re-pinned to the ledger-based counting (passivity invariant unchanged). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…t() (B0 deep fix) The class fix behind B0 and its recurring guard sites: AlpacaDataLoader (missing creds / missing alpaca-py), BaselineGenerator (same) and HourlyBacktester.load_data (no data) used to sys.exit(1). SystemExit is a BaseException — it evaded every 'except Exception' server boundary, silently killed daemon loader threads, and wedged the ASGI loop. They now raise MarketDataUnavailableError (a plain RuntimeError subclass); the CLI entrypoints (backtest_hourly_agent, backtest_custom_algo) translate it to exit code 1 at the process boundary. The existing (Exception, SystemExit) guards stay as defense-in-depth; their comments now describe them as such. app.py's paper-baseline init, the legacy /backtest/run background thread, and the algo-service subprocess path are covered by their existing 'except Exception' handlers with no local changes. Tests: tests/test_market_data_errors.py (red-green) pins raise-not-exit for all three libraries, the plain-Exception contract, and the CLI boundary. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…_TOKENS LOW-sweep residual: the per-request output-token ceiling is an env knob that changes a run's spend/behavior, but nothing recorded which value was in force. agent_runs gains a JSON metadata column (added in BOTH _init_schema and _migrate_schema, per the schema convention); insert_run accepts an optional metadata dict; every agent_runs reader returns it parsed (one _parse_run_row helper) so listings and get_run agree. The engine's LLM agent run records {llm_max_output_tokens: <effective value>}; rule-based runs record nothing. 6 red-green tests incl. legacy-schema migration. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

The new CI ran RED on 5 failures that predate this PR (they fail on main identically). All five pinned contracts the product deliberately no longer has: * GET /runs is a PUBLIC listing (route docstring; the dashboard home page fetches it session-less for the seed runs) — the strict per-session listing tests now pin the real contract: public metadata listing, session-scoped per-run data access, middleware 400 on malformed session ids for scoped routes (fixture also now patches the backtests router's import-time db binding, which the old tests silently missed). * /runs/latest/metrics filters agent_name == 'Agent' (the internal hourly agent); the old fixture's 'Agent A'/'Agent B' rows could never match. * create_safe_prompt's strategy body was rewritten; the old CANNOT-phrased constraint sentences only exist in a retired commented-out prompt. The active prompt regains a compact no-tools constraint line (parity with CUSTOM_STRATEGY_OUTPUT_CONTRACT — tightens, does not loosen) and the test pins the current wording; the hard boundary remains response-side validate_llm_response enforcement (unchanged, still pinned). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…ise contract test_missing_credentials_exits pinned the retired sys.exit(1) behavior; it now pins MarketDataUnavailableError (companion to the B0 deep-fix commit). Full backend suite: 829 passed, 0 failed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

…fication A 50-agent review (5 lenses, 3-vote refute panels) of the follow-ups diff confirmed 12 findings; the real defects, all fixed red-green: * CRITICAL cancel-clobber: POST /v2/runs/{id}/cancel hardcoded status "closed" into the run's ledger row. A cleanup cancel landing after the final decision (before the sweep archives the backend) downgraded a genuinely completed run to "closed" — in the DB row AND the in-memory session — permanently hiding its metrics. cancel_run now no-ops on terminal runs and reports the run's true state; BacktestBackend.cancel() guards the terminal check under the session lock (closes the peek→cancel race with a finalizing submit). * _archive_run no longer swaps in the tombstone when the terminal row write fails — the backend stays live so the next sweep retries, instead of the row freezing in an active status with nothing left to reconcile it. * v2 rows now leave "loading" (→ "running") when the background market-data load succeeds. * RunStore's heartbeat-column migration tolerates losing the probe→ALTER race to a concurrently-starting worker (duplicate-column errors are the goal state, not a crash). * ArchivedBacktestBackend.from_record recovers step_index/total_steps for completed runs from the decision log (restart-rehydrated runs reported a misleading 0/0). * test_active_run_count_is_side_effect_free regained its count assertion (ledger-based) alongside the passivity invariants. Accepted + documented in code: cross-surface create serialization on the shared lock, and deliberate v2<->v1 cross-surface run visibility. Suites after fixes: backend 837 passed / 0 failed; packaging 41 passed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

Upstream merged PR #67 plus two landing refreshes (#69/#70) while this follow-up pass was in flight. Conflicts were confined to the shipped frontend bundle: upstream's refresh was built before this branch's dead component prune, and this branch's refresh predated upstream's brand changes (Hero/Navbar/index.css, atltransparent logo, Inter 800). Resolved by REBUILDING the bundle from the merged landing source — brand changes + prune together (tsc green, vite build green, index-CoQ1VbAs.css 31.71 kB, headless smoke: renders, auth modal, 0 console errors) — and pointing index.html at the rebuilt hashed assets (content hashes + the vercel.json immutable/no-cache split make the ?v= params redundant). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01SVGFyxaNN4VPyAxn2hgsiu

… reads, bounded registry Seven findings from the PR #71 review, each red-green pinned in tests/test_pr71_review_fixes.py: 1. cancel-during-loading race — load_market_data published waiting_decision WITHOUT _step_lock, defeating cancel()'s lock guard; the background _load also wrote failed/running unconditionally. A cancel landing during the load was silently reverted (run kept running, cap slot freed, client misled). Both load paths now respect an already-terminal (cancelled) status under the lock. 2. archived status() dropped baseline_run_ids/compare_url — the live get_status() returns them for a completed run but the tombstone omitted them, so the same endpoint changed shape once the reaper archived the run. ArchivedBacktestBackend.status() now rebuilds both from the persisted baseline columns (shared build_compare_url helper). 3. rehydrated failed/closed runs reported 0/0 steps — from_record only recovered counts for completed runs (from the decision log). protocol_runs now persists step_index/total_steps at archive time; from_record reads them (decision-log fallback kept for legacy rows). 4. cancel hung during finalize — cancel() blocked on _step_lock held through _finalize's baseline generation (seconds-to-minutes). _finalize now publishes "completed" before the best-effort baseline block, and cancel() short-circuits on a lock-free terminal read. 5. v2 _runs never pruned — the reaper now evicts prior-pass archived tombstones (a later read rehydrates from the terminal row), bounding _runs by ACTIVE runs instead of total historical runs. 6/7. dedupe — extracted a shared enable_wal(db_path) helper (was duplicated verbatim across both DB layers) and a single TERMINAL_STATUSES constant in execution/base.py (was defined three times). Suite: 843 passed / 0 failed; packaging 41. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KoZVnrSPdYNyrxSfrUwaok

…eanup Quality-only cleanups (no behavior change; suite 843/0). From a 4-lens review of the PR diff: - Extract shared build_final_metrics(run) in external_run_service (mirrors build_compare_url); the 9-key completed-run metrics dict was duplicated verbatim in ArchivedBacktestBackend.status(). - BacktestBackend.advance() calls session.drain_expired() instead of get_current_step(), which rebuilt + discarded a full market snapshot under _step_lock on every reaper pass, per live run. - _archive_run derived step_index/total_steps twice with divergent is-None vs or fallbacks (disagreeing on a real 0) — collapse to one helper. - Drop the dead `not isinstance(backend, ArchivedBacktestBackend)` guard in cancel_run (active is read from is_active(), which a tombstone always reports False, so active already implies a live backend). - reap_runs and the _v2_fakes test double use the TERMINAL_STATUSES constant instead of hardcoded status tuples (the constant's own anti-drift purpose). - get_runs_by_mode routes through _parse_run_row like the other five agent_runs readers (no consumer depends on the raw-string metadata shape). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01KoZVnrSPdYNyrxSfrUwaok

FlyM1ss and others added 12 commits July 5, 2026 10:04

FlyM1ss merged commit 16a780e into main Jul 5, 2026
2 checks passed

FlyM1ss deleted the pr67-followups branch July 5, 2026 10:03

This was referenced Jul 5, 2026

Sync: absorb allan-fork prod commits (baseline look-ahead fix, agent editor) + conflict resolution #73

Merged

Deploy: Claude leaderboard entries + PR #71 fixes + your look-ahead fix, reconciled Allan-Feng/AgenticTrading#2

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PR #67 follow-ups: run-lifecycle unification, heartbeat recovery, sys.exit deep fix, green CI#71

PR #67 follow-ups: run-lifecycle unification, heartbeat recovery, sys.exit deep fix, green CI#71
FlyM1ss merged 12 commits into
mainfrom
pr67-followups

FlyM1ss commented Jul 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

FlyM1ss commented Jul 5, 2026

What's here

Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant