PLT-458: Open-loop scheduler (replace closed-loop dequeue) by bdchatham · Pull Request #48 · sei-protocol/sei-load

bdchatham · 2026-06-12T15:18:41Z

Implements PLT-458, the core of the coordinated-omission fix. The scheduler issues tx i at t_i = t₀ + i/λ, independent of in-flight completion, so latency no longer hides the backlog.

What

New sender/scheduler.go (openLoopScheduler): an arrival clock at t₀ + i/λ that sleeps to absolute instants (so it is drift-free) and stamps each tx with IntendedSendTime at the true scheduled instant plus a per-tx SequenceIndex. Workers become pure async senders (the limiter.Allow() busy-spin is gone).
Bounded in-flight + drop-and-count: a semaphore caps true unacked sends; on overflow the overdue tx is dropped and counted, so the arrival clock is never throttled by backpressure. The permit is released at real send completion (a LoadTx.OnComplete hook fired by the worker after sendTransaction), so dropped measures genuine load-shed, not buffer geometry.
One rate authority: λ comes from the ramper's shared rate.Limiter. The new path sits behind a config flag (--arrival-model), with the legacy closed-loop path retained as the regression baseline. arrival_model is recorded at run end.

Review (systems + measurement + idiom, two rounds)

The loop caught and fixed two blocking issues before merge:

B1: open-loop with the default TPS=0 gives λ = rate.Inf and a zero inter-arrival gap, which is a degenerate constant latency anchor. This is now rejected at config validation (open-loop requires a finite positive λ: TPS>0 or a ramp).
B2: the in-flight semaphore originally released at enqueue, so dropped measured buffer geometry, and a synchronous test masked it. Fixed to release at real send completion; the test now uses an async sender plus a direct PermitHeldUntilCompletion guard. Verified: concurrency-correct (sync.Once, happens-before intact, -race -count=20 clean), and the conservation invariant issued == sent + dropped holds.

schedule_lag (PLT-463) remains the primary CO-detection gate. Forward-note: inclusion-rate denominators must use sent, never issued.

Decision brief: designs/sei-load-workload-modeler/PLT-458-open-loop-scheduler.md.

🤖 Generated with Claude Code

Make transaction arrival open-loop to fix coordinated omission: tx i is issued at t₀ + i/λ independent of in-flight completion, so a slow SUT no longer slows the generator and hides backlog in latency.

- sender/scheduler.go: openLoopScheduler owns t₀ and the monotonic sequence index i, derives λ from the shared rate.Limiter as a clock source (sampled per tick to honor a ramping λ; telescopes to t₀ + i/λ at fixed λ), and stamps IntendedSendTime at the true scheduled instant.
- Overflow is bounded-in-flight + drop-and-count: a non-blocking semaphore TryAcquire admits the tx or drops-and-counts it; the arrival clock is never blocked on capacity (REL8/REL9 load shedding).
- One rate authority preserved: the ramper still drives λ via limiter.SetLimit; the worker's busy-spin Allow() gate is replaced by a blocking Wait (closed-loop only) and disabled under open-loop.
- Behind config flag arrival-model (default closed_loop, the regression baseline) + max-in-flight; arrival_model and run_txs_dropped_total are recorded at run end.
- types.LoadTx gains SequenceIndex (scheduler-owned, single-write per the documented concurrency contract) for PLT-463 schedule-lag attribution; dropped txs carry zero InclusionTime and are kept out of inclusion-rate denominators.

Tests: schedule-accuracy (tracks t₀ + i/λ within tolerance), clock not throttled by a slow sender (overrun dropped not blocked), ramped-λ gap shrink, and stamp-before-handoff under -race. go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… bound

B1: reject open_loop without a finite positive arrival rate. With TPS=0 and no ramp, λ=rate.Inf, the inter-arrival gap collapses to 0, IntendedSendTime never advances past t₀, and the scheduler spins and drops everything. Add config.Settings.Validate (TPS>0 or --ramp-up required for open_loop) and call it after ResolveSettings; fail fast with a clear error. minScheduleRate stays as divide-by-zero/+Inf defense-in-depth only.

B2: tie the in-flight permit to real send completion, not enqueue. Worker.Send (via ShardedSender) returns at enqueue, so the prior defer release() bounded enqueue backlog, not unacked sends, and dropped reflected buffer geometry. Thread a LoadTx.OnComplete hook the worker invokes after sendTransaction; the scheduler stamps it to release the permit so maxInFlight bounds true in-flight and dropped measures genuine load-shed. Enqueue-failure path completes inline. schedule_lag (PLT-463) remains the primary CO-detection gate.

Also: reject unrecognized --arrival-model at config load; drop the SequenceIndex self-disambiguation claim (gate on run-level arrival_model); state the dropped-tx inclusion-denominator invariant plainly; fix relase typo.

Tests: replace the synchronous fakeSender with an async enqueue-and-complete sender so the slow-sender drop test exercises production semantics; add the permit-held-until-completion guard, a conservation invariant, and config validation rejection tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor · 2026-06-12T15:18:46Z

PR Summary

Medium Risk
Changes core load-generation timing. The default remains closed-loop, but open-loop alters latency measurement semantics and in-flight accounting; mistakes in OnComplete or validation could skew benchmarks or silently drop load.

Overview
Introduces an open-loop transaction arrival path to fix coordinated omission: txs are scheduled at t₀ + i/λ from the shared rate limiter, independent of sender backlog, with closed_loop unchanged as the default baseline.

Configuration and wiring. New --arrival-model and --max-in-flight settings (plus Settings.Validate() rejecting open-loop without finite λ via TPS>0 or --ramp-up). main enables open-loop on the live-send dispatcher path, disables worker-side limiter gating when open-loop is active, and passes arrival model + drop count into run-summary metrics.

Scheduler and pipeline. openLoopScheduler sleeps to absolute schedule instants, stamps IntendedSendTime and SequenceIndex, admits sends via non-blocking Semaphore.TryAcquire, and drops and counts when in-flight is saturated. Permits release on LoadTx.OnComplete after the worker’s real RPC send (not at enqueue). Dispatcher branches between closed-loop generate-send lockstep and open-loop; workers gain optional RateLimited and the OnComplete hook.

Observability and types. run_txs_dropped_total (tagged by arrival_model) and extended LoadTx / package docs for schedule-lag semantics. Broad unit tests plus real-worker integration tests guard schedule accuracy, drop behavior, permit lifecycle, and conservation.

Reviewed by Cursor Bugbot for commit 33d7237. Bugbot is set up for automated code reviews on this repo.

Omit the redundant time.Duration type from the minGap declaration in scheduler_test (inferred from time.Hour). Caught by golangci-lint (CI gate). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-loop-scheduler # Conflicts: # main.go # sender/sharded_sender.go # sender/worker.go

cursor · 2026-06-12T18:35:42Z

+	}
+	d.mu.Lock()
+	d.totalSent++
+	d.mu.Unlock()


Failed sends break issued accounting

Medium Severity

Open-loop onSent increments totalSent only when the worker reports a nil send error. Transactions that pass admission and enqueue but fail in sendTransaction release the in-flight permit and are logged, yet they are neither counted as sent nor as dropped, so issued == sent + dropped and related summaries can be wrong under RPC failures.

Additional Locations (1)

sender/scheduler.go#L135-L141

Reviewed by Cursor Bugbot for commit 3cdaf95.

Every existing scheduler test drove a fake TxSender that fired tx.OnComplete itself, so the suite stayed green even if the real Worker forgot to invoke it in runTxSender, leaking the open-loop in-flight semaphore (permits never released → the maxInFlight bound becomes meaningless).

Add an httptest JSON-RPC harness (answers eth_sendRawTransaction, the only RPC the ethclient send path issues) behind the real Worker + open-loop scheduler:

- Conservation on the real path: issued == completed + dropped, with completed driven by the real worker's OnComplete; handled sends matched against the real RPC server's count.
- Permit released by the worker: maxInFlight=1 with a server that blocks one send holds exactly one in flight and drops the rest; releasing it resumes flow, which is only possible if the worker fires OnComplete.

Both tests fail when the OnComplete invoke is removed (verified).

Also clarify the scheduler doc: enqueue is async but the RPC send is synchronous, so the permit is held for the full round-trip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The conservation test sampled `succeeded == handled` against an in-flight window: the httptest server bumps `handled` on receiving eth_sendRawTransaction, but the worker bumps `succeeded` only after SendTransaction returns and OnComplete fires. CI's slower scheduling caught that window (handled=200, succeeded=199).

The dominant cause was teardown ordering, not pure sampling: the scheduler ran as the scope's main task, so the instant it exhausted the generator and returned, service.Run canceled the worker's context, aborting the last send whose 200 OK the server had already counted. That send completed with context-canceled, so completed++ but not succeeded++.

Fix: run the scheduler and worker as background tasks behind a main gate that blocks until the test tears down, so the scope stays alive until quiescence. Anchor the assertion on the fixed total and require exhaustion, conservation, and equality together in one predicate, evaluated only once they all hold (a stable fixpoint, since the counters are monotonic and no new work is issued after exhaustion). Correctness depends on convergence, not the deadline.

Verified: go test -race -count=50 and -count=20 -cpu=1,2,4 green; GOMAXPROCS=1 and =2 green. Falsification holds: commenting out the OnComplete invoke in runTxSender fails both tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 0167803.

cursor · 2026-06-12T19:17:08Z

 			Debug:         cfg.Settings.Debug,
 			Collector:     collector,
 			Limiter:       limiter,
+			RateLimited:   rateLimited,


Prewarm skips open-loop rate limit

Medium Severity

With arrival_model set to open loop, workers are built with RateLimited false for the whole run. Prewarm still uses those workers before the open-loop scheduler starts, so prewarm txs no longer honor the shared rate.Limiter and can flood endpoints at channel speed instead of configured TPS.

Additional Locations (1)

main.go#L324-L329

Reviewed by Cursor Bugbot for commit 0167803.

Move the dense coordinated-omission / arrival-model narrative out of scheduler.go into a new sender/doc.go package doc, and lean the inline comments to terse pointers. No behavior change. The load-bearing inline notes stay (leaned) at the code they guard: the worker's OnComplete permit-release and the single-writer stamp-before-hand-off contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bdchatham and others added 2 commits June 12, 2026 07:58

cursor Bot reviewed Jun 12, 2026


Comment thread sender/scheduler.go

PLT-458: fix staticcheck lint (ST1023)

9efe50c


bdchatham mentioned this pull request Jun 12, 2026

ci: local↔CI test parity — make verify + pinned golangci-lint + .golangci.yml #49

Open

Merge remote-tracking branch 'origin/main' into brandon2/plt-458-open…

3cdaf95


cursor Bot reviewed Jun 12, 2026


bdchatham and others added 2 commits June 12, 2026 11:47

cursor Bot reviewed Jun 12, 2026

bdchatham wants to merge 7 commits into main from brandon2/plt-458-open-loop-scheduler
