Skip to content

fix(miles): make shrink_engines post-sleep VRAM assert unconditional#21

Open
zhenyulincs wants to merge 2 commits into
zhenyu/m11-mvp-testfrom
zhenyu/m11-post-sleep-vram-floor
Open

fix(miles): make shrink_engines post-sleep VRAM assert unconditional#21
zhenyulincs wants to merge 2 commits into
zhenyu/m11-mvp-testfrom
zhenyu/m11-post-sleep-vram-floor

Conversation

@zhenyulincs

@zhenyulincs zhenyulincs commented Jun 28, 2026

Copy link
Copy Markdown

Problem

RolloutManager.shrink_engines exposed an opt-in parameter for its Step-5 post-sleep VRAM leak check:

def shrink_engines(self, engine_indices, *, post_sleep_vram_threshold_gb: float | None = None):
    ...
    if post_sleep_vram_threshold_gb is not None:   # default None → skipped
        ... assert_post_sleep_vram_below_threshold ...

This made the handoff correctness check effectively dead: there is no static call site for shrink_engines in the repo (it is invoked dynamically by the external RLix coordinator via Ray RPC), and the parameter is keyword-only with default None, so no caller ever threaded a value into it — the Step-5 assert never executed.

Correction (vs. an earlier version of this description): the --miles-post-sleep-vram-threshold-gb arg does exist in arguments.py (default 1.0). An earlier draft claimed it "was never defined" — that was wrong (a grep with underscores missed the dashed argparse flag). The accurate statement is the narrower one above: the value was simply never passed into shrink_engines.

Why it should not be a parameter

A colocated GPU is time-shared — only one of train/infer holds VRAM at any moment. shrink_engines calls release_memory_occupation(tags=None) (full release of weights/KV/cuda_graph), so post-release residency must drop to near-zero or the next phase OOMs. That makes this assert a correctness floor for the GPU handoff, not a tunable knob. It should always run, with a fixed threshold.

Change

  • Remove the post_sleep_vram_threshold_gb parameter from shrink_engines.
  • Add module constant POST_SLEEP_VRAM_THRESHOLD_GB = 1.0 (GiB) with a comment explaining why it's a fixed floor.
  • Run the Step-5 assert unconditionally (Anti-regression invariant refactor(miles): drop unused router engine metadata #8).
  • Fix the stale sglang_engine.py docstring that referenced the (now-removed) arg.
  • Remove the orphaned --miles-post-sleep-vram-threshold-gb argparse flag (arguments.py): nothing reads args.miles_post_sleep_vram_threshold_gb after the value is hardcoded, and its default of 1.0 already matched the constant.

Safety

  • Removing the parameter breaks no caller: zero static call sites for shrink_engines in the repo, and the external coordinator could not have been passing it (the value was never wired in).
  • Removing the arg breaks nothing: verified zero remaining references to miles_post_sleep_vram_threshold_gb (attribute) or --miles-post-sleep-vram-threshold-gb (flag) across the repo after the change.

Test plan

  • ast.parse clean on all changed files
  • No residual post_sleep_vram_threshold_gb parameter references
  • No residual references to the removed arg (flag or attribute)
  • RLix-mode shrink/expand smoke confirms the assert fires on a real release (and does not false-positive at ~0 GiB residency)

shrink_engines exposed `post_sleep_vram_threshold_gb: float | None = None`,
gating the Step-5 post-sleep VRAM assert behind an opt-in parameter. No
caller (in-repo or the external RLix coordinator RPC) ever passed it, and the
arg it documented (`args.miles_post_sleep_vram_threshold_gb`) was never
defined — so the leak check was dead code.

A colocated GPU is time-shared (only one of train/infer holds VRAM at a
time), so a full release_memory_occupation must always drop residency to
near-zero. That makes the assert a handoff correctness floor, not a tuning
knob. Remove the parameter, hardcode POST_SLEEP_VRAM_THRESHOLD_GB = 1.0 as a
module constant, and run the assert unconditionally (Anti-regression
invariant #8). Fix the stale sglang_engine docstring that referenced the
non-existent arg.
The prior commit hardcoded POST_SLEEP_VRAM_THRESHOLD_GB = 1.0 and made the
shrink_engines post-sleep VRAM assert unconditional, so nothing reads
args.miles_post_sleep_vram_threshold_gb anymore. Remove the now-dead
argparse flag (its default of 1.0 already matched the hardcoded constant).

Correction to the previous commit message: that flag WAS in fact defined
(arguments.py: --miles-post-sleep-vram-threshold-gb, default 1.0). The
earlier "the arg was never defined" claim was wrong — the check used a
grep with underscores that missed the dashed argparse flag. What is
actually true is narrower: no caller ever threaded the value into
shrink_engines, so the assert never executed regardless.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant