Skip to content

delete unimplemented broadcast transport path [need discussion]#24

Open
TianyeGGBond wants to merge 1 commit into
rlops:zhenyu/miles-mvp-e2efrom
TianyeGGBond:tianye/m11-single-transport-path
Open

delete unimplemented broadcast transport path [need discussion]#24
TianyeGGBond wants to merge 1 commit into
rlops:zhenyu/miles-mvp-e2efrom
TianyeGGBond:tianye/m11-single-transport-path

Conversation

@TianyeGGBond

Copy link
Copy Markdown
Collaborator

Context

MilesModelUpdateService.sync_selected_workers was named/documented as a
two-path weight transport (cpu_serialize + NCCL broadcast) and exposed a
public broadcast_local_ranks kwarg. But any non-empty value immediately
raised NotImplementedError: the cache-owner sender-side NCCL path
(init_process_group + per-bucket dist.broadcast) was never wired — only
the receiver-side fan-out existed.

So the only path that ever runs is cpu_serialize. The broadcast kwarg,
subset validation, NCCL rank assignment, and world_size math are dead
code that makes the surface look like it offers something it doesn't.

Change

pipeline/miles_model_update_service.py:

  • Remove the broadcast_local_ranks kwarg from sync_selected_workers,
    along with the subset validation and the NotImplementedError branch.
    Every target now goes over cpu_serialize.
  • Drop the broadcast_set parameter threaded through _run_atomic_unit
    and _build_plan, and the broadcast field from the done-log line.
  • In _build_plan, replace the broadcast NCCL rank loop / world_size
    calculation with single-path constants.
  • Tighten the docstring/comments to describe the single cpu_serialize path.

Wire compatibility

The wire plan shape is intentionally unchanged. broadcast_local_ranks is
still present (empty), comm_ranks are all 0, and world_size is 1 — the
exact values the cpu_serialize-only path already produced. The MILES
run_sync_session plan-key validator sees no difference.

Net: +20 / -60 lines, no behavior change on the live path.

sync_selected_workers advertised a two-path transport via a public
broadcast_local_ranks kwarg, but any non-empty value immediately raised
NotImplementedError because the cache-owner sender-side NCCL path was never
wired. In practice only the cpu_serialize path runs.

Remove the broadcast kwarg, validation, and NotImplementedError so the
service exposes only the path it actually implements. The wire plan is
unchanged: broadcast_local_ranks is still emitted (empty), comm_ranks are
all 0, and world_size is 1 — exactly the values the cpu_serialize-only path
already produced, so the MILES run_sync_session contract is untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@TianyeGGBond TianyeGGBond changed the title refactor(miles): drop unimplemented broadcast transport path delete unimplemented broadcast transport path [need decision] Jun 21, 2026
@TianyeGGBond TianyeGGBond changed the title delete unimplemented broadcast transport path [need decision] delete unimplemented broadcast transport path [need discussion] Jun 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant