Simplify CPU bucket cache locking by TianyeGGBond · Pull Request #14 · rlops/miles

TianyeGGBond · 2026-06-04T19:01:24Z

Context

The F4 CPU bucket cache path currently has two layers of locking around the same state:

CPUBucketCache owns an internal threading.Lock for _buckets and _cache_ready_step.
MegatronTrainRayActor also creates _cache_lock and wraps both build_cpu_bucket_cache and run_sync_session with it.

In the current deployment model, train actors are created as default synchronous Ray actors without max_concurrency, and these methods are not async. That means build_cpu_bucket_cache and run_sync_session are already serialized by Ray on the same actor. The actor-level lock is therefore redundant, and its comments make the implementation look more concurrent than it is.

What changed

Removed MegatronTrainRayActor._cache_lock initialization.
Removed the outer actor-level lock around build_cpu_bucket_cache.
Removed the outer actor-level lock around run_sync_session.
Kept CPUBucketCache's internal lock so the cache object remains self-contained and protects its own state.
Shortened F4 cache docstrings and comments to describe the API behavior without carrying review-history or over-defensive concurrency rationale in code.

Why

This keeps the locking model to one layer while preserving behavior. The cache still publishes and reads _cache_ready_step through its own methods, but the train actor no longer adds a redundant critical section around synchronous Ray actor methods. This is closer to the style in miles main: rely on the actor execution model where it applies, and keep local state protection inside the small helper object.

Validation

python -m py_compile miles/backends/megatron_utils/actor.py miles/backends/megatron_utils/update_weight/cpu_bucket_cache.py
git diff --check
git grep -n _cache_lock -- miles/backends/megatron_utils/actor.py miles/backends/megatron_utils/update_weight/cpu_bucket_cache.py returns no matches

TianyeGGBond added 2 commits June 4, 2026 15:00

Simplify CPU bucket cache locking

f318518

Clarify CPU bucket cache contracts

f47642b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Simplify CPU bucket cache locking#14

Simplify CPU bucket cache locking#14
TianyeGGBond wants to merge 2 commits into
rlops:zhenyu/m11-mvp-testfrom
TianyeGGBond:tianye/simplify-cache-locking

TianyeGGBond commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

TianyeGGBond commented Jun 4, 2026

Context

What changed

Why

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant