fix: AutoTP partition_config uses full hierarchical module path#8088

Open: delock wants to merge 1 commit into deepspeedai:master from delock:gma/fix_tp_partition_config

@delock (Collaborator) commented Jun 24, 2026

Problem

When using custom partition_config patterns with AutoTP, full_name was built from the immediate parent only (prev_name) instead of the accumulated hierarchical path (class_name).

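As a result, patterns written against the full module path never matched, because the name passed to the matcher contained only the immediate parent and the module name, without the leading prefix.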

Impact: Custom patterns are silently ignored, so the affected parameters are not TP-sharded, which can cause OOM on multi-GPU setups with large models.
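To make the mismatch concrete, here is a minimal sketch. The pattern, the module names, and the use of a full-string regex match are all assumptions for illustration; they are not taken from the PR or from DeepSpeed's actual matching code.

```python
import re

# Hypothetical custom pattern written against the full module path.
PATTERN = re.compile(r"model\.layers\.\d+\.self_attn\.q_proj")

# Name built from the immediate parent only (the buggy behavior described above).
name_parent_only = "self_attn.q_proj"
# Name built from the accumulated hierarchical path (the intended behavior).
name_full_path = "model.layers.0.self_attn.q_proj"


def matches(name: str) -> bool:
    """Return True if the whole name matches the hypothetical pattern."""
    return PATTERN.fullmatch(name) is not None


print(matches(name_parent_only))  # False: the prefix is missing, so the pattern is skipped
print(matches(name_full_path))    # True
```

Because a non-matching pattern raises no error in this sketch, the module simply falls through unsharded, which is consistent with the silent failure described in the impact note.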

Fix

Two changes in :

  1. Line 574: Build full_name from class_name (accumulated hierarchical path) instead of prev_name (immediate parent only). This ensures patterns see the complete module path.

  2. Line 591: Pass name instead of full_name to the recursive _replace_module call, preventing path duplication at deeper nesting levels. Without this, class_name would accumulate the full prefix twice (e.g., model.layers.0.model.layers.0.self_attn).
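The interaction between the two changes can be sketched with a toy recursive walker. This is a simplified model under stated assumptions, not DeepSpeed's _replace_module: the function leaf_names, its flags use_full_path and pass_full_name, and the nested-dict module tree are hypothetical. Only the names full_name and prev_name, and the example path, come from the description above.

```python
# Toy module tree: nested dicts are containers, None marks a leaf layer.
TREE = {"model": {"layers": {"0": {"self_attn": {"q_proj": None}}}}}


def leaf_names(tree, prev_name="", prev_path="", *, use_full_path, pass_full_name):
    """Collect the name that would be handed to pattern matching for each leaf."""
    # Accumulated path of all ancestors of the children visited in this call.
    path = ".".join(p for p in (prev_path, prev_name) if p)
    names = []
    for name, child in tree.items():
        # Change 1: prefix with the accumulated path instead of the parent only.
        base = path if use_full_path else prev_name
        full_name = f"{base}.{name}" if base else name
        if child is None:
            names.append(full_name)
        else:
            # Change 2: recurse with the short name so the prefix is not
            # appended to the accumulated path a second time.
            next_prev = full_name if pass_full_name else name
            names += leaf_names(child, next_prev, path,
                                use_full_path=use_full_path,
                                pass_full_name=pass_full_name)
    return names


parent_only = leaf_names(TREE, use_full_path=False, pass_full_name=False)
duplicated = leaf_names(TREE, use_full_path=True, pass_full_name=True)
fixed = leaf_names(TREE, use_full_path=True, pass_full_name=False)

print(parent_only)  # ['self_attn.q_proj']
print(fixed)        # ['model.layers.0.self_attn.q_proj']
print(duplicated)   # prefix segments repeat, e.g. 'model.model.layers...'
```

In this model, applying only the first change while still passing the already-complete name down the recursion repeats prefix segments, which is why the two changes belong together.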

Note

This bug only affects the partition_config code path (custom patterns). The default linear_policies and HuggingFace tp_plan paths are unaffected.

Signed-off-by: Guokai Ma <guokai.ma@intel.com>
@delock force-pushed the gma/fix_tp_partition_config branch from 6daac90 to 9ae92f9 on June 24, 2026 06:48