[Question] Any examples for 200B+ MoE training with FSDP? #1437

@cailun01

Hi XTuner Team,

I noticed in the official documentation that one of the key highlights of XTuner V1 is its ability to train 200B+ MoE models using FSDP instead of EP (expert parallelism):

Breakthrough Performance Bottleneck: First time achieving FSDP training throughput surpassing traditional 3D parallel solutions on MoE models above 200B scale (https://xtuner.readthedocs.io/en/latest/#core-features)

I am looking to train a 200B+ MoE model such as Qwen3-235B-A22B and would love to leverage this capability. Could you please provide a reference example or a configuration template for a model of this scale?

Thank you for the great work!
