Support for glm_moe_dsa architecture (GLM-5.2 DeepSeek Sparse Attention)

## Request

GLM-5.2 (z-ai/glm-5.2) uses the `glm_moe_dsa` architecture (`GlmMoeDsaForCausalLM`), which combines Mixture-of-Experts with DeepSeek Sparse Attention (DSA). This is distinct from the existing `glm4_moe` and `glm4_moe_lite` architectures currently supported in SwiftLM.

## Why this matters

GLM-5.2 is a frontier MoE model (~308GB in 3.5bpw MLX format, ~384GB in mxfp4). On a 128GB M5 Max the model does not fit in RAM, which makes SwiftLM's `--stream-experts` SSD expert streaming the ideal solution: only the active experts (~40B params per token) need to be in memory, and the rest can be streamed from NVMe.
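
The memory argument can be checked with rough arithmetic. This is a back-of-the-envelope sketch, not a measurement: it assumes the ~40B active parameters are stored at the checkpoint's average 3.5 bits per weight, and it ignores the KV cache, activations, and any always-resident weights. The function name `weights_gb` is illustrative.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate storage for `params` weights, in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


RAM_GB = 128          # M5 Max unified memory (from this issue)
MODEL_GB = 308        # full 3.5bpw MLX checkpoint (from this issue)
ACTIVE_PARAMS = 40e9  # ~40B active params per token (from this issue)
BPW = 3.5             # assumption: active experts use the average bit width

active_gb = weights_gb(ACTIVE_PARAMS, BPW)
print(f"active weights per token: ~{active_gb:.1f} GB")
print(f"full model fits in RAM: {MODEL_GB <= RAM_GB}")
print(f"active set fits in RAM: {active_gb <= RAM_GB}")
```

Under these assumptions the per-token active set is roughly 17.5 GB, well inside 128GB, while the full checkpoint is not. Different tokens route to different experts, so the working set across a whole generation is larger than this per-token figure, which is where streaming from NVMe comes in.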

## Current state

- `mlx-lm` 0.31.3 now supports `glm_moe_dsa` (merged in commit d711c5f)
- `transformers` v5.11.0 also supports `GlmMoeDsa`
- The MLX 3.5bpw quantized model is available at `avlp12/GLM-5.2-Alis-MLX-Dynamic-3.5bpw`
- SwiftLM's `LLMModelFactory.swift` supports `glm4_moe` and `glm4_moe_lite` but not `glm_moe_dsa`

## What's needed

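Add `glm_moe_dsa` as a supported architecture in `LLMModelFactory.swift`, mapping to the appropriate Swift MLX model class. DSA differs from standard MHA/GQA: as described for DeepSeek-V3.2, a lightweight indexer scores earlier tokens and each query attends only to the top-k selected ones, rather than following a fixed sliding-window or global-token pattern. The attention layer therefore probably cannot be reused from `glm4_moe` unchanged.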
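
To make the gap concrete, here is a minimal Python sketch of architecture dispatch keyed on the `model_type` field of a Hugging Face style `config.json`. It is not SwiftLM's code: the assumption is that `LLMModelFactory.swift` does a comparable lookup, and `SUPPORTED_MODEL_TYPES` and `check_model_type` are hypothetical names used only for illustration.

```python
import json

# Architectures this issue says SwiftLM handles today.
SUPPORTED_MODEL_TYPES = {"glm4_moe", "glm4_moe_lite"}


def check_model_type(config_text: str, supported: set[str]) -> tuple[str, bool]:
    """Return the config's model_type and whether the registry knows it."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    return model_type, model_type in supported


# Only the two fields named in this issue; a real config.json has many more.
glm52_config = json.dumps({
    "model_type": "glm_moe_dsa",
    "architectures": ["GlmMoeDsaForCausalLM"],
})

# Today: the lookup misses, so loading would fail before any weights are read.
print(check_model_type(glm52_config, SUPPORTED_MODEL_TYPES))

# After the requested change: the new key resolves.
print(check_model_type(glm52_config, SUPPORTED_MODEL_TYPES | {"glm_moe_dsa"}))
```

Registering the key is the small part of the work. The entry also needs a Swift model class behind it that implements the DSA attention layer.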

## Context

We're building an autonomous agent fleet (Based Agent Systems) that runs GLM-5.2 as the canonical reasoner via OpenRouter. Local MLX inference would eliminate API costs and reduce latency. SwiftLM's SSD expert streaming is the only viable path for running a 308GB MoE model on 128GB RAM.
