DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

Introduction

Parallel decoding for Diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy–steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
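To make the reduction to an independent set concrete, here is a minimal sketch of one selection step. It is not the code in dapd/core.py. It assumes a hypothetical attention matrix `attn` restricted to the masked positions, a hypothetical threshold `tau` above which two positions count as strongly coupled, and a greedy ordering by per-token confidence. The function names `build_dependency_graph` and `greedy_independent_set` are illustrative only.

```python
def build_dependency_graph(attn, tau):
    """Return a symmetric boolean adjacency matrix over masked positions.

    Assumption: attn[i][j] is the attention masked position i pays to masked
    position j. An edge is drawn when either direction exceeds tau.
    """
    n = len(attn)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if max(attn[i][j], attn[j][i]) > tau:
                adj[i][j] = adj[j][i] = True
    return adj


def greedy_independent_set(adj, confidence):
    """Greedily pick mutually non-adjacent positions, most confident first."""
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    blocked = set()
    selected = []
    for i in order:
        if i in blocked:
            continue
        selected.append(i)
        # Neighbours of a selected position must wait for a later step.
        blocked.update(j for j, linked in enumerate(adj[i]) if linked)
    return selected


# Toy example with four masked positions: 0 and 1 are strongly coupled,
# as are 2 and 3; all other interactions are weak.
attn = [
    [0.00, 0.40, 0.01, 0.02],
    [0.30, 0.00, 0.02, 0.01],
    [0.01, 0.02, 0.00, 0.50],
    [0.02, 0.01, 0.45, 0.00],
]
confidence = [0.9, 0.8, 0.6, 0.7]

adj = build_dependency_graph(attn, tau=0.05)
unmask_now = greedy_independent_set(adj, confidence)
print(unmask_now)  # -> [0, 3]
```

In this toy case positions 0 and 3 are unmasked together, while their strongly coupled partners 1 and 2 are deferred to a later iteration. A greedy pass is only one way to obtain an independent set; the selection rule used in the repository may differ.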

Performance

Each cell reports accuracy (%) / decoding steps (NFE, the number of function evaluations). DAPD rows were obtained with torch 2.5.1+cu121.

LLaDA

Averaged over the five benchmarks below, DAPD 1-block reduces the number of decoding steps from 256 to 33.8 for Direct (🔥 7.6x fewer steps) and to 48.1 for Staged (🔥 5.3x fewer steps).

Method	HumanEval	MBPP	GSM8K	Math500	IFEval
DAPD-Direct 1-block	34.2 / 23.5	36.0 / 22.5	71.4 / 34.0	27.6 / 46.5	57.3 / 42.6
DAPD-Direct 4-block	42.7 / 59.5	38.8 / 36.7	75.8 / 61.4	26.2 / 84.6	58.6 / 94.6
DAPD-Staged 1-block	36.6 / 45.2	40.4 / 37.6	71.1 / 48.9	28.4 / 58.9	62.0 / 50.1
DAPD-Staged 4-block	37.8 / 92.8	38.8 / 96.8	74.6 / 111.0	27.8 / 123.3	58.0 / 99.7
Fast-dLLM 1-block	10.4 / 40.7	9.6 / 34.2	7.5 / 89.4	1.8 / 76.4	41.7 / 31.7
EB-Sampler 1-block	13.4 / 85.9	8.0 / 61.1	6.6 / 143.4	2.0 / 136.3	30.7 / 108.3
KLASS 1-block	11.0 / 97.3	19.8 / 43.3	26.3 / 72.6	3.0 / 83.4	40.1 / 96.5
Fast-dLLM 4-block	37.2 / 92.1	20.6 / 41.5	76.8 / 72.8	28.0 / 95.6	58.3 / 100.0
EB-Sampler 4-block	37.2 / 110.4	19.4 / 52.6	76.1 / 86.9	28.2 / 113.2	57.0 / 136.3
KLASS 4-block	37.8 / 149.4	26.0 / 53.9	75.6 / 93.9	26.4 / 118.6	58.4 / 139.0
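The average step counts and speedups quoted above the table can be reproduced from the two DAPD 1-block rows. The short check below copies the step counts from the table. The variable names are illustrative, and 256 is the reference step count stated above.

```python
# Steps (NFE) per benchmark, copied from the 1-block DAPD rows of the table,
# in column order: HumanEval, MBPP, GSM8K, Math500, IFEval.
steps = {
    "DAPD-Direct 1-block": [23.5, 22.5, 34.0, 46.5, 42.6],
    "DAPD-Staged 1-block": [45.2, 37.6, 48.9, 58.9, 50.1],
}
REFERENCE_STEPS = 256  # reference step count quoted in the text

summary = {}
for method, values in steps.items():
    mean_steps = sum(values) / len(values)
    speedup = REFERENCE_STEPS / mean_steps
    summary[method] = (round(mean_steps, 1), round(speedup, 1))
    print(f"{method}: {mean_steps:.1f} steps on average, {speedup:.1f}x fewer")
```

This prints 33.8 steps (7.6x) for Direct and 48.1 steps (5.3x) for Staged, matching the headline numbers.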

Dream

Method	HumanEval Instruct	MBPP	GSM8K	Math500	IFEval
DAPD-Direct	50.6 / 116.0	49.4 / 26.8	58.8 / 60.0	30.6 / 63.6	37.2 / 17.4
DAPD-Staged	42.7 / 110.6	49.4 / 48.7	52.6 / 66.8	26.6 / 60.6	35.4 / 83.0
Fast-dLLM	43.3 / 112.2	30.6 / 67.7	47.7 / 90.2	17.5 / 158.7	18.2 / 53.2
EB-Sampler	45.7 / 155.4	30.6 / 186.5	44.5 / 127.6	13.0 / 190.5	7.1 / 115.2
KLASS	59.8 / 133.3	34.4 / 60.8	45.1 / 154.4	13.0 / 204.2	7.1 / 132.1

What Is Included

dapd/: core DAPD implementation and a minimal generation test.
baselines/: vendored KLASS, Fast-dLLM, and EB-Sampler code required by the wrappers.
evaluation/lm-evaluation-harness/exp/dapd/: DAPD lm-eval scripts.
evaluation/lm-evaluation-harness/exp/baselines/: KLASS, Fast-dLLM, and EB-Sampler lm-eval scripts.
evaluation/ParallelBench/exp/dapd/: DAPD ParallelBench runner.
evaluation/ParallelBench/exp/baselines/: baseline ParallelBench runner.

Repository Structure

.
|-- dapd/
|   |-- core.py                 # DAPD dependency scoring and token selection
|   |-- generation.py           # LLaDA generation with DAPD
|   |-- dream_core.py           # Dream-specific DAPD utilities
|   |-- dream_generation.py     # Dream generation with DAPD
|   |-- latency.py              # step / NFE accounting
|   `-- test.py                 # minimal generation smoke test
|-- baselines/
|   |-- EB/                     # EB-Sampler implementation
|   |-- Fast-dLLM/              # Fast-dLLM implementation
|   `-- KLASS/                  # KLASS implementation
|-- evaluation/
|   |-- lm-evaluation-harness/
|   |   |-- exp/dapd/           # DAPD lm-eval scripts
|   |   |-- exp/baselines/      # baseline lm-eval scripts
|   |   |-- exp/update_summary_with_metrics.py
|   |   `-- lm_eval/            # lm-eval tasks and model wrappers
|   `-- ParallelBench/
|       |-- exp/dapd/           # DAPD ParallelBench runner
|       |-- exp/baselines/      # baseline ParallelBench runner
|       |-- cfg/                # ParallelBench task configs
|       |-- dataset/            # ParallelBench datasets
|       |-- model/              # ParallelBench model wrappers
|       `-- utils/              # ParallelBench utilities
|-- env.yml                     # recommended conda environment
|-- LICENSE
`-- README.md

Generated directories such as logs/, results/, .cache/, and worktrees/ are not required for normal use or release.

DAPD Algorithm

The public implementation exposes the two modes reported in the paper:

dapd_staged: staged high-confidence unmasking.
dapd_direct: direct confidence-1.0 independent unmasking.

Quick Test

The smoke test loads a LLaDA model, runs one prompt through the DAPD generation path, and prints only the generated text and the step count from the stats block.

python dapd/test.py \
  --model GSAI-ML/LLaDA-8B-Instruct \
  --prompt "Explain what a Markov Random Field is." \
  --gen-length 256 \
  --alg dapd_direct \
  --tau-min 0.01 \
  --tau-max 0.05

lm-eval: DAPD

Use the task wrappers under evaluation/lm-evaluation-harness/exp/dapd/.

cd evaluation/lm-evaluation-harness

TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
  exp/dapd/llada/humaneval.sh

LLaDA 4-block example:

BLOCK_LENGTH=64 TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
  exp/dapd/llada/humaneval.sh

Dream example:

TAU_MIN=0.005 TAU_MAX=0.01 DAPD_ALG=dapd_direct \
  exp/dapd/dream/humaneval.sh

lm-eval: Baselines

cd evaluation/lm-evaluation-harness/exp/baselines

./run_eval.sh fast-dllm humaneval
./run_eval.sh klass mbpp
./run_eval.sh eb math500
./run_eval.sh dream-eb ifeval

Baseline names: fast-dllm, klass, eb, dream-fast-dllm, dream-klass, dream-eb.
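To sweep several baselines and tasks, the run_eval.sh invocations above can be generated programmatically. The sketch below only builds and prints the command lines as a dry run. It assumes that every listed baseline accepts every task name, which the examples above do not confirm, and it uses only the four task names that appear in those examples. The helper `build_commands` is hypothetical and not part of the repository.

```python
from itertools import product

# Baseline names as listed in this README.
BASELINES = [
    "fast-dllm", "klass", "eb",
    "dream-fast-dllm", "dream-klass", "dream-eb",
]
# Only the task names shown in the example commands; others are not assumed.
TASKS = ["humaneval", "mbpp", "math500", "ifeval"]


def build_commands(baselines=BASELINES, tasks=TASKS):
    """Return one ./run_eval.sh argument list per (baseline, task) pair."""
    return [["./run_eval.sh", b, t] for b, t in product(baselines, tasks)]


commands = build_commands()
for cmd in commands:
    print(" ".join(cmd))  # dry run: print instead of executing
```

To launch the runs for real, each list could be passed to `subprocess.run(cmd, check=True)` from inside evaluation/lm-evaluation-harness/exp/baselines.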

ParallelBench

DAPD:

python evaluation/ParallelBench/exp/dapd/run_all_parallelbench_dapd.py \
  --tasks waiting_line_n15/copy,puzzle/latin_square_n4 \
  --alg dapd_staged \
  --tau-min 0.01 \
  --tau-max 0.15 \
  --no-wandb

Use --tasks paper for the paper subset, --tasks all for every local ParallelBench task, or --task-type <prefix> to filter by task family.

Baselines:

python evaluation/ParallelBench/exp/baselines/run_all_parallelbench_baselines.py \
  --baseline klass \
  --tasks puzzle/latin_square_n4 \
  --no-wandb

Citation

@article{kim2026dependency,
  title={Dependency-aware parallel decoding via attention for diffusion llms},
  author={Kim, Bumjun and Jeon, Dongjae and Jeon, Moongyu and No, Albert},
  journal={arXiv preprint arXiv:2603.12996},
  year={2026}
}

License

This project is released under the MIT License. See LICENSE for details. Third-party components under baselines/ and evaluation/lm-evaluation-harness/ retain their own licenses.
