Parallel decoding for Diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy–steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
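The selection step described above can be illustrated with a minimal sketch. This is not the repository's implementation (see `dapd/core.py` for that). The symmetrized attention score, the single fixed threshold `tau`, and the confidence-ordered greedy pass are all assumptions made for illustration, and `select_independent_set` is a hypothetical name.

```python
import numpy as np

def select_independent_set(attn, confidence, tau):
    """Greedily pick masked positions that share no strong-dependence edge.

    attn:       (n, n) attention weights among the n masked positions (assumed input).
    confidence: (n,) per-position confidence, e.g. the max marginal probability.
    tau:        edge threshold; a pair scoring above tau is treated as coupled.
    """
    attn = np.asarray(attn, dtype=float)
    confidence = np.asarray(confidence, dtype=float)

    # Assumption: symmetrize by taking the larger of the two attention directions.
    score = np.maximum(attn, attn.T)
    adj = score > tau
    np.fill_diagonal(adj, False)  # a token never conflicts with itself

    selected = []
    blocked = np.zeros(len(confidence), dtype=bool)
    # Visit positions from most to least confident.
    for i in np.argsort(-confidence, kind="stable"):
        if not blocked[i]:
            selected.append(int(i))
            blocked |= adj[i]  # neighbours of a selected token must wait
    return sorted(selected)

attn = [[0.00, 0.30, 0.01],
        [0.20, 0.00, 0.02],
        [0.01, 0.03, 0.00]]
print(select_independent_set(attn, [0.9, 0.8, 0.7], tau=0.05))  # prints [0, 2]
```

In this toy case positions 0 and 1 attend strongly to each other, so only the more confident of the two is unmasked in the current step, while position 2 is unmasked alongside it.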
Each cell reports Acc (%) / Steps (NFE). DAPD rows were obtained with torch 2.5.1+cu121.
DAPD 1-block reduces the average number of decoding steps from 256 to 33.8 for Direct (🔥 7.6x fewer steps) and to 48.1 for Staged (🔥 5.3x fewer steps).
| Method | HumanEval | MBPP | GSM8K | Math500 | IFEval |
|---|---|---|---|---|---|
| DAPD-Direct 1-block | 34.2 / 23.5 | 36.0 / 22.5 | 71.4 / 34.0 | 27.6 / 46.5 | 57.3 / 42.6 |
| DAPD-Direct 4-block | 42.7 / 59.5 | 38.8 / 36.7 | 75.8 / 61.4 | 26.2 / 84.6 | 58.6 / 94.6 |
| DAPD-Staged 1-block | 36.6 / 45.2 | 40.4 / 37.6 | 71.1 / 48.9 | 28.4 / 58.9 | 62.0 / 50.1 |
| DAPD-Staged 4-block | 37.8 / 92.8 | 38.8 / 96.8 | 74.6 / 111.0 | 27.8 / 123.3 | 58.0 / 99.7 |
| Fast-dLLM 1-block | 10.4 / 40.7 | 9.6 / 34.2 | 7.5 / 89.4 | 1.8 / 76.4 | 41.7 / 31.7 |
| EB-Sampler 1-block | 13.4 / 85.9 | 8.0 / 61.1 | 6.6 / 143.4 | 2.0 / 136.3 | 30.7 / 108.3 |
| KLASS 1-block | 11.0 / 97.3 | 19.8 / 43.3 | 26.3 / 72.6 | 3.0 / 83.4 | 40.1 / 96.5 |
| Fast-dLLM 4-block | 37.2 / 92.1 | 20.6 / 41.5 | 76.8 / 72.8 | 28.0 / 95.6 | 58.3 / 100.0 |
| EB-Sampler 4-block | 37.2 / 110.4 | 19.4 / 52.6 | 76.1 / 86.9 | 28.2 / 113.2 | 57.0 / 136.3 |
| KLASS 4-block | 37.8 / 149.4 | 26.0 / 53.9 | 75.6 / 93.9 | 26.4 / 118.6 | 58.4 / 139.0 |

| Method | HumanEval Instruct | MBPP | GSM8K | Math500 | IFEval |
|---|---|---|---|---|---|
| DAPD-Direct | 50.6 / 116.0 | 49.4 / 26.8 | 58.8 / 60.0 | 30.6 / 63.6 | 37.2 / 17.4 |
| DAPD-Staged | 42.7 / 110.6 | 49.4 / 48.7 | 52.6 / 66.8 | 26.6 / 60.6 | 35.4 / 83.0 |
| Fast-dLLM | 43.3 / 112.2 | 30.6 / 67.7 | 47.7 / 90.2 | 17.5 / 158.7 | 18.2 / 53.2 |
| EB-Sampler | 45.7 / 155.4 | 30.6 / 186.5 | 44.5 / 127.6 | 13.0 / 190.5 | 7.1 / 115.2 |
| KLASS | 59.8 / 133.3 | 34.4 / 60.8 | 45.1 / 154.4 | 13.0 / 204.2 | 7.1 / 132.1 |
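The 1-block averages quoted before the tables can be reproduced from the DAPD-Direct 1-block and DAPD-Staged 1-block rows of the first table. The snippet below is only a sanity check of that arithmetic; `summarize` is a hypothetical helper, and the 256-step baseline is the figure stated in the text.

```python
# Steps (NFE) copied from the DAPD 1-block rows of the first table,
# in column order: HumanEval, MBPP, GSM8K, Math500, IFEval.
direct_steps = [23.5, 22.5, 34.0, 46.5, 42.6]
staged_steps = [45.2, 37.6, 48.9, 58.9, 50.1]
BASELINE_STEPS = 256  # baseline step count quoted in the text

def summarize(steps, baseline=BASELINE_STEPS):
    """Return (mean steps, reduction factor), each rounded to one decimal."""
    mean = sum(steps) / len(steps)
    return round(mean, 1), round(baseline / mean, 1)

print(summarize(direct_steps))  # (33.8, 7.6)
print(summarize(staged_steps))  # (48.1, 5.3)
```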
- dapd/: core DAPD implementation and a minimal generation test.
- baselines/: vendored KLASS, Fast-dLLM, and EB code required by wrappers.
- evaluation/lm-evaluation-harness/exp/dapd/: DAPD lm-eval scripts.
- evaluation/lm-evaluation-harness/exp/baselines/: KLASS, Fast-dLLM, and EB-Sampler lm-eval scripts.
- evaluation/ParallelBench/exp/dapd/: DAPD ParallelBench runner.
- evaluation/ParallelBench/exp/baselines/: baseline ParallelBench runner.

.
|-- dapd/
| |-- core.py # DAPD dependency scoring and token selection
| |-- generation.py # LLaDA generation with DAPD
| |-- dream_core.py # Dream-specific DAPD utilities
| |-- dream_generation.py # Dream generation with DAPD
| |-- latency.py # step / NFE accounting
| `-- test.py # minimal generation smoke test
|-- baselines/
| |-- EB/ # EB-Sampler implementation
| |-- Fast-dLLM/ # Fast-dLLM implementation
| `-- KLASS/ # KLASS implementation
|-- evaluation/
| |-- lm-evaluation-harness/
| | |-- exp/dapd/ # DAPD lm-eval scripts
| | |-- exp/baselines/ # baseline lm-eval scripts
| | |-- exp/update_summary_with_metrics.py
| | `-- lm_eval/ # lm-eval tasks and model wrappers
| `-- ParallelBench/
| |-- exp/dapd/ # DAPD ParallelBench runner
| |-- exp/baselines/ # baseline ParallelBench runner
| |-- cfg/ # ParallelBench task configs
| |-- dataset/ # ParallelBench datasets
| |-- model/ # ParallelBench model wrappers
| `-- utils/ # ParallelBench utilities
|-- env.yml # recommended conda environment
|-- LICENSE
`-- README.md
Generated directories such as logs/, results/, .cache/, and
worktrees/ are not required for normal use or release.
The public implementation exposes two paper-facing modes:

- dapd_staged: staged high-confidence unmasking.
- dapd_direct: direct confidence-1.0 independent unmasking.

The smoke test loads a LLaDA model, runs one prompt through the DAPD generation
path, and prints only the generated text plus steps in the stats block.
python dapd/test.py \
--model GSAI-ML/LLaDA-8B-Instruct \
--prompt "Explain what a Markov Random Field is." \
--gen-length 256 \
--alg dapd_direct \
--tau-min 0.01 \
--tau-max 0.05

Use the task wrappers under evaluation/lm-evaluation-harness/exp/dapd/.
cd evaluation/lm-evaluation-harness
TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
exp/dapd/llada/humaneval.sh

LLaDA 4-block example:
BLOCK_LENGTH=64 TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
exp/dapd/llada/humaneval.sh

Dream example:
TAU_MIN=0.005 TAU_MAX=0.01 DAPD_ALG=dapd_direct \
exp/dapd/dream/humaneval.sh

cd evaluation/lm-evaluation-harness/exp/baselines
./run_eval.sh fast-dllm humaneval
./run_eval.sh klass mbpp
./run_eval.sh eb math500
./run_eval.sh dream-eb ifeval

Baseline names: fast-dllm, klass, eb, dream-fast-dllm, dream-klass, dream-eb.
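To sweep several baselines and tasks, the `run_eval.sh` invocations can be generated rather than typed by hand. The sketch below only prints the command lines (a dry run). The task list is limited to the names used in the examples above, and whether every baseline/task pair is actually supported by `run_eval.sh` is an assumption; `build_commands` is a hypothetical helper.

```python
import shlex
from itertools import product

# Baseline names as listed above.
BASELINES = ["fast-dllm", "klass", "eb", "dream-fast-dllm", "dream-klass", "dream-eb"]
# Task names taken from the example commands; other tasks are not assumed here.
TASKS = ["humaneval", "mbpp", "math500", "ifeval"]

def build_commands(baselines=BASELINES, tasks=TASKS):
    """Return one shell-quoted run_eval.sh command per (baseline, task) pair."""
    return [shlex.join(["./run_eval.sh", b, t]) for b, t in product(baselines, tasks)]

# Dry run: print the commands instead of executing them.
for cmd in build_commands(["klass"], ["mbpp", "ifeval"]):
    print(cmd)
```

The printed lines are meant to be run from evaluation/lm-evaluation-harness/exp/baselines, as in the examples above.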
DAPD:
python evaluation/ParallelBench/exp/dapd/run_all_parallelbench_dapd.py \
--tasks waiting_line_n15/copy,puzzle/latin_square_n4 \
--alg dapd_staged \
--tau-min 0.01 \
--tau-max 0.15 \
--no-wandb

Use --tasks paper for the paper subset, --tasks all for every local
ParallelBench task, or --task-type <prefix> to filter by task family.

Baselines:
python evaluation/ParallelBench/exp/baselines/run_all_parallelbench_baselines.py \
--baseline klass \
--tasks puzzle/latin_square_n4 \
--no-wandb

@article{kim2026dependency,
title={Dependency-aware parallel decoding via attention for diffusion llms},
author={Kim, Bumjun and Jeon, Dongjae and Jeon, Moongyu and No, Albert},
journal={arXiv preprint arXiv:2603.12996},
year={2026}
}

This project is released under the MIT License. See LICENSE for
details. Third-party components under baselines/ and
evaluation/lm-evaluation-harness/ retain their own licenses.