quasar529/DAPD

DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs

Introduction

Parallel decoding for Diffusion LLMs (dLLMs) is difficult because each denoising step provides only token-wise marginal distributions, while unmasking multiple tokens simultaneously requires accounting for inter-token dependencies. We propose Dependency-Aware Parallel Decoding (DAPD), a simple, training-free decoding method that uses self-attention to induce a conditional dependency graph over masked tokens. At each iteration, edges in this graph capture strong token interactions, while non-edges indicate weak dependence. Parallel decoding is then reduced to selecting an independent set on the graph and unmasking the selected tokens in parallel. This avoids co-updating strongly coupled tokens without auxiliary models or retraining. Experiments on LLaDA and Dream show that DAPD improves the accuracy–steps trade-off over existing methods and enables more globally distributed parallel updates that better exploit the any-order generation capability of dLLMs.
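
The selection step described above can be illustrated with a minimal sketch. It is not the code in dapd/core.py: the function name, the max-symmetrization of attention, the single threshold tau, and the greedy confidence-ordered traversal are all assumptions chosen for illustration.

```python
def select_independent_set(attn, confidence, masked, tau):
    """Greedily pick masked positions that are pairwise weakly coupled.

    attn:       L x L nested list of attention weights (hypothetical input,
                e.g. already averaged over heads and layers).
    confidence: per-position confidence, e.g. the top marginal probability.
    masked:     iterable of currently masked position indices.
    tau:        edge threshold (assumption): i and j are linked when the
                attention in either direction exceeds tau.
    """
    # Visit candidates from most to least confident.
    order = sorted(masked, key=lambda i: confidence[i], reverse=True)
    chosen = []
    for i in order:
        # Keep i only if it has no edge to any position already chosen,
        # so the chosen set stays an independent set of the graph.
        if all(max(attn[i][j], attn[j][i]) <= tau for j in chosen):
            chosen.append(i)
    return chosen


# Toy example: positions 0 and 1 attend strongly to each other,
# so only the more confident of the two is unmasked in this step.
attn = [
    [0.00, 0.40, 0.01, 0.01],
    [0.30, 0.00, 0.01, 0.01],
    [0.01, 0.01, 0.00, 0.02],
    [0.01, 0.01, 0.02, 0.00],
]
confidence = [0.9, 0.8, 0.7, 0.6]
picked = select_independent_set(attn, confidence, masked=range(4), tau=0.05)
print(picked)  # [0, 2, 3]
```

Position 1 is deferred to a later iteration, where it can be decoded conditioned on the token already placed at position 0.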

Performance

Each cell reports accuracy (%) / decoding steps (NFE, number of function evaluations). DAPD rows were obtained with torch 2.5.1+cu121.

LLaDA

Averaged over the five benchmarks, DAPD 1-block reduces the number of decoding steps from 256 to 33.8 for Direct (🔥 7.6x fewer steps) and to 48.1 for Staged (🔥 5.3x fewer steps).

| Method | HumanEval | MBPP | GSM8K | Math500 | IFEval |
|---|---|---|---|---|---|
| DAPD-Direct 1-block | 34.2 / 23.5 | 36.0 / 22.5 | 71.4 / 34.0 | 27.6 / 46.5 | 57.3 / 42.6 |
| DAPD-Direct 4-block | 42.7 / 59.5 | 38.8 / 36.7 | 75.8 / 61.4 | 26.2 / 84.6 | 58.6 / 94.6 |
| DAPD-Staged 1-block | 36.6 / 45.2 | 40.4 / 37.6 | 71.1 / 48.9 | 28.4 / 58.9 | 62.0 / 50.1 |
| DAPD-Staged 4-block | 37.8 / 92.8 | 38.8 / 96.8 | 74.6 / 111.0 | 27.8 / 123.3 | 58.0 / 99.7 |
| Fast-dLLM 1-block | 10.4 / 40.7 | 9.6 / 34.2 | 7.5 / 89.4 | 1.8 / 76.4 | 41.7 / 31.7 |
| EB-Sampler 1-block | 13.4 / 85.9 | 8.0 / 61.1 | 6.6 / 143.4 | 2.0 / 136.3 | 30.7 / 108.3 |
| KLASS 1-block | 11.0 / 97.3 | 19.8 / 43.3 | 26.3 / 72.6 | 3.0 / 83.4 | 40.1 / 96.5 |
| Fast-dLLM 4-block | 37.2 / 92.1 | 20.6 / 41.5 | 76.8 / 72.8 | 28.0 / 95.6 | 58.3 / 100.0 |
| EB-Sampler 4-block | 37.2 / 110.4 | 19.4 / 52.6 | 76.1 / 86.9 | 28.2 / 113.2 | 57.0 / 136.3 |
| KLASS 4-block | 37.8 / 149.4 | 26.0 / 53.9 | 75.6 / 93.9 | 26.4 / 118.6 | 58.4 / 139.0 |
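
The headline step counts for LLaDA can be re-derived from the 1-block rows of the table. The short check below averages the per-benchmark step counts and divides the 256-step baseline quoted above by that mean; the helper name `summarize` is hypothetical and only serves this arithmetic.

```python
# Steps (NFE) for the 1-block DAPD rows, in table order:
# HumanEval, MBPP, GSM8K, Math500, IFEval.
direct_1block = [23.5, 22.5, 34.0, 46.5, 42.6]
staged_1block = [45.2, 37.6, 48.9, 58.9, 50.1]


def summarize(steps, baseline=256):
    """Return (mean steps, reduction factor vs. the baseline), rounded to 0.1."""
    mean = sum(steps) / len(steps)
    return round(mean, 1), round(baseline / mean, 1)


print(summarize(direct_1block))  # (33.8, 7.6)
print(summarize(staged_1block))  # (48.1, 5.3)
```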

Dream

| Method | HumanEval Instruct | MBPP | GSM8K | Math500 | IFEval |
|---|---|---|---|---|---|
| DAPD-Direct | 50.6 / 116.0 | 49.4 / 26.8 | 58.8 / 60.0 | 30.6 / 63.6 | 37.2 / 17.4 |
| DAPD-Staged | 42.7 / 110.6 | 49.4 / 48.7 | 52.6 / 66.8 | 26.6 / 60.6 | 35.4 / 83.0 |
| Fast-dLLM | 43.3 / 112.2 | 30.6 / 67.7 | 47.7 / 90.2 | 17.5 / 158.7 | 18.2 / 53.2 |
| EB-Sampler | 45.7 / 155.4 | 30.6 / 186.5 | 44.5 / 127.6 | 13.0 / 190.5 | 7.1 / 115.2 |
| KLASS | 59.8 / 133.3 | 34.4 / 60.8 | 45.1 / 154.4 | 13.0 / 204.2 | 7.1 / 132.1 |

What Is Included

  • dapd/: core DAPD implementation and a minimal generation test.
  • baselines/: vendored KLASS, Fast-dLLM, and EB-Sampler code required by the baseline wrappers.
  • evaluation/lm-evaluation-harness/exp/dapd/: DAPD lm-eval scripts.
  • evaluation/lm-evaluation-harness/exp/baselines/: KLASS, Fast-dLLM, and EB-Sampler lm-eval scripts.
  • evaluation/ParallelBench/exp/dapd/: DAPD ParallelBench runner.
  • evaluation/ParallelBench/exp/baselines/: baseline ParallelBench runner.

Repository Structure

.
|-- dapd/
|   |-- core.py                 # DAPD dependency scoring and token selection
|   |-- generation.py           # LLaDA generation with DAPD
|   |-- dream_core.py           # Dream-specific DAPD utilities
|   |-- dream_generation.py     # Dream generation with DAPD
|   |-- latency.py              # step / NFE accounting
|   `-- test.py                 # minimal generation smoke test
|-- baselines/
|   |-- EB/                     # EB-Sampler implementation
|   |-- Fast-dLLM/              # Fast-dLLM implementation
|   `-- KLASS/                  # KLASS implementation
|-- evaluation/
|   |-- lm-evaluation-harness/
|   |   |-- exp/dapd/           # DAPD lm-eval scripts
|   |   |-- exp/baselines/      # baseline lm-eval scripts
|   |   |-- exp/update_summary_with_metrics.py
|   |   `-- lm_eval/            # lm-eval tasks and model wrappers
|   `-- ParallelBench/
|       |-- exp/dapd/           # DAPD ParallelBench runner
|       |-- exp/baselines/      # baseline ParallelBench runner
|       |-- cfg/                # ParallelBench task configs
|       |-- dataset/            # ParallelBench datasets
|       |-- model/              # ParallelBench model wrappers
|       `-- utils/              # ParallelBench utilities
|-- env.yml                     # recommended conda environment
|-- LICENSE
`-- README.md

Generated directories such as logs/, results/, .cache/, and worktrees/ are not required for normal use or release.

DAPD Algorithm

The public implementation exposes two paper-facing modes:

  • dapd_staged: staged high-confidence unmasking.
  • dapd_direct: direct confidence-1.0 independent unmasking.

Quick Test

The smoke test loads a LLaDA model, runs one prompt through the DAPD generation path, and prints only the generated text and the step count from the stats block.

python dapd/test.py \
  --model GSAI-ML/LLaDA-8B-Instruct \
  --prompt "Explain what a Markov Random Field is." \
  --gen-length 256 \
  --alg dapd_direct \
  --tau-min 0.01 \
  --tau-max 0.05

lm-eval: DAPD

Use the task wrappers under evaluation/lm-evaluation-harness/exp/dapd/.

cd evaluation/lm-evaluation-harness

TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
  exp/dapd/llada/humaneval.sh

LLaDA 4-block example:

BLOCK_LENGTH=64 TAU_MIN=0.01 TAU_MAX=0.05 DAPD_ALG=dapd_direct \
  exp/dapd/llada/humaneval.sh

Dream example:

TAU_MIN=0.005 TAU_MAX=0.01 DAPD_ALG=dapd_direct \
  exp/dapd/dream/humaneval.sh

lm-eval: Baselines

cd evaluation/lm-evaluation-harness/exp/baselines

./run_eval.sh fast-dllm humaneval
./run_eval.sh klass mbpp
./run_eval.sh eb math500
./run_eval.sh dream-eb ifeval

Baseline names: fast-dllm, klass, eb, dream-fast-dllm, dream-klass, dream-eb.
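
To sweep several baselines at once, the command lines can be generated rather than typed by hand. The sketch below only builds and prints the commands. It uses the baseline names listed above and the four task names that appear in the examples. Whether run_eval.sh accepts every baseline/task pair is an assumption, and the helper `baseline_commands` is hypothetical.

```python
from itertools import product

# Baseline names as listed in this README.
BASELINES = ["fast-dllm", "klass", "eb", "dream-fast-dllm", "dream-klass", "dream-eb"]
# Task names taken from the example invocations; other tasks may use different names.
TASKS = ["humaneval", "mbpp", "math500", "ifeval"]


def baseline_commands(baselines=BASELINES, tasks=TASKS):
    """Return one argv list per (baseline, task) pair, without running anything."""
    return [["./run_eval.sh", b, t] for b, t in product(baselines, tasks)]


for cmd in baseline_commands():
    print(" ".join(cmd))
```

Each argv list can be passed to `subprocess.run(cmd, cwd="evaluation/lm-evaluation-harness/exp/baselines", check=True)` once the pairs of interest have been confirmed to work.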

ParallelBench

DAPD:

python evaluation/ParallelBench/exp/dapd/run_all_parallelbench_dapd.py \
  --tasks waiting_line_n15/copy,puzzle/latin_square_n4 \
  --alg dapd_staged \
  --tau-min 0.01 \
  --tau-max 0.15 \
  --no-wandb

Use --tasks paper for the paper subset, --tasks all for every local ParallelBench task, or --task-type <prefix> to filter by task family.

Baselines:

python evaluation/ParallelBench/exp/baselines/run_all_parallelbench_baselines.py \
  --baseline klass \
  --tasks puzzle/latin_square_n4 \
  --no-wandb

Citation

@article{kim2026dependency,
  title={Dependency-Aware Parallel Decoding via Attention for Diffusion {LLMs}},
  author={Kim, Bumjun and Jeon, Dongjae and Jeon, Moongyu and No, Albert},
  journal={arXiv preprint arXiv:2603.12996},
  year={2026}
}

License

This project is released under the MIT License. See LICENSE for details. Third-party components under baselines/ and evaluation/lm-evaluation-harness/ retain their own licenses.

About

Official implementation of "DAPD: Dependency-Aware Parallel Decoding via Attention for Diffusion LLMs" (ICML 2026)
