⚡ DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression

🔥 News

[2026/03/15] 🎉 Code and pre-trained models are officially released!
[2026/02/21] 🏆 DiT-IC is accepted by CVPR 2026!

📖 Introduction

DiT-IC is a high-performance neural image compression framework that leverages the power of Diffusion Transformers (DiT). By bridging latent diffusion models with standard entropy coding pipelines, DiT-IC achieves state-of-the-art perceptual quality with high efficiency.

✨ Key Features

Novel Architecture: The first Diffusion Transformer tailored for high-fidelity image reconstruction.
Aligned LoRA Adaptation: Efficient fine-tuning via the proposed alignment mechanisms, which significantly accelerates training.
High Efficiency: Diffusion in a 32x latent space gives faster inference and lower memory consumption than other models.
Deploy-Ready: Fully compatible with standard entropy coding, with an easy-to-extend API for integrating your own codecs.
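
To make the 32x figure concrete, the sketch below computes the latent grid size for a given image. It assumes that "32x" means a spatial downsampling factor of 32 per side and that inputs are padded up to a multiple of that factor; both are our reading of the feature list, not confirmed details of the implementation. `latent_grid` is a hypothetical helper and is not part of the repository.

```python
import math

def latent_grid(height: int, width: int, factor: int = 32) -> tuple[int, int]:
    """Return (rows, cols) of the latent grid, rounding up (assumed padding)."""
    return math.ceil(height / factor), math.ceil(width / factor)

# A Kodak image is 768x512 pixels (width x height).
rows, cols = latent_grid(512, 768)
print(rows, cols, rows * cols)          # 16 24 384

# The same image at an 8x factor, for comparison.
rows8, cols8 = latent_grid(512, 768, factor=8)
print(rows8 * cols8 // (rows * cols))   # 16
```

Under these assumptions a Kodak image maps to 384 latent positions, 16 times fewer than an 8x latent space would need.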

📑 Table of Contents

Checkpoints and Performance
Installation
Quick Start: Inference
Quick Start: Training
More Features
BibTeX

📊 Checkpoints and Performance

We provide scalable model configurations by adjusting the VAE/DiT LoRA ranks.

We provide four operating points (quality=1,2,3,4, corresponding to λ ∈ [2⁴, 2⁻²]), which achieve approximate bitrates of 0.009, 0.044, 0.081, and 0.100 bpp on the Kodak dataset, respectively.
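
Bits per pixel (bpp) is the bitstream size in bits divided by the number of pixels. As a rough illustration, the sketch below converts the approximate Kodak bitrates quoted above into expected file sizes for one 768×512 image. `bpp_to_bytes` is a hypothetical helper, not a function from this repository.

```python
def bpp_to_bytes(bpp: float, width: int, height: int) -> float:
    """Convert bits per pixel into the expected bitstream size in bytes."""
    return bpp * width * height / 8

# Approximate Kodak bitrates quoted above for quality = 1, 2, 3, 4.
KODAK_BPP = {1: 0.009, 2: 0.044, 3: 0.081, 4: 0.100}

for quality, bpp in KODAK_BPP.items():
    size = bpp_to_bytes(bpp, width=768, height=512)
    ratio = 24 / bpp  # relative to uncompressed 8-bit RGB (24 bpp)
    print(f"quality={quality}: ~{size:.0f} bytes, ~{ratio:.0f}:1")
```

For a single Kodak image this works out to roughly 440 bytes at quality=1 and about 4.9 KB at quality=4.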

👉 Download Models: HuggingFace - DiT-IC Rank64/128 Checkpoints

VAE/Diff. LoRA Rank	λ	Kodak BPP ↓	Kodak LPIPS ↓	Kodak DISTS ↓	CLIC BPP ↓	CLIC LPIPS ↓	CLIC DISTS ↓	CLIC FID ↓	DIV2K BPP ↓	DIV2K LPIPS ↓	DIV2K DISTS ↓	DIV2K FID ↓
32/64	0.5	0.080	0.094	0.059	0.079	0.067	0.038	3.06	0.089	0.084	0.043	7.22
64/128	0.5	0.081	0.091	0.055	0.072	0.070	0.034	2.66	0.084	0.086	0.039	6.52

Detailed performance logs for the original results (rank 32/64) and extended results (rank 64/128) can be found in the results/ directory.

🔧 Installation

Requirements

Python == 3.12
PyTorch == 2.8
CompressAI == 1.2.8 (⚠️ Crucial for consistent bitrate/BPP calculation)

Other environments may also work, but they have not been tested.
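
Since the CompressAI version affects the reported bitrate, it can help to check the installed packages before running anything. The sketch below uses the standard-library `importlib.metadata`. It assumes the PyPI distribution names are `torch` and `compressai`; `find_mismatches` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Versions listed in the requirements above; keys are assumed PyPI names.
REQUIRED = {"torch": "2.8", "compressai": "1.2.8"}

def find_mismatches(required, get_version=metadata.version):
    """Return {package: installed version or None} for packages that differ."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is not None:
            # Drop local build tags such as "+cu128" before comparing.
            base = installed.split("+")[0]
            if base == wanted or base.startswith(wanted + "."):
                continue
        mismatches[name] = installed
    return mismatches

if __name__ == "__main__":
    for name, installed in find_mismatches(REQUIRED).items():
        found = installed or "not installed"
        print(f"{name}: want {REQUIRED[name]}, found {found}")
```

The script prints nothing when both packages match the listed versions.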

Install dependencies

pip install -r requirements.txt

💻 Quick Start: Inference

1️⃣ Prepare Datasets

The following datasets are used for evaluation:

Dataset	Description
Kodak	24 natural images (768×512)
DIV2K Validation	100 high-resolution images
CLIC 2020 Test	428 high-resolution images

You may also evaluate the model on your own datasets.

2️⃣ Run Compression

Option A: Inference with Merged Weights (Recommended)

The released checkpoints merge LoRA weights into the base model, which simplifies deployment and speeds up inference.

CUDA_VISIBLE_DEVICES=0 python -u compress.py \
        --config_path="configs/inference_merge.yaml" \
        --codec_path="checkpoints/q3_merge_ema.pt" \
        --img_path="/data/data/Kodak" \
        --rec_path="results/Kodak/rec/" \
        --bin_path="results/Kodak/bin/" \
        --use_merge \
        2>&1 | tee results/logs/eval_ema_Kodak_q3_$(date +%Y%m%d_%H%M%S).log

Option B: Inference with Raw LoRA Checkpoints

If you trained the model yourself and the LoRA weights have not yet been merged, run:

CUDA_VISIBLE_DEVICES=0 python -u compress.py \
        --config_path="configs/inference.yaml" \
        --codec_path="checkpoints/0.5lrd_lora_0050000.pt" \
        --img_path="/data/data/Kodak" \
        --rec_path="results/Kodak/rec/" \
        --bin_path="results/Kodak/bin/" \
        2>&1 | tee results/logs/eval_kodak_0.5lrd_$(date +%Y%m%d_%H%M%S).log

Bitstreams are saved in --bin_path, and reconstructed images in --rec_path.

Logs are written to results/logs/. Create this directory before running, since tee does not create missing directories.

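Argument	Description
`--use_merge`	Load a checkpoint whose LoRA weights are already merged into the base model (Option A)
`--use_ema`	Use EMA weights
`--save_img`	Save reconstructed images
`--entropy_estimation`	Estimate bitrate without real entropy coding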
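
The `--entropy_estimation` flag skips real entropy coding. A common way for learned codecs to estimate bitrate in that situation is to sum the ideal code length, -log2(p), over the symbol likelihoods predicted by the entropy model. Whether compress.py computes its estimate exactly this way is an assumption, and the function names below are hypothetical.

```python
import math

def estimated_bits(likelihoods):
    """Sum of -log2(p) over all coded symbols: the ideal code length in bits."""
    return sum(-math.log2(p) for p in likelihoods)

def estimated_bpp(likelihoods, width, height):
    """Estimated bits per pixel for an image of the given size."""
    return estimated_bits(likelihoods) / (width * height)

# Toy example: four symbols with probability 0.5 and four with 0.25.
probs = [0.5] * 4 + [0.25] * 4
print(estimated_bits(probs))        # 12.0
print(estimated_bpp(probs, 4, 4))   # 0.75
```

A real arithmetic coder adds a small overhead on top of this ideal length, so an estimated bitrate is typically slightly lower than the size of the actual bitstream.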

3️⃣ Evaluation (Optional)

To compute image quality metrics using the saved reconstruction folders:

CUDA_VISIBLE_DEVICES=0 python -m eval.evaluate \
    --recon_dir "results/Kodak/rec/" \
    --gt_dir "/data/data/Kodak" \
    2>&1 | tee logs/eval_kodak_$(date +%Y%m%d_%H%M%S).log

🚗 Quick Start: Training

1️⃣ Prepare Training Datasets

Download the following datasets and place them in the corresponding directories:

Dataset	Size
LSDIR	~50K images
MLIC-Train-100K	~100K images

2️⃣ Train from Scratch

Download the required pretrained components:

DiT-SANA
Aux Encoder-ELIC

Place them in SANA/ and ELIC/weights/, respectively.

A simplified training pipeline is provided below.

Stage 1 — Train w/o GAN loss

CUDA_VISIBLE_DEVICES=0,1 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True PYTHONPATH=. torchrun --standalone --nproc_per_node=2 --nnodes=1 train_nogan_ddp.py --config configs/train_256_nogan.yaml

Stage 2 — Finetune w/ GAN loss

CUDA_VISIBLE_DEVICES=0,1 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True PYTHONPATH=. torchrun --standalone --nproc_per_node=2 --nnodes=1 train_ddp.py --config configs/train_256_gan.yaml

Set --nproc_per_node to the number of GPUs listed in CUDA_VISIBLE_DEVICES. Single-GPU training also works: list one device and use --nproc_per_node=1.
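
To keep the device list and the process count in sync, the launch command can be generated from a list of GPU ids. The sketch below only assembles the environment and arguments already shown in the commands above. `torchrun_command` is a hypothetical helper, not part of the repository.

```python
import os
import shlex

def torchrun_command(gpu_ids, script, config):
    """Return (env, argv) for a single-node torchrun launch on the given GPUs."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    env["PYTHONPATH"] = "."
    argv = [
        "torchrun", "--standalone",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per visible GPU
        "--nnodes=1",
        script, "--config", config,
    ]
    return env, argv

# Single-GPU launch of Stage 1; printed here rather than executed.
env, argv = torchrun_command([0], "train_nogan_ddp.py", "configs/train_256_nogan.yaml")
print(shlex.join(argv))
```

Passing `argv` and `env` to `subprocess.run(argv, env=env, check=True)` would start the job.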

💡 The rate–distortion trade-off parameter λ can be either kept the same across stages or adjusted during Stage 2.

Examples:

Stage1 λ = 2.0 → Stage2 λ = 2.0
Stage1 λ = 0.5 → Stage2 λ = 2.0

💡 More advanced strategies, such as incorporating image–text embeddings (e.g., CLIP loss), better GAN models, or more refined training pipelines, may further improve performance or accelerate training. Users can modify the configuration files according to their own requirements and hardware setups.

3️⃣ Finetune Pretrained Model

If you start from our pretrained checkpoints:

👉 Download Models: HuggingFace - DiT-IC Rank64/128 Checkpoints

Then finetune at the target bitrate:

CUDA_VISIBLE_DEVICES=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True PYTHONPATH=. torchrun --standalone --nproc_per_node=1 --nnodes=1 train_ddp.py --config configs/train_merge_256_gan.yaml

4️⃣ Merge LoRA Weights

Training checkpoints contain:

trained LoRA weights
trained codec parameters

For faster inference, you may merge LoRA weights into the base model:

CUDA_VISIBLE_DEVICES=0 python merge.py \
        --config_path="configs/inference.yaml" \
        --codec_path="checkpoints/0.5lrd_lora_0050000.pt"

You can add --use_ema to enable EMA weights.

The merged checkpoint is saved in the same directory as --codec_path.
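
For readers unfamiliar with LoRA merging, the sketch below shows the generic rule W' = W + (alpha / rank) · B · A on a toy layer, and checks that the merged layer gives the same output as the unmerged one. It is not taken from merge.py. All names are hypothetical, and plain Python lists are used so that it runs without extra dependencies.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the usual LoRA merge rule."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy layer: 2 outputs, 3 inputs, rank 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]        # rank x in_features
B = [[0.5], [-1.0]]          # out_features x rank
x = [[1.0], [1.0], [1.0]]    # input column vector

merged = merge_lora(W, A, B, alpha=2.0, rank=1)

# Unmerged forward pass: W x + scale * B (A x), with scale = alpha / rank = 2.
base = matmul(W, x)
low_rank = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * l[0]] for b, l in zip(base, low_rank)]

print(matmul(merged, x), unmerged)   # [[9.0], [-11.0]] [[9.0], [-11.0]]
```

After merging, inference needs a single matrix multiplication per layer instead of the base path plus the low-rank path, which is where the speed-up comes from.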

🧩 More Features

You can modify configuration files under configs/ to adapt the framework to different settings:

dataset scale
training image resolution
hardware constraints (e.g., memory, FLOPs)

Planned future updates:

FP16 / BF16 inference
Tiled inference

📖 BibTeX

If you find this project useful, please cite:

@inproceedings{shi2026ditic,
  title={DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression},
  author={Shi, Junqi and Lu, Ming and Li, Xingchen and Ke, Anle and Zhang, Ruiqi and Ma, Zhan},
  booktitle={CVPR},
  year={2026}
}

🥰 Acknowledgement

Thanks to the following open-source codebases for their wonderful work!

⭐️ If you find this project helpful, please give it a star! ⭐️
