ComfyUI-PiD

Compact ComfyUI nodes for NVIDIA PiD / PixelDiT using ComfyUI-native Comfy-Org/PixelDiT model loading.

PiD is a latent-conditioned pixel diffusion decoder/upscaler:

LATENT + caption + sigma -> PiD -> IMAGE

Install

cd ComfyUI/custom_nodes
git clone https://github.com/Merserk/ComfyUI-PiD.git
cd ComfyUI-PiD
python -m pip install -r requirements.txt

Restart ComfyUI.

Requirements: a recent ComfyUI build with native PixelDiT/PiD support and Python >=3.10. An NVIDIA CUDA GPU is recommended.

Models

Most nodes can download required files automatically when auto_download=true.

Use	Source	Local folder
PiD diffusion + Gemma text encoder	`Comfy-Org/PixelDiT`	`ComfyUI/models/diffusion_models/nvidia_pid/` and `ComfyUI/models/text_encoders/nvidia_pid/`
Caption Creator	`Qwen/Qwen3.5-0.8B`	`ComfyUI/models/text_encoders/nvidia_pid/qwen35_caption/`
Upscale VAEs	Flux/Z-Image, Flux2, SD3 VAE files	`ComfyUI/models/vae/nvidia_pid/`

Use model_precision=bf16 for best quality. fp8 is available only for Flux1-family 2k and 2kto4k checkpoints and for the Flux2-family 2k checkpoint. Flux2 2kto4k, SD3, SDXL, and Qwen-Image must use bf16.
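
The precision limits above can be expressed as a small lookup. This is only an illustrative sketch: `FP8_SUPPORTED`, `allowed_precisions`, and `check_precision` are hypothetical names, not part of the node pack, and the family labels follow the "PiD family" column in Supported Backbones.

```python
# Hypothetical helper; names are illustrative and not the node pack's API.
# (family, checkpoint type) pairs for which the README says fp8 is available.
FP8_SUPPORTED = {
    ("Flux1", "2k"),
    ("Flux1", "2kto4k"),
    ("Flux2", "2k"),
}


def allowed_precisions(family: str, ckpt_type: str) -> list[str]:
    """Return the precisions permitted for a PiD family / checkpoint pair."""
    if (family, ckpt_type) in FP8_SUPPORTED:
        return ["bf16", "fp8"]
    return ["bf16"]


def check_precision(family: str, ckpt_type: str, precision: str) -> str:
    """Raise ValueError if the requested precision is not permitted."""
    allowed = allowed_precisions(family, ckpt_type)
    if precision not in allowed:
        raise ValueError(
            f"{precision} is not available for {family} {ckpt_type}; "
            f"use one of {allowed}"
        )
    return precision
```

For example, `allowed_precisions("Flux2", "2kto4k")` returns only `["bf16"]`, matching the rule that Flux2 2kto4k must use bf16.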

Nodes

Node	Output	Purpose
PiD Decode	`IMAGE`	One-node PiD decode from latent + caption + sigma.
PiD Text Prompt	`text`, `caption`	One prompt for normal text encoding and PiD caption input.
PiD Caption Creator	`text`, `caption`	Creates a caption from an input image with local Qwen.
PiD Empty Latent Image	`LATENT`	Backbone-aware empty latent with correct channels/downscale.
PiD KSampler Capture	`final_latent`, `pid_latent`, `pid_sigma`	KSampler-compatible sampler that captures the PiD latent and sigma.
PiD Prepare	`PID_PREP`	Moves/validates latent data and resolves PiD model assets.
PiD Sample	`PID_SAMPLES`	Runs native PiD sampling.
PiD Finalize	`IMAGE`	Converts PiD samples to a ComfyUI image.
PiD Upscale	`IMAGE`	Image-only tiled PiD upscaler with `2x/4x/6x/8x` output.

Recommended PiD sampling: pid_steps=4, cfg_scale=1.0, scale=0 or 4.

Supported Backbones

Backbone value	PiD family	Checkpoints	Latent	PiD Upscale
`zimage`	Flux1	`2k`, `2kto4k`	16ch / 8x	yes
`zimage-turbo`	Flux1	`2k`, `2kto4k`	16ch / 8x	yes
`flux`	Flux1	`2k`, `2kto4k`	16ch / 8x	yes
`flux2`	Flux2	`2k`, `2kto4k`	128ch / 16x	yes
`flux2-klein-4b`	Flux2	`2k`, `2kto4k`	128ch / 16x	yes
`flux2-klein-9b`	Flux2	`2k`, `2kto4k`	128ch / 16x	yes
`sd3`	SD3	`2k`, `2kto4k`	16ch / 8x	yes
`sdxl`	SDXL	`2kto4k` only	4ch / 8x	no
`qwenimage`	Qwen-Image	`2kto4k` only	16ch / 8x	no
`qwenimage-2512`	Qwen-Image	`2kto4k` only	16ch / 8x	no

dinov2 and siglip are not supported by the native Comfy-Org PiD model set.

Output Size Guide

Released PiD checkpoints use native 4x scale.

`pid_ckpt_type`	Base latent/image size	Final PiD output	Valid base presets
`2k`	512-class	base × 4, e.g. `512x512 -> 2048x2048`	`512x512`, `576x432`, `432x576`, `624x416`, `416x624`, `672x384`, `384x672`, `784x336`, `336x784`
`2kto4k`	1024-class	base × 4, e.g. `1024x1024 -> 4096x4096`	`1024x1024`, `1024x768`, `768x1024`, `1008x672`, `672x1008`, `1024x576`, `576x1024`, `1008x432`, `432x1008`

Latent size depends on the backbone's channel count and downscale factor. Example: Flux2 uses 128 channels at 16x downscale, so a 1024x1024 base uses a 128 × 64 × 64 latent (1024 / 16 = 64).
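
As a worked example of the size arithmetic, the sketch below derives the latent shape and the final PiD output size from a base resolution. The channel and downscale values are copied from the Supported Backbones table; the function and constant names are hypothetical and are not taken from the node pack.

```python
# Hypothetical sketch; values come from the Supported Backbones table.
# backbone -> (latent channels, spatial downscale factor)
BACKBONE_LATENT = {
    "zimage": (16, 8),
    "zimage-turbo": (16, 8),
    "flux": (16, 8),
    "flux2": (128, 16),
    "flux2-klein-4b": (128, 16),
    "flux2-klein-9b": (128, 16),
    "sd3": (16, 8),
    "sdxl": (4, 8),
    "qwenimage": (16, 8),
    "qwenimage-2512": (16, 8),
}

PID_NATIVE_SCALE = 4  # released PiD checkpoints use native 4x scale


def latent_shape(backbone: str, width: int, height: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a base image size."""
    channels, downscale = BACKBONE_LATENT[backbone]
    if width % downscale or height % downscale:
        raise ValueError(f"{width}x{height} is not divisible by {downscale}")
    return (channels, height // downscale, width // downscale)


def pid_output_size(width: int, height: int) -> tuple[int, int]:
    """Return the (width, height) of the PiD output for a base size."""
    return (width * PID_NATIVE_SCALE, height * PID_NATIVE_SCALE)
```

With these definitions, `latent_shape("flux2", 1024, 1024)` gives `(128, 64, 64)` and `pid_output_size(512, 512)` gives `(2048, 2048)`, matching the examples above.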

PiD Upscale

PiD Upscale accepts IMAGE and returns IMAGE. It is separate from latent decode: the node cuts the image into tiles, encodes each tile with the matching VAE, runs native 4-step PiD, blends tiles, then resizes to the selected final factor.

Setting	Values / behavior
`pid_ckpt_type`	`2k` uses 512px tiles; `2kto4k` uses 1024px tiles.
`backbone`	`zimage`, `zimage-turbo`, `flux`, `flux2`, `flux2-klein-4b`, `flux2-klein-9b`, `sd3`.
`model_precision`	Same limits as PiD decode; use `bf16` for best quality.
`upscale_factor`	Final output size: `2x`, `4x`, `6x`, or `8x`.
`strength`	PiD detail regeneration sigma, `0.0` to `1.0`; default `0.4`.
`caption`	Optional string input; connect `PiD Caption Creator` or `PiD Text Prompt`.

Profile	Tile size	Overlap	Small-image prepass
`2k`	512	64	Resize long edge to 512, PiD once, then tiled upscale.
`2kto4k`	1024	128	Resize long edge to 1024, PiD once, then tiled upscale.
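
To illustrate how the tile size and overlap in the profile table could translate into tile positions, here is a minimal sketch. The placement rule (stride = tile size minus overlap, with the last tile pinned to the image edge) is an assumption made for illustration; the README does not specify the node's actual tiling or blending logic, and all names here are hypothetical.

```python
# Hypothetical sketch; the placement rule is an assumption, not the node's code.
# profile -> (tile size in px, overlap in px), from the profile table above
TILE_PROFILES = {
    "2k": (512, 64),
    "2kto4k": (1024, 128),
}


def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Return start offsets of tiles covering one axis of the given length."""
    if length <= tile:
        return [0]  # a single tile covers the whole axis
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # pin the last tile to the far edge
    return origins


def tile_grid(width: int, height: int, profile: str) -> list[tuple[int, int]]:
    """Return (x, y) origins for every tile of an image, row by row."""
    tile, overlap = TILE_PROFILES[profile]
    xs = tile_origins(width, tile, overlap)
    ys = tile_origins(height, tile, overlap)
    return [(x, y) for y in ys for x in xs]
```

Under this assumed rule, a 1024px axis with the `2k` profile yields origins `[0, 448, 512]`, so every pixel is covered and neighbouring tiles overlap by at least 64px.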

Upscale VAEs are required because image tiles must be encoded into each backbone latent format:

Backbone family	Accepted VAE names
Flux1 / Z-Image	`ae.safetensors`
Flux2 / Flux2-Klein	`flux2_ae.safetensors`, `flux2-vae.safetensors`
SD3	`sd3_vae.safetensors`, `diffusion_pytorch_model.safetensors`

Final upscale size is always based on the original input image: width × factor, height × factor. SDXL and Qwen-Image are not available in PiD Upscale because this implementation only maps image VAEs for Flux1/Z-Image, Flux2/Flux2-Klein, and SD3.
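
The VAE mapping and final-size rule can be sketched as follows. The VAE file names and the backbone-to-family grouping come from the tables above; the helper names are hypothetical, and parsing the factor as a string such as `"4x"` is an assumption about how the setting is represented.

```python
# Hypothetical sketch; helper names are illustrative, not the node pack's API.
BACKBONE_FAMILY = {
    "zimage": "Flux1",
    "zimage-turbo": "Flux1",
    "flux": "Flux1",
    "flux2": "Flux2",
    "flux2-klein-4b": "Flux2",
    "flux2-klein-9b": "Flux2",
    "sd3": "SD3",
    "sdxl": "SDXL",
    "qwenimage": "Qwen-Image",
    "qwenimage-2512": "Qwen-Image",
}

# Families that PiD Upscale has an image VAE mapping for.
UPSCALE_VAE_NAMES = {
    "Flux1": ["ae.safetensors"],
    "Flux2": ["flux2_ae.safetensors", "flux2-vae.safetensors"],
    "SD3": ["sd3_vae.safetensors", "diffusion_pytorch_model.safetensors"],
}

UPSCALE_FACTORS = ("2x", "4x", "6x", "8x")


def upscale_vae_names(backbone: str) -> list[str]:
    """Return accepted VAE file names, or raise if the backbone is unsupported."""
    family = BACKBONE_FAMILY[backbone]
    if family not in UPSCALE_VAE_NAMES:
        raise ValueError(f"{backbone} ({family}) is not available in PiD Upscale")
    return UPSCALE_VAE_NAMES[family]


def final_upscale_size(width: int, height: int, factor: str) -> tuple[int, int]:
    """Final size is always the original input size times the chosen factor."""
    if factor not in UPSCALE_FACTORS:
        raise ValueError(f"factor must be one of {UPSCALE_FACTORS}")
    multiplier = int(factor.rstrip("x"))
    return (width * multiplier, height * multiplier)
```

For instance, `final_upscale_size(800, 600, "6x")` returns `(4800, 3600)`, while `upscale_vae_names("sdxl")` raises a `ValueError`.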

Recommended Capture Settings

Backbone	LDM steps	Capture step	Sampler / scheduler
`flux`, `sd3`	28	24	`euler` / `flowmatch_euler_discrete`
`sdxl`	30	26	`euler` / `normal`
`flux2`	50	46	`euler` / `flowmatch_euler_discrete`
`flux2-klein-4b`, `flux2-klein-9b`	4	4	`euler` / `flowmatch_euler_discrete`
`qwenimage`, `qwenimage-2512`	50	44	`euler` / `flowmatch_euler_discrete`
`zimage`	50	46	`euler` / `flowmatch_euler_discrete`, `flowmatch_shift=3.0`
`zimage-turbo`	9	9	`euler` / `flowmatch_euler_discrete`, `flowmatch_shift=3.0`
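
For scripted workflows, the table above could be kept as a lookup like the one below. The values are copied from the table; the `CaptureSettings` type and `CAPTURE_SETTINGS` dictionary are hypothetical conveniences, not something the node pack is known to expose.

```python
from typing import NamedTuple, Optional


class CaptureSettings(NamedTuple):
    """Recommended capture settings for one backbone (hypothetical container)."""
    ldm_steps: int
    capture_step: int
    sampler: str
    scheduler: str
    flowmatch_shift: Optional[float] = None  # only listed for Z-Image backbones


_FM = "flowmatch_euler_discrete"

# Values copied from the Recommended Capture Settings table.
CAPTURE_SETTINGS = {
    "flux": CaptureSettings(28, 24, "euler", _FM),
    "sd3": CaptureSettings(28, 24, "euler", _FM),
    "sdxl": CaptureSettings(30, 26, "euler", "normal"),
    "flux2": CaptureSettings(50, 46, "euler", _FM),
    "flux2-klein-4b": CaptureSettings(4, 4, "euler", _FM),
    "flux2-klein-9b": CaptureSettings(4, 4, "euler", _FM),
    "qwenimage": CaptureSettings(50, 44, "euler", _FM),
    "qwenimage-2512": CaptureSettings(50, 44, "euler", _FM),
    "zimage": CaptureSettings(50, 46, "euler", _FM, flowmatch_shift=3.0),
    "zimage-turbo": CaptureSettings(9, 9, "euler", _FM, flowmatch_shift=3.0),
}
```

A caller could then read `CAPTURE_SETTINGS["zimage"].capture_step` to get the recommended capture step (46) before configuring PiD KSampler Capture.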

Main Workflows

Text-to-image / generation

PiD Text Prompt -> normal text encode + PiD caption
PiD Empty Latent Image -> model sampler
PiD KSampler Capture pid_latent + pid_sigma -> PiD Prepare
PiD Prepare -> PiD Sample -> PiD Finalize -> Save Image

Direct decode

LATENT + caption + sigma -> PiD Decode -> Save Image

Image-to-image clean decode

Load Image -> Resize -> VAE Encode -> PiD Prepare -> PiD Sample -> PiD Finalize -> Save Image

Tiled upscale

Load Image -> PiD Caption Creator -> PiD Upscale -> Save Image

Example Workflows

Included in example_workflows/:

pid_flux_complete.json
pid_flux2_complete.json
pid_flux2_klein_4b_complete.json
pid_flux2_klein_9b_complete.json
pid_qwenimage_complete.json
pid_qwenimage_2512_complete.json
pid_sd3_complete.json
pid_sdxl_complete.json
pid_zimage_complete.json
pid_zimage_turbo_complete.json
pid_image_to_image_2k_complete.json
pid_image_to_image_2kto4k_complete.json
pid_upscale_complete.json

License

MIT

