Add Ideogram support and improve BF16 dequantization handling #459

Open
molbal wants to merge 6 commits into city96:main from molbal:main

molbal commented Jun 9, 2026

Summary

This adds support for Ideogram GGUF models.

What Changed

  • Added ideogram to the supported image GGUF architectures.
  • Added Ideogram model detection to the converter.
  • Added GGUF dtype handling needed by Ideogram inference.
  • Fixed an Ideogram inference failure in which a packed GGUF weight dtype let a byte tensor reach the CUDA linear call without being dequantized first.
  • Adjusted BF16 GGUF loading so Ideogram can start inference faster.
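
To illustrate why a packed dtype matters: if BF16 weights arrive from the file as raw bytes (an assumption about the loader, not something shown in this PR), they have to be reinterpreted before any matmul sees them. The following is a minimal pure-Python sketch of that reinterpretation, not the code in this PR. The function name `bf16_bytes_to_floats` is hypothetical, and little-endian storage is assumed.

```python
import struct


def bf16_bytes_to_floats(raw: bytes) -> list[float]:
    """Decode little-endian BF16 values stored as raw bytes into floats.

    Hypothetical helper for illustration only; not part of this PR.
    """
    if len(raw) % 2:
        raise ValueError("BF16 data must contain an even number of bytes")
    out = []
    for (bits,) in struct.iter_unpack("<H", raw):
        # BF16 is the upper 16 bits of an IEEE-754 float32, so shifting the
        # 16-bit pattern left by 16 and reinterpreting it as float32
        # recovers the value exactly.
        out.append(struct.unpack("<f", struct.pack("<I", bits << 16))[0])
    return out


# 0x3F80 is 1.0, 0xC000 is -2.0, 0x3F00 is 0.5 in BF16
print(bf16_bytes_to_floats(b"\x80\x3f\x00\xc0\x00\x3f"))
```

In a tensor library the same step would typically be a zero-copy dtype view rather than a Python loop; the loop is used here only to keep the sketch dependency-free.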

Notes

Tested on Windows 11 with the following environment:
Python version: 3.12.11 (main, Jul 23 2025, 00:32:20) [MSC v.1944 64 bit (AMD64)]
[INFO] Total VRAM 8192 MB, total RAM 48394 MB
[INFO] pytorch version: 2.12.0+cu130
[INFO] Set vram state to: LOW_VRAM
[INFO] Device: cuda:0 NVIDIA GeForce RTX 3080 Laptop GPU

Tested with the Q4_0 GGUF from https://huggingface.co/leejet/ideogram-4-GGUF

Other GGUF quant types still use the existing dequant paths.
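
For context on what a block-quantized path does, here is a rough pure-Python sketch of Q4_0 dequantization. It follows the Q4_0 block layout as I understand it from the ggml reference implementation (32 weights per block, one little-endian fp16 scale followed by 16 bytes of packed 4-bit values). It is not the dequant code in this repository, and `dequantize_q4_0` is a hypothetical name.

```python
import struct

QK4_0 = 32                    # weights per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2  # fp16 scale (2 bytes) + 16 bytes of nibbles


def dequantize_q4_0(raw: bytes) -> list[float]:
    """Dequantize packed Q4_0 blocks into floats (illustrative sketch)."""
    if len(raw) % BLOCK_BYTES:
        raise ValueError("Q4_0 data must be a whole number of 18-byte blocks")
    out = []
    for start in range(0, len(raw), BLOCK_BYTES):
        # Per-block scale, stored as a little-endian half-precision float.
        (d,) = struct.unpack_from("<e", raw, start)
        qs = raw[start + 2 : start + BLOCK_BYTES]
        # Low nibbles hold the first 16 weights, high nibbles the last 16.
        # Stored values are unsigned 0..15 and are re-centred by subtracting 8.
        low = [(b & 0x0F) - 8 for b in qs]
        high = [(b >> 4) - 8 for b in qs]
        out.extend(q * d for q in low + high)
    return out


# One block: scale 0.5, every byte 0x98 (low nibble 8 -> 0, high nibble 9 -> 1)
block = struct.pack("<e", 0.5) + bytes([0x98] * 16)
values = dequantize_q4_0(block)
print(values[:2], values[-2:])
```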

@yu234567

Great! I successfully ran it, but my device doesn't support bf16; it gets converted to fp32 computation, which makes it very slow. Can you make it run on my device in fp16?


[INFO] got prompt
[INFO] Using xformers attention in VAE
[INFO] Using xformers attention in VAE
[INFO] VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
[INFO] Found quantization metadata version 1
[INFO] Using MixedPrecisionOps for text encoder
[INFO] CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
[INFO] Requested to load Ideogram4TEModel_
[INFO] Model Ideogram4TEModel_ prepared for dynamic VRAM loading. 4319MB Staged. 0 patches attached. Force pre-loaded 144 weights: 594 KB.
[WARNING] Warning: This gguf model file is loaded in compatibility mode 'sd.cpp' [arch:ideogram]
[INFO] gguf qtypes: BF16 (254), Q4_0 (204)
[INFO] model weight dtype torch.bfloat16, manual cast: torch.float32
[INFO] model_type FLOW
[INFO] Requested to load Ideogram4
[INFO] loaded completely; 7997.15 MB usable, 5506.41 MB loaded, full load: True
8%|████ | 1/12 [00:16<03:03, 16.65s/it, Model Initialization complete! ]
[INFO] Interrupting prompt 2d4b1e4f-b3b1-4a31-9d36-604af4910de5

molbal (Author) commented Jun 11, 2026

Hi @yu234567 - try now. It should work better now, can you verify please?

@yu234567

> Hi @yu234567 - try now. It should work better now, can you verify please?

Thank you so much, it worked!
