Pull requests: huggingface/candle
Support llama GGUF tokenizers without explicit merges
#3670
opened Jun 27, 2026 by
jesco-absolut
Add CUDA Graph capture for decode + FlashInfer-style decode-attention backend (CUDA/Metal/CPU) (#3651)
#3669
opened Jun 27, 2026 by
astorise
cpu-optimization: wire f16kv changes to Qwen3 and remove feature toggles for PRs 3664-3667 (27% gain in prefill, 14% gain in decode)
#3668
opened Jun 27, 2026 by
DrJesseGlass
Contributor
cpu-optimization: aarch64 CPU prefill lane=row Q4_K kernel + a packed Q6_K dtype (20% prefill gain at 6 threads and 512 tokens)
#3667
opened Jun 27, 2026 by
DrJesseGlass
Contributor
cpu-optimize: add f16 KV cache to the CPU flash-attention path (14% decode gain) vec approx expf (2% decode gain)
#3665
opened Jun 27, 2026 by
DrJesseGlass
Contributor
cpu-optimize: Parallelize contiguous f32 elementwise ops over the barrier pool (prefill 8% gain)
#3664
opened Jun 27, 2026 by
DrJesseGlass
Contributor
Add block-wise FP8 quantized linear layer support (#3650)
#3662
opened Jun 27, 2026 by
astorise
5 tasks done
Add AWQ quantized linear layer support and unify with GPTQ (#3650)
#3661
opened Jun 27, 2026 by
astorise
6 tasks done
Add GPTQ quantized linear layer support for Qwen2 (#3650)
#3660
opened Jun 27, 2026 by
astorise
6 tasks done
Add device-agnostic (CPU/Metal) PagedAttention to complement the CUDA kernels (#3655)
#3657
opened Jun 25, 2026 by
astorise
candle-onnx: add pipeline-parallel evaluation via simple_eval_with_placement
#3648
opened Jun 25, 2026 by
astorise
Upstream onnx device candle-onnx: propagate device through simple_eval instead of hard-coding CPU
#3647
opened Jun 25, 2026 by
astorise
Fix Metal device creation panic with bounds check (fixes #3566)
#3645
opened Jun 24, 2026 by
Olcmyk
Load added tokens from GGUF metadata (fixes missing <think>/</think> on Qwen)
#3641
opened Jun 23, 2026 by
jaweed3