Bump llama.cpp to 2d97363 (b9870), release v0.8.32 by nyo16 · Pull Request #64 · nyo16/llama_cpp_ex

nyo16 · 2026-07-04T00:30:53Z

Summary

Updates the vendor/llama.cpp submodule from f708a5b2c (b9846) to 2d973636e (b9870) — 24 upstream commits — and releases v0.8.32.

API compatibility

No NIF changes were required. ggml/include/ggml.h, ggml/include/ggml-backend.h, common/common.h, common/chat.h,
common/json-schema-to-grammar.h, common/sampling.h, and common/speculative.h are all unchanged. The only touched header the binding compiles
against is include/llama.h, and its diff is purely additive: a new llama_ftype_name() helper and a new llama_model_ftype() getter (#25134) —
the binding calls neither.
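The header comparison above can be reproduced with plain git. The following is a minimal sketch, not the script used for this PR: the function name `changed_headers` is hypothetical, and the submodule path, revisions, and header list are copied from this description. It assumes both revisions are already fetched in the submodule checkout.

```shell
#!/bin/sh
# Sketch: list which headers the binding compiles against differ between two
# llama.cpp revisions. `changed_headers` is a hypothetical helper name.

changed_headers() {
    repo=$1; old=$2; new=$3
    shift 3
    # --name-only prints each of the given paths that differs between revisions.
    git -C "$repo" diff --name-only "$old" "$new" -- "$@"
}

# Only run against the submodule when a checkout is actually present.
if [ -e vendor/llama.cpp/.git ]; then
    # Number of upstream commits between the two revisions.
    git -C vendor/llama.cpp rev-list --count f708a5b2c..2d973636e

    changed_headers vendor/llama.cpp f708a5b2c 2d973636e \
        include/llama.h \
        ggml/include/ggml.h \
        ggml/include/ggml-backend.h \
        common/common.h \
        common/chat.h \
        common/json-schema-to-grammar.h \
        common/sampling.h \
        common/speculative.h
fi
```

If the analysis above holds, the only path printed should be include/llama.h; running `git -C vendor/llama.cpp diff f708a5b2c 2d973636e -- include/llama.h` then shows whether that diff is purely additive.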

Notable upstream changes

llama API: add llama_model_ftype() / llama_ftype_name() for reading a model's quantization type (#25134)
model: register t_layer_inp for qwen3next (#25141)
chat: trim messages sent to the StepFun parser, fixing long reasoning loops (#25238)
spec/dflash: support spec-draft-p-min in DFlash (#25246)
CUDA: topk-moe fusion for 288 experts (#25267); remove redundant copies after gated_delta_net (#23940); __restrict__ + PDL for FlashAttention (#25185); fix KQ mask stride truncation/overflow in flash_attn_mask_to_KV_max (#24945); fix get_rows_back for >65535 rows (#25103); fix Gemma E4B MTP FlashAttention (#25148)
ggml-cpu: AVX2 optimization for the nvfp4 dot product using a UE4M3 LUT (#23961)
opencl: precompiled binary kernel loading (#23042); initial q1_0 support (#25160)
hexagon: flash-attention rework (#25085)
common/server: HF primary split as model path (#25194); bracketed IPv6 literals in URL authorities (#25140); SSE keepalive pings (#25241); cpp-httplib 0.49.0 (#25218)

Verification

✅ Clean rebuild from source (mix clean && mix compile)
✅ Full test suite: 216 passed, 1 skipped (generation: Qwen3.5-0.8B, embeddings: Qwen3-Embedding-0.6B)
✅ All 7 end-to-end smoke tests pass (generation, streaming, chat templates, JSON-schema grammar, raw GBNF, embeddings)
✅ mix format --check-formatted clean
✅ Dialyzer: 0 errors

Commit 2fa0ae0: Bump llama.cpp to 2d97363 (b9870), release v0.8.32

nyo16 merged commit 4746e98 into master Jul 4, 2026
4 checks passed

nyo16 deleted the bump-llama-cpp-b9870 branch July 4, 2026 00:39
