Bump llama.cpp to f708a5b (b9846), release v0.8.31 by nyo16 · Pull Request #63 · nyo16/llama_cpp_ex

nyo16 · 2026-06-30T13:15:48Z

Summary

Updates the vendor/llama.cpp submodule from 9bebfcb (b9826) to f708a5b (b9846) — 20 commits — and cuts release v0.8.31.

No NIF changes required. Among the headers the binding compiles against, only common/common.h changed, and its diff does not reach anything the binding consumes:

a block of internal COM_* logging macros (COM_DBG/COM_TRC/COM_INF/COM_WRN/COM_ERR/COM_CNT);
a new COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH value appended to the common_speculative_type enum (the DFlash speculative-decoding work);
common_params_speculative::need_n_rs_seq() extended to reserve a recurrent-state seq for that new DFlash draft type.
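
Why an appended enumerator is benign for existing callers can be shown with a small sketch. The enum and enumerator names below are hypothetical stand-ins, not the real llama.cpp definitions; the only point is that in C++, adding an enumerator at the end of an unscoped enum with no explicit values leaves every earlier enumerator's value unchanged, so code that only names the earlier enumerators behaves the same after a rebuild.

```cpp
#include <cassert>

// Hypothetical stand-in for the enum before the bump (NOT the real llama.cpp values).
namespace before_bump {
enum spec_type { SPEC_NONE, SPEC_DRAFT_A, SPEC_DRAFT_B };
}

// Same enum after the bump, with one enumerator appended at the end.
namespace after_bump {
enum spec_type { SPEC_NONE, SPEC_DRAFT_A, SPEC_DRAFT_B, SPEC_DRAFT_NEW };
}

// Earlier enumerators keep their implicit values when a new one is appended.
static_assert(int(before_bump::SPEC_NONE) == int(after_bump::SPEC_NONE), "value shifted");
static_assert(int(before_bump::SPEC_DRAFT_A) == int(after_bump::SPEC_DRAFT_A), "value shifted");
static_assert(int(before_bump::SPEC_DRAFT_B) == int(after_bump::SPEC_DRAFT_B), "value shifted");

// A caller that only knows about one pre-existing enumerator.
inline bool is_draft_a(int type) {
    return type == after_bump::SPEC_DRAFT_A;
}

// Usage: the caller's behavior is the same for old values, and it simply
// does not match the newly appended one.
inline void example_usage() {
    assert(is_draft_a(before_bump::SPEC_DRAFT_A));
    assert(!is_draft_a(after_bump::SPEC_DRAFT_NEW));
}
```

Had the new value been inserted in the middle of the enum instead, the static assertions above would be the kind of check that flags the shift.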

The binding only constructs common_params_speculative for its MTP path (types/draft.*) and otherwise calls common_chat_templates_*, common_context_can_seq_rm, common_batch_add, and json_schema_to_grammar, all of which are unchanged. include/llama.h, ggml/include/ggml.h, ggml/include/ggml-backend.h, common/chat.h, common/json-schema-to-grammar.h, common/sampling.h, and common/speculative.h are untouched.
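
The header audit described above amounts to intersecting the list of files changed between the two submodule commits with the headers the binding compiles against. A minimal sketch of that check follows; the function names are hypothetical, the watched set is the eight headers named in this description, and the changed-file list is assumed to come from something like `git diff --name-only` run inside vendor/llama.cpp. It is not the procedure the author necessarily used.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Headers this PR description says the binding compiles against.
inline std::set<std::string> binding_headers() {
    return {
        "include/llama.h",
        "ggml/include/ggml.h",
        "ggml/include/ggml-backend.h",
        "common/common.h",
        "common/chat.h",
        "common/json-schema-to-grammar.h",
        "common/sampling.h",
        "common/speculative.h",
    };
}

// Returns the changed paths that are also watched headers, in input order.
inline std::vector<std::string> changed_watched_headers(
        const std::vector<std::string>& changed,
        const std::set<std::string>& watched) {
    std::vector<std::string> hits;
    for (const auto& path : changed) {
        if (watched.count(path) != 0) {
            hits.push_back(path);
        }
    }
    return hits;
}

// Usage with illustrative input: one watched header plus an unrelated,
// made-up source path that the binding does not include.
inline void example_usage() {
    const std::vector<std::string> changed = {
        "common/common.h",
        "src/some-internal-file.cpp",  // hypothetical path
    };
    const auto hits = changed_watched_headers(changed, binding_headers());
    assert(hits.size() == 1 && hits[0] == "common/common.h");
}
```

Any header that shows up in the result still needs its diff read by hand, as was done for common/common.h here.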

Notable upstream changes

model: DeepSeek V4 (#24162); MiniCPM5 chat parser (#24889)
spec/dflash: DFlash speculative decoding (#22105) + draft-conversion refactor (#25110)
CUDA/HIP: cudaMemcpy2DAsync fast path in ggml_cuda_cpy (#25057); hipBLAS dense prefill on gfx900 (#24588)
vulkan: matmul bk-loop roll for Asahi Linux (#24663); flops-based submission heuristic (#25005)
opencl: flash-attention improvement (#25069)
ggml-webgpu: NVFP4 support (#25143)
sched: revert of #20793 (the split-compute sync change that landed in v0.8.30) (#25138)

Verification

Rebuilt the NIF from source (LLAMA_BACKEND=auto, Metal)
Full suite: 158 tests + 4 skipped, 0 failures
All 7 end-to-end smoke tests pass (generation, streaming, chat templates, JSON-schema grammar, raw GBNF, embeddings) — generation against Qwen3.5-0.8B, embeddings against Qwen3-Embedding-0.6B
mix format --check-formatted clean
Dialyzer: 0 errors

checksum.exs is intentionally untouched — CI regenerates it after the release tag.

Updated vendor/llama.cpp from 9bebfcb (b9826) to f708a5b (b9846), 20 commits. No NIF changes required — only common/common.h changed among the headers the binding compiles against, and its diff (internal COM_* logging macros, a new COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH enum value, and the matching need_n_rs_seq() reservation) does not touch any symbol the binding consumes. Verified: full suite 158 tests + 4 skipped, all 7 smoke tests pass (gen against Qwen3.5-0.8B, embeddings against Qwen3-Embedding-0.6B), mix format clean, Dialyzer 0 errors.

nyo16 merged commit bafa53a into master Jun 30, 2026
4 checks passed

nyo16 deleted the bump-llama-cpp-f708a5b2c branch July 4, 2026 15:48
