Bump llama.cpp to f708a5b (b9846), release v0.8.31 #63

Merged
nyo16 merged 1 commit into master from bump-llama-cpp-f708a5b2c on Jun 30, 2026
nyo16 (Owner) commented Jun 30, 2026

Summary

Updates the vendor/llama.cpp submodule from 9bebfcb (b9826) to f708a5b (b9846) — 20 commits — and cuts release v0.8.31.

No NIF changes required. Among the headers the binding compiles against, only common/common.h changed, and its diff does not reach anything the binding consumes:

  • a block of internal COM_* logging macros (COM_DBG/COM_TRC/COM_INF/COM_WRN/COM_ERR/COM_CNT);
  • a new COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH value appended to the common_speculative_type enum (the DFlash speculative-decoding work);
  • common_params_speculative::need_n_rs_seq() extended to reserve a recurrent-state seq for that new DFlash draft type.

The binding constructs common_params_speculative only for its MTP path (types/draft.*) and otherwise calls common_chat_templates_*, common_context_can_seq_rm, common_batch_add, and json_schema_to_grammar, all of which are unchanged. include/llama.h, ggml/include/ggml.h, ggml/include/ggml-backend.h, common/chat.h, common/json-schema-to-grammar.h, common/sampling.h, and common/speculative.h are untouched.
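
For reviewers who want to re-check the header claim on a future bump, here is a minimal sketch of one way to do it. It is not the tooling used for this PR. `BINDING_HEADERS` is an assumption assembled from the headers named in this description, and both function names are hypothetical. The only external command used is `git diff --name-only`.

```python
import subprocess

# Assumption: the set of headers the binding compiles against, taken from
# the headers named in this PR description. Adjust to match the real build.
BINDING_HEADERS = [
    "include/llama.h",
    "ggml/include/ggml.h",
    "ggml/include/ggml-backend.h",
    "common/common.h",
    "common/chat.h",
    "common/json-schema-to-grammar.h",
    "common/sampling.h",
    "common/speculative.h",
]


def changed_binding_headers(changed_paths, headers=BINDING_HEADERS):
    """Return the binding headers that appear in a list of changed paths."""
    changed = set(changed_paths)
    # Preserve the order of `headers` so the output is stable.
    return [h for h in headers if h in changed]


def git_changed_paths(repo, old, new):
    """List paths that differ between two commits via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", old, new],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Calling `changed_binding_headers(git_changed_paths("vendor/llama.cpp", "9bebfcb", "f708a5b"))` from the repository root should, if the summary above is accurate, report only `common/common.h`. Any header it reports still needs its diff read by hand, since a changed file does not by itself say whether a consumed symbol changed.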

Notable upstream changes

  • model: DeepSeek V4 (#24162); MiniCPM5 chat parser (#24889)
  • spec/dflash: DFlash speculative decoding (#22105) + draft-conversion refactor (#25110)
  • CUDA/HIP: cudaMemcpy2DAsync fast path in ggml_cuda_cpy (#25057); hipBLAS dense prefill on gfx900 (#24588)
  • vulkan: matmul bk-loop roll for Asahi Linux (#24663); flops-based submission heuristic (#25005)
  • opencl: flash-attention improvement (#25069)
  • ggml-webgpu: NVFP4 support (#25143)
  • sched: revert of #20793 (the split-compute sync change that landed in v0.8.30) (#25138)

Verification

  • Rebuilt the NIF from source (LLAMA_BACKEND=auto, Metal)
  • Full suite: 158 tests + 4 skipped, 0 failures
  • All 7 end-to-end smoke tests pass, covering generation, streaming, chat templates, JSON-schema grammar, raw GBNF, and embeddings; generation runs against Qwen3.5-0.8B, embeddings against Qwen3-Embedding-0.6B
  • mix format --check-formatted clean
  • Dialyzer: 0 errors

checksum.exs is intentionally untouched — CI regenerates it after the release tag.

Updated vendor/llama.cpp from 9bebfcb (b9826) to f708a5b (b9846),
20 commits. No NIF changes required — only common/common.h changed
among the headers the binding compiles against, and its diff (internal
COM_* logging macros, a new COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH enum
value, and the matching need_n_rs_seq() reservation) does not touch any
symbol the binding consumes.

Verified: full suite 158 tests + 4 skipped, all 7 smoke tests pass
(gen against Qwen3.5-0.8B, embeddings against Qwen3-Embedding-0.6B),
mix format clean, Dialyzer 0 errors.
nyo16 merged commit bafa53a into master on Jun 30, 2026
4 checks passed
nyo16 deleted the bump-llama-cpp-f708a5b2c branch on July 4, 2026 15:48