Bump llama.cpp to f708a5b (b9846), release v0.8.31 #63
Merged
Updated vendor/llama.cpp from 9bebfcb (b9826) to f708a5b (b9846), 20 commits. No NIF changes required — only common/common.h changed among the headers the binding compiles against, and its diff (internal COM_* logging macros, a new COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH enum value, and the matching need_n_rs_seq() reservation) does not touch any symbol the binding consumes. Verified: full suite 158 tests + 4 skipped, all 7 smoke tests pass (gen against Qwen3.5-0.8B, embeddings against Qwen3-Embedding-0.6B), mix format clean, Dialyzer 0 errors.
Summary

Updates the `vendor/llama.cpp` submodule from 9bebfcb (b9826) to f708a5b (b9846), 20 commits, and cuts release v0.8.31.

No NIF changes required. Among the headers the binding compiles against, only `common/common.h` changed, and its diff does not reach anything the binding consumes:

- `COM_*` logging macros (`COM_DBG`/`COM_TRC`/`COM_INF`/`COM_WRN`/`COM_ERR`/`COM_CNT`);
- `COMMON_SPECULATIVE_TYPE_DRAFT_DFLASH` value appended to the `common_speculative_type` enum (the DFlash speculative-decoding work);
- `common_params_speculative::need_n_rs_seq()` extended to reserve a recurrent-state seq for that new DFlash draft type.

The binding only constructs `common_params_speculative` for its MTP path (types/draft.*) and otherwise calls `common_chat_templates_*`, `common_context_can_seq_rm`, `common_batch_add`, and `json_schema_to_grammar`, all unchanged. `include/llama.h`, `ggml/include/ggml.h`, `ggml/include/ggml-backend.h`, `common/chat.h`, `common/json-schema-to-grammar.h`, `common/sampling.h`, and `common/speculative.h` are untouched.

Notable upstream changes

- `cudaMemcpy2DAsync` fast path in `ggml_cuda_cpy` (#25057); hipBLAS dense prefill on gfx900 (#24588)

Verification

- `LLAMA_BACKEND=auto`, Metal)
- `mix format --check-formatted` clean

`checksum.exs` is intentionally untouched; CI regenerates it after the release tag.
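
The Summary's reasoning that an appended enum value cannot disturb code that never names it can be illustrated with a small C++ sketch. The enumerator names here are hypothetical stand-ins, not the actual members of `common_speculative_type`, and `consumer_selects` is an invented placeholder for binding code; only the shape of the change (one value added at the end) is taken from the description in this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the enum as it looked before the bump.
namespace before_bump {
enum spec_type { SPEC_NONE, SPEC_DRAFT_A, SPEC_DRAFT_B };
}

// The same enum after the bump: one new value appended at the end.
namespace after_bump {
enum spec_type { SPEC_NONE, SPEC_DRAFT_A, SPEC_DRAFT_B, SPEC_DRAFT_DFLASH };
}

// Appending leaves every pre-existing enumerator with its old numeric value.
static_assert(static_cast<int>(before_bump::SPEC_NONE) ==
                  static_cast<int>(after_bump::SPEC_NONE),
              "SPEC_NONE must keep its value");
static_assert(static_cast<int>(before_bump::SPEC_DRAFT_A) ==
                  static_cast<int>(after_bump::SPEC_DRAFT_A),
              "SPEC_DRAFT_A must keep its value");
static_assert(static_cast<int>(before_bump::SPEC_DRAFT_B) ==
                  static_cast<int>(after_bump::SPEC_DRAFT_B),
              "SPEC_DRAFT_B must keep its value");

// Hypothetical consumer: it only ever names SPEC_DRAFT_B, so the new
// enumerator never enters its logic.
constexpr bool consumer_selects(after_bump::spec_type t) {
    return t == after_bump::SPEC_DRAFT_B;
}

// Runtime restatement of the same point.
inline void check_consumer() {
    assert(consumer_selects(after_bump::SPEC_DRAFT_B));
    assert(!consumer_selects(after_bump::SPEC_DRAFT_DFLASH));
}
```

Had the new value been inserted before an existing enumerator instead of appended, at least one of the `static_assert`s would fail to compile.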