fix(kv-cache): refresh cold anchor after partial prefix hits#394
Open
TerryChengTW wants to merge 1 commit into
Open
fix(kv-cache): refresh cold anchor after partial prefix hits#394TerryChengTW wants to merge 1 commit into
TerryChengTW wants to merge 1 commit into
Conversation
Cold checkpoints were only written on fully cold prefills (cached == 0). When the stable chat prefix changes (agent clients rewrite an early prompt block between sessions), fresh sessions partially hit a shorter continued waypoint, cached != 0 blocks the cold path, and the stale anchor is never replaced: every subsequent fresh session repays the anchor-minus-waypoint tokens forever. Extract the decision into ds4_kvstore_cold_store_target(): cold starts keep the original behavior, and a partial hit below the anchor now refreshes it (the lookup returning a shorter entry proves the stored anchor no longer matches this prompt). Hits at or past the anchor store nothing, so matching anchors are never rewritten redundantly. Also log a hint when the cold store is skipped because the first prompt exceeds --kv-cache-cold-max-tokens; previously the skip was invisible. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Author
|
Production verification complete (M2 Ultra 192 GiB / Metal, DeepSeek V4 Flash q2-q4-imatrix, Claude Code agent workload, ~30K-token first prompts). Scenario: a git commit rewrote the early prompt block (Claude Code's gitStatus), invalidating the stored anchor. First fresh session — partial hit below the anchor; the old build would have skipped the cold store here ( Second fresh session — hits the refreshed anchor, back on the fast path: Unpatched, the second session (and every one after it) stayed at ~30s. Anchor refresh cost during the first session: one 377 MiB checkpoint write, 75.9 ms. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #393. Also adds the skip hint proposed in #392 (the default-value question is left to that issue).
Cold checkpoints were only written on fully cold prefills (
cached == 0). When the stable chat prefix changes between sessions (agent clients rewrite an early prompt block — Claude Code does this on every git commit), fresh sessions partially hit a shortercontinuedwaypoint,cached != 0blocks the cold path, and the stale anchor is never replaced. Every subsequent fresh session repaysanchor - waypointtokens forever; we measured the cache permanently running at half its designed benefit (13.8s prefill with a fresh anchor vs ~30s steady state, logs in #393).Change
ds4_kvstore_cold_store_target()next to the existingds4_kvstore_continued_store_target(). Cold starts keep the original behavior (anchor, else aligned boundary of the full prompt).0 < cached < anchor) now stores the anchor during the same prefill. The lookup returning a shorter entry proves no matching anchor exists on disk, so this never rewrites a matching checkpoint; hits at or past the anchor store nothing. Cost: one extra checkpoint write (~70 ms for a 27K-token anchor on our setup) on the first session after the prefix changed.--kv-cache-cold-max-tokens(Cold checkpoint silently skipped when the first prompt exceeds kv-cache-cold-max-tokens (Claude Code + MCP easily > 30K) #392); previously the skip was invisible.Test plan
makeclean build, Apple Silicon / Metal (M4)./ds4_test --serverpasses, including two new unit tests:test_kv_cache_cold_store_target_cold_start— original behavior preserved (anchor, boundary fallback, cold_max/min/enabled gates)test_kv_cache_cold_store_target_refreshes_anchor_after_partial_hit— refresh below anchor; no store at/past anchor, without anchor, or past cold_maxreason=cold tokens=27009mid-prefill and the second fresh session returned to 13.8s prefill (vs ~30s steady state unpatched). Logs in the comment below.Co-authored with Claude (analysis from sse-tee request-body diffs + L3 logs of a production deployment; details in #392/#393).