fix(backend): set Anthropic prompt cache TTL to 1h (was 5m default) #7953

Open
Git-on-my-level wants to merge 1 commit into BasedHardware:main from Git-on-my-level:fix/anthropic-cache-ttl-1h
Conversation

@Git-on-my-level

Collaborator

Summary

Adds ttl: "1h" to the Anthropic cache_control dict in the Python backend's agentic chat agent. This is a single-field addition (3 lines: 2 comment + 1 code) that restores the previous default after Anthropic changed it.

The problem

On March 6, 2026, Anthropic changed the default prompt cache TTL from 1 hour → 5 minutes. The Omi codebase was written when 1h was the default and was never updated:

# BEFORE (gets implicit 5-min TTL — kills interactive chat caching):
"cache_control": {"type": "ephemeral"}

# AFTER (explicit 1-hour TTL):
"cache_control": {"type": "ephemeral", "ttl": "1h"}

Why this matters

The omi-prod-chat Anthropic key has a 9.0% cache hit rate, and it has held at exactly 9% every day for 14 consecutive days. By comparison, batch workloads using identical code achieve a 98% hit rate because their requests arrive within the 5-minute window.

The difference is purely request spacing:

  • Interactive chat: user sends message → reads response → does something else → 5+ minutes pass → cache expires → next message pays full price
  • Batch processing: requests fire in rapid succession → each one refreshes TTL → 98% hits
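
The spacing effect described above can be illustrated with a small simulation. This is a minimal sketch, not the production code: the function name cache_hit_rate and both request traces are hypothetical, and it assumes that every request (hit or miss) restarts the TTL clock, as the batch bullet describes.

```python
def cache_hit_rate(arrival_times, ttl_seconds):
    """Fraction of requests that arrive while the cached prefix is still alive."""
    hits = 0
    expires_at = None  # no cache entry exists before the first request
    for t in arrival_times:
        if expires_at is not None and t <= expires_at:
            hits += 1
        # Assumption: a hit refreshes the entry and a miss rewrites it,
        # so either way the TTL clock restarts at this request.
        expires_at = t + ttl_seconds
    return hits / len(arrival_times) if arrival_times else 0.0


# Hypothetical traces, in seconds since the first request.
chat = [i * 8 * 60 for i in range(10)]  # a user pausing 8 minutes between turns
batch = [i * 2 for i in range(50)]      # a job firing every 2 seconds

chat_5m = cache_hit_rate(chat, 5 * 60)     # every gap exceeds the 5-minute TTL
chat_1h = cache_hit_rate(chat, 60 * 60)    # every gap fits inside the 1-hour TTL
batch_5m = cache_hit_rate(batch, 5 * 60)   # only the first request misses
```

With these made-up traces the chat workload never hits under a 5-minute TTL but hits on every turn after the first under a 1-hour TTL. Real traffic mixes short and long gaps, which is why the projected production hit rate is lower than this idealized case.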

Cost / production impact

| Metric | Current | Expected After Fix |
|---|---|---|
| Cache hit rate | 9.0% | 40–65% |
| Net input cost savings/mo | | $1,273–$2,918 |

User impact: neutral-to-positive. Same responses, same model, same system prompt. The only change is that repeated requests within 1 hour pay ~10% of the input price instead of 100%. Latency improves on cache hits.

Implementation

File: backend/utils/retrieval/agentic.py

-    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}]
+    # TTL=1h: Anthropic changed default from 1h→5m on 2026-03-06; interactive chat
+    # sessions have gaps >5min between turns, so the 5-min default kills cache hit rate.
+    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral", "ttl": "1h"}}]

This is the only place in the entire Python backend that sets cache_control for Anthropic calls.

Tests added

Two regression tests in tests/unit/test_prompt_cache_integration.py:

  1. test_anthropic_cache_control_has_ttl — source-inspection assert that ttl="1h" appears in _run_anthropic_agent_stream
  2. test_anthropic_cache_control_not_5min_default — guards against accidental revert to bare {"type": "ephemeral"} form

Rollback

Safe single-line revert: remove the "ttl": "1h" entry from the cache_control dict. This falls back to the current 5-min default behavior.

Anthropic changed default cache TTL from 1h→5m on 2026-03-06.
Interactive chat sessions have gaps >5min between turns, so the
5-min default kills cache hit rate (currently 9% for omi-prod-chat).

This restores the previous default: a single 'ttl': '1h' field in
the cache_control dict at agentic.py:367.

Estimated savings: $1,273–$2,918/mo (omi-prod-chat alone).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 421003620e

agentic_mod = _get_agentic_module()

# Inspect the source to find the system_blocks construction
import inspect

P1 Move inspect imports to module scope

Please move this inspect import (and the duplicate one added in the next test) to the module imports. backend/AGENTS.md applies to this file and explicitly requires “No in-function imports — all imports at module top level,” so these new tests currently violate the backend import policy.

@greptile-apps

greptile-apps Bot commented Jun 14, 2026

Contributor

Greptile Summary

This PR adds "ttl": "1h" to the cache_control dict used when building Anthropic system blocks in the agentic chat handler, restoring the 1-hour prompt cache lifetime that was implicitly in effect before Anthropic changed its server-side default from 1h to 5 minutes in early March 2026. Two source-inspection regression tests are included to guard against accidental reversion.

  • backend/utils/retrieval/agentic.py — one-line change; "ttl": "1h" is confirmed valid syntax per the Anthropic API docs and is the only cache_control site in the Python backend.
  • backend/tests/unit/test_prompt_cache_integration.py — two new tests use inspect.getsource to assert the TTL field is present; the positive assertion is reliable, though the negative-path guard has a multi-line fragility and both functions duplicate an in-function import inspect that belongs at module level.

Confidence Score: 4/5

Safe to merge — the production change is a single dict field addition with confirmed valid syntax; the only concerns are in the new test file.

The core fix in agentic.py is minimal and correct. The new tests in test_prompt_cache_integration.py have a style issue (in-function imports) and one test whose negative-path logic would silently pass if the dict were ever reformatted to multi-line, making it a weaker guard than it appears. None of these concerns affect the runtime behavior of the fix itself.

backend/tests/unit/test_prompt_cache_integration.py — in-function import inspect and the fragile per-line negative assertion in test_anthropic_cache_control_not_5min_default.

Important Files Changed

| Filename | Overview |
|---|---|
| backend/utils/retrieval/agentic.py | Single-field addition of "ttl": "1h" to the Anthropic cache_control dict; syntax confirmed valid by official docs, directly addresses the Anthropic TTL default change from March 2026. |
| backend/tests/unit/test_prompt_cache_integration.py | Two regression tests added using inspect.getsource to guard the ttl field; the positive assertion is solid, but the negative-path test has a fragility with multi-line dicts, and import inspect appears inside both functions rather than at module level. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User sends chat message] --> B[_run_anthropic_agent_stream]
    B --> C{Is system prompt\nin Anthropic cache?}
    C -- Cache HIT\n~0.1x input cost --> D[Fast response\nNo re-processing]
    C -- Cache MISS\nWrite to cache --> E{Which TTL?}
    E -- Before fix\n5-min default\n~1.25x write cost --> F[Cache entry\nexpires in 5 min]
    E -- After fix\n1h explicit\n~2x write cost --> G[Cache entry\nexpires in 1 hour]
    F --> H{Next message\narrives within 5 min?}
    H -- Yes --> I[Cache HIT]
    H -- No\ncommon for interactive chat --> J[Cache MISS\npays full write cost again]
    G --> K{Next message\narrives within 1h?}
    K -- Yes\ncommon for interactive chat --> L[Cache HIT\nsaves ~90%]
    K -- No --> M[Cache MISS\nnext write at 2x cost]

Reviews (1): Last reviewed commit: "fix(backend): set Anthropic prompt cache..." | Re-trigger Greptile

Comment on lines +843 to +844
"cache_control must include ttl='1h' to avoid 5-min default "
f"(source excerpt: ...{src[src.find('cache_control'):src.find('cache_control')+120]}...)"

P2 Duplicate in-function import — move import inspect to module level

import inspect appears inside both test_anthropic_cache_control_has_ttl and test_anthropic_cache_control_not_5min_default. In-function imports are against the project's backend import rules, and this one is duplicated across two functions anyway. Moving it to the top of the module removes both violations at once.

Context Used: Backend Python import rules - no in-function impor... (source)

Comment on lines +857 to +865
src = inspect.getsource(agentic_mod._run_anthropic_agent_stream)
# The old (broken) pattern was just {"type": "ephemeral"} with no ttl field
# Find the cache_control line(s)
lines_with_cache_ctrl = [l for l in src.splitlines() if "cache_control" in l]
for line in lines_with_cache_ctrl:
    # Must NOT be the bare {"type": "ephemeral"} form
    if '"type": "ephemeral"' in line or "'type': 'ephemeral'" in line:
        assert "ttl" in line, f"cache_control line missing ttl field: {line.strip()}"

P2 Per-line check silently misses multi-line cache_control dicts

lines_with_cache_ctrl collects only lines that contain the string "cache_control". The subsequent guard checks those same lines for "type": "ephemeral". If the dict is ever reformatted to span multiple lines (e.g., "cache_control": {\n "type": "ephemeral"\n}), the line with "cache_control" won't contain "type": "ephemeral", so the assert "ttl" in line branch is never reached and the test passes silently — even when ttl is absent. The first test (test_anthropic_cache_control_has_ttl) already asserts "ttl": "1h" positively and is the more reliable guard; this second test adds limited extra safety while its negative-path logic has this gap.
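
One way to close this gap is to inspect the parsed syntax tree instead of individual source lines, so that formatting no longer matters. The following is a sketch only, not the test in this PR: the helper name cache_control_dicts_missing_ttl and both sample snippets are hypothetical, and it only recognizes cache_control values written as dict literals.

```python
import ast


def cache_control_dicts_missing_ttl(source: str) -> list[int]:
    """Return the line numbers of cache_control dict literals that have no "ttl" key."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        for key, value in zip(node.keys, node.values):
            # key is None for ** unpacking; those entries are skipped by the isinstance check.
            is_cache_control = isinstance(key, ast.Constant) and key.value == "cache_control"
            if is_cache_control and isinstance(value, ast.Dict):
                inner_keys = {k.value for k in value.keys if isinstance(k, ast.Constant)}
                if "ttl" not in inner_keys:
                    missing.append(value.lineno)
    return missing


# Hypothetical snippets: the multi-line form that slips past a per-line check,
# and the one-line form introduced by this PR.
MULTI_LINE_BAD = '''
blocks = [{
    "type": "text",
    "cache_control": {
        "type": "ephemeral"
    },
}]
'''
ONE_LINE_GOOD = 'blocks = [{"type": "text", "cache_control": {"type": "ephemeral", "ttl": "1h"}}]'

bad_result = cache_control_dicts_missing_ttl(MULTI_LINE_BAD)
good_result = cache_control_dicts_missing_ttl(ONE_LINE_GOOD)
```

In a real test the source string would presumably come from inspect.getsource on the target function, with import ast and import inspect at module top level per the backend import policy. If the target were an indented method rather than a module-level function, the source would need textwrap.dedent before parsing.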

-    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}]
+    # TTL=1h: Anthropic changed default from 1h→5m on 2026-03-06; interactive chat
+    # sessions have gaps >5min between turns, so the 5-min default kills cache hit rate.
+    system_blocks = [{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral", "ttl": "1h"}}]

P2 1-hour TTL cache writes are billed at 2× the base input-token price

The Anthropic API docs confirm "ttl": "1h" is valid syntax and that cache writes with the 1h TTL are charged at 2× the standard input-token price (vs ~1.25× for the 5-minute default). The PR's cost projections ($1,273–$2,918/mo savings) are compelling when the expected hit rate jumps from 9% to 40–65%, and the math does pencil out (≥2 messages per user per hour breaks even), but the increased write price for every cold request was not mentioned in the description. Worth verifying the savings estimates account for this pricing tier when the monitoring data comes in post-deploy.
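
The break-even claim can be checked with a few lines of arithmetic. This is a rough sketch under simplifying assumptions: it uses the multipliers quoted in this thread (1.25x and 2x for writes, 0.1x for reads), counts only the cached system-prompt prefix, and compares the worst case for the 5-minute TTL (every turn arrives after expiry) against the best case for the 1-hour TTL (every turn after the first arrives within the hour). The function names are illustrative.

```python
# Price multipliers relative to one uncached read of the prefix, as quoted in this thread.
WRITE_5M = 1.25  # cache write with the 5-minute TTL
WRITE_1H = 2.0   # cache write with the 1-hour TTL
READ_HIT = 0.1   # cache read on a hit


def prefix_cost(n_messages, write_multiplier, hits):
    """Relative cost of the cached prefix over n_messages turns."""
    misses = n_messages - hits
    return misses * write_multiplier + hits * READ_HIT


def compare(n_messages):
    """Return (cost with 5m TTL, cost with 1h TTL) for one user within one hour."""
    # Assumption: turns are more than 5 minutes apart, so the 5m cache never hits.
    cost_5m = prefix_cost(n_messages, WRITE_5M, hits=0)
    # Assumption: all turns fall within one hour, so only the first turn writes.
    cost_1h = prefix_cost(n_messages, WRITE_1H, hits=n_messages - 1)
    return cost_5m, cost_1h


one_message = compare(1)   # the 1h write is pure overhead here
two_messages = compare(2)  # the second turn's hit already pays for the pricier write
```

Setting 1.25n equal to 2 + 0.1(n - 1) gives n ≈ 1.65, so under these assumptions a user who sends at least 2 messages within the hour comes out ahead, while single-message sessions cost more than before.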


2 participants