feat(backend): add projection-based Firestore cache foundation (DD-007)#7958
feat(backend): add projection-based Firestore cache foundation (DD-007)#7958Git-on-my-level wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88ca73c5e0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -0,0 +1,158 @@ | |||
| import json | |||
There was a problem hiding this comment.
Add the cache test to backend/test.sh
backend/AGENTS.md requires that new test files be added to test.sh; I checked backend/test.sh, and it enumerates unit tests explicitly but does not include this new tests/unit/test_firestore_cache.py file. As a result, the Firestore cache coverage added here will not run in CI unless the runner is updated.
Useful? React with 👍 / 👎.
Greptile SummaryIntroduces a typed, projection-based Redis read-through cache for Firestore under
Confidence Score: 4/5Safe to merge; the cache is off by default and Firestore remains the authoritative source of truth in all failure modes. The core logic is sound: projections are narrow, invalidation hooks cover all write paths, and the fail-open design means a Redis outage cannot degrade correctness. The two items worth fixing before enabling in production are the
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant users.py
participant firestore_cache
participant Redis
participant Firestore
Caller->>users.py: get_user_language_preference(uid)
users.py->>firestore_cache: get_or_fetch(policy, uid, fetch_fn)
alt cache disabled (default)
firestore_cache->>Firestore: fetch_fn() — field projection [language]
Firestore-->>firestore_cache: value
firestore_cache-->>users.py: value
else Redis down
firestore_cache->>Redis: GET key
Redis-->>firestore_cache: error
firestore_cache->>Firestore: fetch_fn() (fail-open)
Firestore-->>firestore_cache: value
firestore_cache-->>users.py: value
else cache miss
firestore_cache->>Redis: GET key
Redis-->>firestore_cache: nil
firestore_cache->>Firestore: fetch_fn()
Firestore-->>firestore_cache: value
firestore_cache->>Redis: "SET key envelope ex=ttl+jitter"
firestore_cache-->>users.py: value
else cache hit
firestore_cache->>Redis: GET key
Redis-->>firestore_cache: envelope
firestore_cache-->>users.py: envelope.payload
end
users.py-->>Caller: language string
Note over Caller,Firestore: On write path
Caller->>users.py: set_user_language_preference(uid, lang)
users.py->>Firestore: "set({language: lang}, merge=True)"
users.py->>firestore_cache: invalidate(USER_LANGUAGE_CACHE, uid)
firestore_cache->>Redis: DEL key
users.py->>firestore_cache: invalidate(USER_TRANSCRIPTION_PREFS_CACHE, uid)
firestore_cache->>Redis: DEL key
Reviews (1): Last reviewed commit: "fix(backend): make Firestore cache keys ..." | Re-trigger Greptile |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| _GLOBAL_VERSION = os.getenv('FIRESTORE_CACHE_GLOBAL_VERSION', '1') |
There was a problem hiding this comment.
Global version captured at import time
_GLOBAL_VERSION is read once when the module is imported. The architecture docs advertise FIRESTORE_CACHE_GLOBAL_VERSION=2 as a "runtime emergency rollback", but in practice it takes effect only after a full process restart (pod redeploy). By contrast, is_enabled() reads its env vars on every call, so the FIRESTORE_CACHE_ENABLED kill-switch works without a restart. This asymmetry is not documented in the rollback section, and an operator following the playbook during an incident could reasonably believe changing the env var is sufficient without a redeploy.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
|
||
|
|
||
| def _set(policy: CachePolicy, key: str, payload: Any, now: Optional[float] = None) -> None: | ||
| now = now or time.time() |
There was a problem hiding this comment.
now = now or time.time() is a falsy-value check on an Optional[float]. A timestamp of exactly 0.0 (January 1, 1970 UTC) would incorrectly be treated as None and replaced with the current time, corrupting fresh_until. Use an explicit is None guard to match the Optional[float] contract.
| now = now or time.time() | |
| now = now if now is not None else time.time() |
| def test_users_module_only_wires_safe_projection_caches(): | ||
| source = open('database/users.py').read() |
There was a problem hiding this comment.
Both
test_users_module_only_wires_safe_projection_caches and test_ai_profile_update_bypasses_cached_getter_for_merge_safety open database/users.py with a bare relative path. This only works when pytest is invoked from the backend/ directory. Running from the repo root or any other location silently fails with FileNotFoundError, making the guardrail tests unreliable in CI unless the working directory is always pinned. Use pathlib.Path(__file__).parent.parent / "database" / "users.py" to anchor the path to the test file's location.
| def test_users_module_only_wires_safe_projection_caches(): | |
| source = open('database/users.py').read() | |
| def test_users_module_only_wires_safe_projection_caches(): | |
| import pathlib | |
| source = (pathlib.Path(__file__).parent.parent / 'database' / 'users.py').read_text() |
| def test_ai_profile_update_bypasses_cached_getter_for_merge_safety(): | ||
| source = open('database/users.py').read() |
There was a problem hiding this comment.
Same relative-path fragility as the prior test — anchoring to the test file's parent avoids a working-directory dependency.
| def test_ai_profile_update_bypasses_cached_getter_for_merge_safety(): | |
| source = open('database/users.py').read() | |
| def test_ai_profile_update_bypasses_cached_getter_for_merge_safety(): | |
| import pathlib | |
| source = (pathlib.Path(__file__).parent.parent / 'database' / 'users.py').read_text() |
Summary
Adds the DD-007 projection-based Firestore cache foundation for reducing Firestore read amplification without taking the unsafe shortcut of caching the full
users/{uid}document.This is the first PR in the long-term Firestore read-reduction architecture recommended by the DD-007 committee review.
Why this approach
Firestore default database currently costs roughly $3,807/mo net with ~10.7B reads/month. DD-007 found repeated reads of the same user document on WebSocket startup and other hot paths.
The naive fix would be a full user-doc Redis cache. This PR explicitly avoids that because
users/{uid}mixes low-risk preferences with correctness-critical fields:Instead this PR introduces a typed projection cache and only applies it to low-risk projections.
Changes
1. New reusable cache foundation
Files:
backend/database/firestore_cache.pybackend/database/firestore_cache_metrics.pyFeatures:
FIRESTORE_CACHE_ENABLED=falseFIRESTORE_CACHE_USER_LANGUAGE_ENABLED=truefs:v{global}:{namespace}:v{policy}:{entity}version,kind,created_at,fresh_until,payloadnamespace,result) — no UID/PII labels2. Low-risk user projection caches only
File:
backend/database/users.pyCached projections:
get_user_language_preference()user_languageget_user_transcription_preferences()user_transcription_prefsget_ai_user_profile()user_ai_profileNot cached:
get_user_subscription()/get_user_valid_subscription()get_data_protection_level()get_user_profile()3. Invalidation hooks
set_user_language_preference()user_language,user_transcription_prefsset_user_transcription_preferences()user_transcription_prefsset_user_custom_stt_usage()user_transcription_prefsupdate_ai_user_profile()user_ai_profileImportant:
update_ai_user_profile()reads existing profile directly from Firestore (not from the cached getter) to avoid stale cached read-modify-write overwrites.4. Architecture docs
File:
backend/docs/firestore-cache-architecture.mdDocuments why projection cache, not full user-doc cache; policy classes and what not to cache; runtime flags and rollback; cache key/envelope design; metrics and invalidation expectations; rollout plan.
Cost impact
This PR is the foundation + first safe projection rollout, not the entire DD-007 savings package.
/v4/listenrepeated user reads into one typed startup projectionThis PR is intentionally conservative: it prioritizes correctness and long-term architecture over grabbing all savings in one risky full-doc cache.
Validation
Result: 7 passed in 0.06s
Independent subagent review: caught stale cached read-modify-write in
update_ai_user_profile(). Fixed by adding_get_ai_user_profile_from_firestore(). Final review: APPROVED.Rollback
Runtime rollback:
FIRESTORE_CACHE_ENABLED=falseEmergency abandon:
FIRESTORE_CACHE_GLOBAL_VERSION=2No data migration required — Firestore remains source of truth.