feat(backend): sentence-level translation dedup + merge Redis lookup (DD-008)#7954
Conversation
…(DD-008) Phase 1 (sentence-level dedup): Rewrote translate_units_batch() to split each text into sentences before cache lookup. Identical sentences across different segments/users now share cache hits instead of each triggering a separate API call. Preserves full-sentence NMT quality while eliminating 25-35% of redundant character throughput. Phase 3 (merge-aware Redis lookup): On segment merge (prefix reset), check Redis for the merged text before re-translating. Eliminates 10-15% of redundant translations from STT segment combination. Documentation: Added DESIGN DECISION comment at the batch_buffer call site, updated TranslationCoordinator class docstring with architecture/cost notes, and created ADR-001 document at backend/docs/translation-architecture.md. Estimated savings: $1,500–2,140/mo (25-50% of $4,282/mo Google Translate bill). Files changed: - backend/utils/translation.py: translate_units_batch() sentence-level rewrite - backend/utils/translation_coordinator.py: comments + merge Redis lookup - backend/docs/translation-architecture.md: NEW — ADR-001
- Call on_translation_ready when prefix-reset adopts a Redis-cached translation so Firestore persist and WebSocket push run for cached segments. - Cache assembled full-text translations in translate_units_batch so prefix-reset lookups hit Redis for multi-sentence segments.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f15573d9e6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| text_hash = hashlib.md5(original_text.encode()).hexdigest() | ||
| self._set_memory_cache(text_hash, dest_language, assembled, dominant_lang) | ||
| cache_translation(text_hash, dest_language, assembled, dominant_lang) |
There was a problem hiding this comment.
Avoid caching failed translation fallbacks
When a Google Translate chunk raises above, the exception path records each sentence as its original text with an empty detected language, but this unconditional Redis write then stores that fallback as a successful full-text translation. For single-sentence inputs, or any chunk where all translated parts fell back, a transient API failure will poison translate:v1:* for the 14-day TTL and future requests will hit Redis instead of retrying, leaving the text untranslated.
Useful? React with 👍 / 👎.
| await self.on_translation_ready(segment.id, translated_text, detected_lang, conversation_id) | ||
| continue # Don't add to batch buffer |
There was a problem hiding this comment.
Invalidate pending batches after cached prefix resets
When this cached prefix-reset branch runs while the same segment already has an older translation entry buffered, the old entry keeps matching state.version because this path commits the Redis result and continues without bumping the version or removing buffered entries. In that 250ms batching window, _flush_batch() can later treat the stale buffered text as valid and overwrite/persist the older translation after the cached merged translation was already sent.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR introduces two cost-reduction optimisations to the translation pipeline: sentence-level dedup in
Confidence Score: 3/5Fix 2 can cause an in-flight batch result to overwrite the correct merge-cached translation in the UI; several other gaps flagged in prior review rounds are still unaddressed. The merge-aware Redis path fires on_translation_ready immediately but leaves state.version unchanged. If the GCP call for the previous version of the same segment is still in flight, _flush_batch will match the stale version and fire the callback a second time, replacing the correct translation with the old one in both Firestore and the WebSocket stream. Combined with the three open gaps carried over from the previous review, the coordinator path has multiple unresolved correctness issues. backend/utils/translation_coordinator.py — specifically the Fix 2 block (lines 202–214) where the version counter is not incremented before firing the callback. Important Files Changed
Sequence DiagramsequenceDiagram
participant O as observe()
participant BB as _batch_buffer
participant FB as _flush_batch()
participant GCP as Google Translate API
participant CB as on_translation_ready()
Note over O: observe() call A segment X version V
O->>BB: append(seg_X, text_old, conv, V)
Note over FB: _flush_batch starts timer fires
FB->>GCP: await run_blocking translate_units_batch
Note over O: observe() call B segment X prefix reset Fix 2 Redis hit
O->>CB: await on_translation_ready(seg_X, T_new)
Note over O: state.version still V not bumped
O-->>BB: continue skips _next_version
GCP-->>FB: T_old returned
FB->>FB: "state.version == V matches stale version"
FB->>CB: await on_translation_ready(seg_X, T_old) overwrites correct result
Reviews (1): Last reviewed commit: "fix: address review feedback — cache poi..." | Re-trigger Greptile |
| # Check if the new merged text was already translated (Redis cache) | ||
| text_hash = hashlib.md5(text.encode()).hexdigest() | ||
| redis_cached = get_cached_translation(text_hash, self.target_language) | ||
| if redis_cached: | ||
| # Found in Redis — adopt as committed, skip re-translation | ||
| translated_text = redis_cached['text'] | ||
| detected_lang = redis_cached.get('detected_lang', '') | ||
| state.committed_text = text | ||
| state.assembled_translation = translated_text | ||
| state.detected_lang = detected_lang | ||
| self.metrics['prefix_resets'] += 1 | ||
| await self.on_translation_ready(segment.id, translated_text, detected_lang, conversation_id) | ||
| continue # Don't add to batch buffer |
There was a problem hiding this comment.
Missing
should_persist_translation guard in merge-aware Redis path
When Fix 2 finds a Redis hit during a prefix reset, it calls on_translation_ready directly without checking should_persist_translation. The normal batch path (line 358) applies this check before firing the callback; skipping it here means a no-op "translation" (e.g. English text being "translated" to English) stored in the Redis positive cache can trigger on_translation_ready and create a spurious translation badge in the UI — exactly the case should_persist_translation was built to prevent.
The scenario is concrete: the batch path calls cache_translation in Phase 3 of translate_units_batch before the should_persist_translation check runs in _flush_batch. If that check then returns False it sets the negative cache but leaves the positive cache entry intact. A later merge on the same text hits the positive cache in Fix 2 and fires the callback unconditionally.
| # Fallback: should not happen, but use original text | ||
| translated_parts.append(sent_text) | ||
|
|
||
| assembled = ' '.join(translated_parts) |
There was a problem hiding this comment.
ASCII space joining breaks CJK target-language output
' '.join(translated_parts) inserts a Latin space between sentences even when the target language is Chinese, Japanese, or Korean. Those languages don't delimit sentences with spaces, so the assembled string will contain spurious U+0020 characters (e.g. 你好。 你好吗? instead of 你好。你好吗?). This affects every CJK target-language user and degrades the quality of the persisted Firestore translation.
| assembled = ' '.join(translated_parts) | |
| _CJK_LANGS = {'zh', 'ja', 'ko', 'zh-cn', 'zh-tw', 'zh-hk'} | |
| sep = '' if dest_language.lower().split('-')[0] in _CJK_LANGS else ' ' | |
| assembled = sep.join(translated_parts) |
| # Check if the new merged text was already translated (Redis cache) | ||
| text_hash = hashlib.md5(text.encode()).hexdigest() | ||
| redis_cached = get_cached_translation(text_hash, self.target_language) | ||
| if redis_cached: | ||
| # Found in Redis — adopt as committed, skip re-translation | ||
| translated_text = redis_cached['text'] | ||
| detected_lang = redis_cached.get('detected_lang', '') | ||
| state.committed_text = text | ||
| state.assembled_translation = translated_text | ||
| state.detected_lang = detected_lang | ||
| self.metrics['prefix_resets'] += 1 | ||
| await self.on_translation_ready(segment.id, translated_text, detected_lang, conversation_id) |
There was a problem hiding this comment.
language_state not updated on Redis cache hit path
The normal batch path updates self.language_state.observe(original_text, speaker_id=None) with the API-detected language (line 373) so the monolingual gate can adapt. When Fix 2 short-circuits via the Redis cache, this observe() call is skipped entirely. Over time the language model can drift if the Redis path is the primary path for high-frequency segments, weakening the monolingual gate's ability to detect language switches.
| # Phase 3: Reassemble per-unit results from sentence translations | ||
| results = [] | ||
| for unit_idx, (unit_id, original_text, sentences) in enumerate(unit_sentences): | ||
| if not sentences: | ||
| results.append((unit_id, original_text, '')) | ||
| continue | ||
|
|
||
| translated_parts = [] | ||
| detected_langs = [] | ||
| for sent_text, sent_hash in sentences: | ||
| if sent_hash in sent_translation: | ||
| trans_text, det_lang = sent_translation[sent_hash] | ||
| translated_parts.append(trans_text) | ||
| if det_lang: | ||
| detected_langs.append(det_lang) | ||
| else: | ||
| # Fallback: should not happen, but use original text | ||
| translated_parts.append(sent_text) | ||
|
|
||
| assembled = ' '.join(translated_parts) | ||
| # Dominant detected language from constituent sentences | ||
| dominant_lang = '' | ||
| if detected_langs: | ||
| lang_counts = Counter(detected_langs) | ||
| dominant_lang = lang_counts.most_common(1)[0][0] | ||
|
|
||
| text_hash = hashlib.md5(original_text.encode()).hexdigest() | ||
| self._set_memory_cache(text_hash, dest_language, assembled, dominant_lang) | ||
| cache_translation(text_hash, dest_language, assembled, dominant_lang) | ||
|
|
||
| results.append((unit_id, assembled, dominant_lang)) |
There was a problem hiding this comment.
Phase 3 unconditionally writes full-text positive cache even for same-language text
cache_translation(text_hash, ...) is always called in Phase 3 regardless of whether the assembled translation actually differs from the original. If all constituent sentences were detected as same-language (returning original text), the full-text cache entry will hold the original text — an entry that downstream callers (including the merge-aware Fix 2 lookup) will treat as a valid translation. The _flush_batch check in the coordinator catches this for the normal flow, but the Fix 2 shortcut path bypasses it, so a "translation" equalling the source text can reach on_translation_ready.
…tion
P1: Skip Redis cache write when any sentence fell back to original text
due to API failure. Previously the exception path stored untranslated
fallback text into Redis, poisoning the 14-day TTL cache so future
requests would return raw text instead of retrying.
P2: Filter pending batch buffer entries for a segment when adopting a
Redis-cached translation on prefix reset. Without this, a stale
buffered entry could later overwrite the cached result in _flush_batch().
Git-on-my-level
left a comment
There was a problem hiding this comment.
Review feedback addressed in e13a77b
P1 — Cache poisoning on API failure:
- Added
_failed_sent_hashesset tracking sentences that fell back to original text - Phase 3 now skips Redis
cache_translation()when any constituent sentence is a fallback - Transient API failures no longer poison the 14-day TTL cache with untranslated text
P2 — Stale batch buffer overwrite after prefix reset:
- When adopting a Redis-cached translation on prefix reset, the code now filters out pending batch buffer entries for that segment ID
- Prevents
_flush_batch()from later overwriting with stale buffered text
| text_hash = hashlib.md5(text.encode()).hexdigest() | ||
| redis_cached = get_cached_translation(text_hash, self.target_language) | ||
| if redis_cached: | ||
| # Found in Redis — adopt as committed, skip re-translation | ||
| translated_text = redis_cached['text'] | ||
| detected_lang = redis_cached.get('detected_lang', '') | ||
| state.committed_text = text | ||
| state.assembled_translation = translated_text | ||
| state.detected_lang = detected_lang | ||
| self.metrics['prefix_resets'] += 1 | ||
| # Invalidate any pending batch entries for this segment so | ||
| # _flush_batch() does not overwrite with stale buffered text. |
There was a problem hiding this comment.
Stale in-flight batch job fires after Fix 2 callback
Fix 2 fires on_translation_ready and then continues, bypassing the _next_version() call on line 272. If a batch job for this segment was already queued (e.g., submitted 200 ms earlier and still awaiting the GCP API response), _flush_batch will see state.version == version still matching, and fire on_translation_ready a second time with the old (pre-merge) translation — overwriting the correct Fix 2 result in the UI.
The old code avoids this because any prefix reset that reaches _batch_buffer.append always passes through _next_version(), invalidating the prior in-flight version. Fix 2 needs to increment the version before firing the callback to close this race window. Adding state.version = self._next_version() before the await self.on_translation_ready(...) call is sufficient.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e13a77b323
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self._batch_buffer = [ | ||
| entry for entry in self._batch_buffer if entry[0] != segment.id | ||
| ] |
There was a problem hiding this comment.
Bump the version when adopting cached merged text
Fresh evidence: the current patch only removes entries that are still in _batch_buffer, but _flush_batch() copies and clears that buffer before awaiting translate_units_batch(). If this Redis-hit path runs while that API call is already in flight, state.version stays at the old value, so the stale batch result still passes the version check and can overwrite/persist an older translation after this cached merged translation was sent.
Useful? React with 👍 / 👎.
| self._batch_buffer = [ | ||
| entry for entry in self._batch_buffer if entry[0] != segment.id | ||
| ] | ||
| await self.on_translation_ready(segment.id, translated_text, detected_lang, conversation_id) |
There was a problem hiding this comment.
Reapply the no-op filter before cached callbacks
When the cached entry is an unchanged translation, this shortcut bypasses the should_persist_translation() check used in _flush_batch() and immediately persists/sends a translation identical to the source. That cache state is possible because translate_units_batch() writes Redis before _flush_batch() decides a result is a no-op, so prefix resets for already-target-language text can reintroduce translation badges that the coordinator normally suppresses.
Useful? React with 👍 / 👎.
| self.metrics['prefix_resets'] += 1 | ||
| # Check if the new merged text was already translated (Redis cache) | ||
| text_hash = hashlib.md5(text.encode()).hexdigest() | ||
| redis_cached = get_cached_translation(text_hash, self.target_language) |
There was a problem hiding this comment.
Offload the Redis lookup from observe()
On a prefix reset this calls the synchronous redis.Redis.get path from inside async observe(). The backend rule in AGENTS.md says Redis CRUD in async def must be offloaded with run_blocking(..., db_executor, ...); otherwise a slow Redis read stalls the WebSocket event loop for other sessions on the same worker.
Useful? React with 👍 / 👎.
Summary
Reduces Google Translate V3 spend by 25–50% ($1,500–$2,140/mo) through two targeted fixes to the translation pipeline, plus architecture documentation to prevent future cost regressions.
The Problem
The
TranslationCoordinatorsends full segment text to Google Translate on every STT update cycle. As speech evolves ("Hola"→"Hola como"→"Hola como estas"), each version generates a unique MD5 cache key — so even though 60–70% of content is redundant, nothing hits cache.Result: $4,282/mo for 284 million characters translated, with an effective cache hit rate far below what the 3-tier cache system (LRU → Redis → negative) should deliver.
Fix 1: Sentence-Level Dedup in Batch Path (Primary)
File:
backend/utils/translation.py—translate_units_batch()Fix 2: Merge-Aware Redis Lookup (Secondary)
File:
backend/utils/translation_coordinator.pyDocumentation
coordinator.py:272-291coordinator.py:97-122backend/docs/translation-architecture.mdCost / Production Impact
Risk Mitigations
split_into_sentences()' '.join()'preserves original sequenceValidation
Rollback
Safe revert of 2 code files + 1 new doc file. No data migration needed — caches are ephemeral.