feat(memory): recall and store memory around chat completions by dpoulopoulos · Pull Request #190 · mozilla-ai/otari

dpoulopoulos · 2026-06-22T07:08:07Z

Description

Wires persistent memory into chat completions in platform mode: the gateway recalls relevant remembered context before the model answers and stores durable new facts afterward. It is off by default (OTARI_MEMORY_ENABLED), best-effort, and covers both streaming and non-streaming completions.

The point is that assistants built on the gateway feel continuous across sessions instead of stateless. The platform owns configuration, storage, privacy, and accounting (companion PR mozilla-ai/otari-ai#1148); the gateway just brackets each completion with recall + remember.

What's in it:

OTARI_MEMORY_ENABLED master switch (default off → no behavior change for existing deployments).
A best-effort platform memory client + prompt-injection helpers that swallow errors/timeouts, so memory never breaks or noticeably slows a chat.
Recall before dispatch (covers every path) and a fire-and-forget remember afterward, for both non-streaming and streaming completions (the streaming path accumulates the assistant text and stores only on normal completion).

No memory behavior or added latency in standalone mode.

PR Type

Relevant issues

Closes #189

Checklist

I understand the code I am submitting.
I have added or updated tests that cover my change (tests/unit, tests/integration).
I ran the Definition of Done checks locally (make lint, make typecheck, make test).
Documentation was updated where necessary.
If the API contract changed, I regenerated the OpenAPI spec (uv run python scripts/generate_openapi.py). (No contract change: memory is internal, the spec is unchanged.)

AI Usage

No AI was used.
AI was used for drafting/refactoring.
This is fully AI-generated.

AI Model/Tool used: Claude Code (Claude Opus 4.8)

Any additional AI details you'd like to share:
The memory integration was implemented end-to-end with Claude Code; the human author directed the work and reviews the result.

I am an AI Agent filling out this form (check box if true)

Overview

This pull request adds persistent memory functionality to the gateway's chat completion endpoints when operating in platform mode. The feature enables assistants to recall relevant facts from prior conversations and store new information for future sessions, improving the user experience by eliminating the need to repeat preferences and context across separate conversations.

What Changed

New Capability: Persistent Conversation Memory

Assistants can now recall relevant facts from previous conversations before answering user queries
The gateway automatically captures and stores new facts after each conversation
Stored information persists across session breaks, enabling contextual awareness across multiple conversations

Configuration & Control

Added master switch OTARI_MEMORY_ENABLED to control the feature globally (defaults to off)
Added configurable timeout thresholds: PLATFORM_MEMORY_RECALL_TIMEOUT_MS (2000ms default) and PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS (10000ms default)
Memory is exclusively a platform-mode feature; standalone deployments are completely unaffected

Implementation Details

Recall operations execute before sending requests to the underlying model, adding up to the recall timeout to time-to-first-token only when memory service experiences degradation
Storage operations run asynchronously as fire-and-forget tasks for non-streaming completions
Streaming completions accumulate assistant text during transmission and store it upon normal completion
All memory operations are best-effort: timeouts, errors, and service unavailability are gracefully handled with fallback to stateless behavior

Code Changes

Memory injection helpers now integrate recalled facts into system prompts
Chat completion endpoints enhanced to recall facts before dispatch and store facts after completion
Streaming pipeline updated to capture assistant text and trigger memory storage upon completion
Configuration system extended with memory-related settings and timeout parsing

Benefits

Improved User Experience: Assistants retain context about user preferences and information across conversations without requiring repetition
Zero Impact on Existing Deployments: Feature is opt-in; existing systems see no changes, no latency impact, and no configuration overhead unless explicitly enabled
Robust Degradation: Memory failures never break chat completions or cause noticeable latency; the gateway operates identically to today's behavior when memory is unavailable
Comprehensive Coverage: Works with both streaming and non-streaming completion paths

Testing

Comprehensive unit and integration tests verify:

Memory helpers correctly inject and extract facts
Best-effort error handling swallows all failure modes without disrupting completions
Configuration timeouts are parsed robustly with sensible fallbacks
Integration tests confirm memory off produces zero memory API calls, while memory on correctly recalls, injects, and stores information

coderabbitai · 2026-06-22T07:08:19Z

Walkthrough

Adds opt-in persistent memory to the gateway's platform-mode chat completions path. A new memory_enabled config flag gates the feature. Message helpers inject recalled facts into prompts and format turns for storage. Two platform API clients handle best-effort recall and remember calls with configurable timeouts. A streaming wrapper captures assistant text across chunks and fires a background persist task on normal completion. Both streaming and non-streaming paths in chat_completions are wired to recall before dispatch and remember after, with comprehensive unit and integration test coverage.

Changes

Platform Memory Integration

Layer / File(s)	Summary
`memory_enabled` configuration and environment overrides `src/gateway/core/config.py`, `tests/integration/test_config_env_loading.py`	Adds `GatewayConfig.memory_enabled` boolean field (`default=False`) as the gateway-level master switch for recall/remember behavior in platform mode. Platform environment variables `PLATFORM_MEMORY_RECALL_TIMEOUT_MS` and `PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS` are parsed as integers and stored for timeout configuration. Integration tests verify both config fields and environment variable loading.
Message manipulation helpers `src/gateway/api/routes/_helpers.py`, `tests/unit/test_memory_inject.py`	Adds `MEMORY_FACTS_HEADER` constant, `inject_memory_facts` (prepends/appends a memory facts block to the message list without mutating the input), and `build_remember_messages` (constructs a minimal user+assistant message pair for storage). Unit tests cover no-op behavior with empty facts, system message creation vs. append, mutation safety, and all combinations of user/assistant text presence.
Platform recall/remember API clients `src/gateway/api/routes/_platform.py`, `tests/unit/test_memory_platform.py`	Adds `_coerce_timeout_ms` helper, `_recall_platform_memory` (POST to `/gateway/memory/recall`, returns filtered `list[str]` or `[]` on any error condition), and `_remember_platform_memory` (fire-and-forget POST to `/gateway/memory/remember`, silently swallows all errors). Unit tests cover successful recalls/remembers, early short-circuit conditions (missing base URL, empty inputs), non-200 responses, timeouts, malformed JSON, and all `httpx` error modes.
Streaming memory-capture wrapper `src/gateway/api/routes/_pipeline.py`, `tests/unit/test_stream_memory_capture.py`	Extends `run_streaming_with_fallback` with `extract_stream_text` and `on_memory_settled` keyword callbacks. Adds `_swallow_memory_task_exception` helper and `_stream_with_memory_capture`, which accumulates extracted text across chunks, yields original chunks unchanged, and schedules the settled callback as a detached background task only on normal stream completion—not on mid-stream `aclose()` disconnection. Unit tests verify pass-through behavior, normal-completion callback invocation, early disconnection (no callback), all-`None` chunks (no callback), exception swallowing, and chunk text extraction rules.
Chat route memory integration `src/gateway/api/routes/chat.py`, `tests/integration/test_platform_mode_chat.py`	Adds three internal helpers (`_schedule_memory_remember`, `_chat_chunk_text`, `_streaming_memory_remember`). Wires recall+inject before dispatch in platform mode (when `config.memory_enabled` and `ctx.platform_mode` are both true), passes capture callbacks into the streaming fallback for streamed completions, and schedules a background persist task after non-streaming completions. All memory operations are best-effort; failures are logged and do not affect completion flow. Integration tests verify memory is disabled by default (no platform memory calls) and that memory-enabled mode correctly recalls, injects, and remembers across streaming and non-streaming paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

mozilla-ai/otari#130: Both PRs modify src/gateway/api/routes/_pipeline.py's streaming/fallback infrastructure—specifically run_streaming_with_fallback(...) signature and behavior—so they share the same code seam and may need sequencing or conflict resolution.

Suggested reviewers

tbille
agpituk

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title uses a valid Conventional Commit prefix (feat), follows imperative mood, is under 70 characters (61 chars), and clearly summarizes the main change: adding memory recall and storage to chat completions.
Description check	✅ Passed	The PR description is comprehensive and well-structured, covering all template sections including clear description, PR type selection, linked issue reference, completed checklist items, and AI usage disclosure.
Linked Issues check	✅ Passed	All coding requirements from issue `#189` are met: memory is opt-in with OTARI_MEMORY_ENABLED (off by default), best-effort with error handling throughout, covers both streaming and non-streaming paths, platform-mode only, and includes comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes directly support the memory feature scope: config additions for memory flags/timeouts, helpers for prompt injection and platform memory client calls, chat endpoint wiring, streaming wrapper, and comprehensive unit/integration tests—no unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/memory

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch feat/memory

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

tests/unit/test_memory_platform.py (1)

25-80: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add regression tests for malformed timeout config values.

Please add tests where memory_recall_timeout_ms / memory_remember_timeout_ms are non-numeric, so best-effort semantics stay protected against config drift.

Also applies to: 86-120

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_memory_platform.py` around lines 25 - 80, Add new regression
test cases to verify that non-numeric values for `memory_recall_timeout_ms` and
`memory_remember_timeout_ms` configuration parameters are handled gracefully.
Create test functions that pass malformed timeout values (such as strings or
None) through the _config() helper function and verify that the
_recall_platform_memory function still executes successfully without raising
exceptions, following the same pattern as the existing test functions like
test_recall_non_200_yields_no_facts and test_recall_timeout_is_swallowed to
ensure best-effort semantics are protected against configuration drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/api/routes/_pipeline.py`:
- Around line 1461-1462: The asyncio.create_task call for on_settled_text is
fire-and-forget and lacks exception handling, which causes unhandled task
exceptions to leak runtime warnings. Add a done callback to the task created by
asyncio.create_task that suppresses any exceptions raised by on_settled_text,
ensuring memory persistence failures remain silent and non-disruptive. Reference
the asyncio.create_task line with on_settled_text and add a callback using
add_done_callback that catches and silently ignores any exceptions from the
completed task.

In `@src/gateway/api/routes/_platform.py`:
- Around line 602-603: The timeout parsing on line 602 and line 639 using
int(config.platform.get("memory_recall_timeout_ms", 2000)) can raise ValueError
exceptions if the config value is not a valid integer, breaking the "never block
chat" contract. For both occurrences where memory_recall_timeout_ms is parsed,
wrap the int() conversion in a try-except block to safely handle invalid input,
catching ValueError and returning the default fallback value of 2000 when
parsing fails. This ensures misconfiguration does not cause chat functionality
to break.

In `@src/gateway/api/routes/chat.py`:
- Around line 361-364: The recall operation in the memory check block lacks
error handling, which means any exception raised by _recall_platform_memory
(such as invalid timeout parsing in its config handling) will cause the entire
request to fail. Wrap the _recall_platform_memory call and the subsequent
inject_memory_facts call in a try-except block so that any exceptions are caught
and logged, allowing the request to proceed without memory injection rather than
breaking the completion entirely. This ensures memory recall failures remain
best-effort and do not disrupt provider dispatch.

---

Nitpick comments:
In `@tests/unit/test_memory_platform.py`:
- Around line 25-80: Add new regression test cases to verify that non-numeric
values for `memory_recall_timeout_ms` and `memory_remember_timeout_ms`
configuration parameters are handled gracefully. Create test functions that pass
malformed timeout values (such as strings or None) through the _config() helper
function and verify that the _recall_platform_memory function still executes
successfully without raising exceptions, following the same pattern as the
existing test functions like test_recall_non_200_yields_no_facts and
test_recall_timeout_is_swallowed to ensure best-effort semantics are protected
against configuration drift.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eee16f2c-1ddb-4b1b-a076-e50f57f07112

📥 Commits

Reviewing files that changed from the base of the PR and between a382c06 and 0aee46f.

📒 Files selected for processing (8)

src/gateway/api/routes/_helpers.py
src/gateway/api/routes/_pipeline.py
src/gateway/api/routes/_platform.py
src/gateway/api/routes/chat.py
src/gateway/core/config.py
tests/unit/test_memory_inject.py
tests/unit/test_memory_platform.py
tests/unit/test_stream_memory_capture.py

dpoulopoulos

Adds opt-in persistent memory to platform-mode chat completions: a new memory_enabled flag (off by default) gates a best-effort recall-before-dispatch + remember-after flow. Message helpers inject recalled facts into the system prompt and build the turn to store; two platform clients (_recall_platform_memory, _remember_platform_memory) talk to /gateway/memory/{recall,remember}; a streaming wrapper accumulates assistant text and fires a fire-and-forget remember on normal completion. Both streaming and non-streaming paths are wired. Standalone mode is unaffected.

dpoulopoulos · 2026-06-22T07:40:40Z

Thanks for the careful review. Pushed fixups addressing the best-effort gaps:

Timeout parsing no longer uses a bare int(...). Added a _coerce_timeout_ms helper that falls back to the default for missing, non-numeric, or non-positive values, used by both recall and remember.
Broadened the recall/remember error handling from (TimeoutException, NetworkError) to httpx.HTTPError, so the long tail (RemoteProtocolError, ProxyError, InvalidURL, etc.) can't turn a would-succeed chat into a 500.
Wrapped the recall call site in chat.py in a try/except that logs and proceeds without facts, as a belt-and-suspenders boundary.
The streaming remember task now keeps its reference and attaches a done-callback that retrieves and logs any exception, so a failed background write doesn't leak a "Task exception was never retrieved" warning.
Added regression tests: malformed timeout values (recall + remember) asserting the default fallback, a RemoteProtocolError swallowed on both paths, and a raising streamed settled-callback that doesn't propagate.

Two points I left as-is, with reasoning:

Recall-before-dispatch adds up to the recall timeout to TTFT under a degraded memory service. That's inherent (facts must be in the prompt before the call) and already bounded by a configurable timeout, so no change beyond the safer parsing above.
The multimodal-system-content repr edge case in inject_memory_facts mirrors the existing inject_purpose_hints helper; system content is effectively always a string, and changing only this helper would diverge from that pattern.

ruff, mypy (strict), and the full unit suite are green.

dpoulopoulos

Adds opt-in, best-effort persistent memory to platform-mode chat completions, gated by a new memory_enabled config flag (OTARI_MEMORY_ENABLED, default off). Before dispatch the gateway recalls facts from the platform (/gateway/memory/recall) and injects them into the system prompt; after a completion it stores the new turn fire-and-forget (/gateway/memory/remember) for both non-streaming (background task) and streaming (a capture wrapper that accumulates assistant text and persists only on normal completion). Standalone mode and the /messages and /responses endpoints are unaffected. This is a re-review after the author addressed the previous round of feedback.

Prior feedback status: resolved. All actionable items from the previous review (CodeRabbit's 3 plus the prior [major]/[minor] review's 5) are addressed in the current code: the narrow (TimeoutException, NetworkError) catch was broadened to httpx.HTTPError in both recall and remember; a _coerce_timeout_ms helper guards timeout parsing; the recall call site in chat.py is wrapped in a best-effort try/except; the streaming create_task now has an exception-swallowing done-callback; and regression tests for malformed timeouts, generic httpx errors, and a raising settled-callback were added. Two low-priority items the prior review explicitly flagged as by-design/low-risk remain open (recall latency, multimodal-system-content edge case). Verified: ruff, mypy (strict), and all 32 new unit tests pass. No web UI exists for this backend feature, so Playwright verification is not applicable.

dpoulopoulos · 2026-06-22T08:06:08Z

+    memory_user_token = ctx.user_token if (config.memory_enabled and ctx.platform_mode) else None
+    if memory_user_token:
+        try:
+            recalled = await _recall_platform_memory(config, memory_user_token, latest_user_text(request.messages))


[minor] Recall is awaited synchronously before dispatch, so under a slow/degraded memory service it adds up to memory_recall_timeout_ms (default 2000ms) to time-to-first-token on every memory-enabled platform request. This is inherent to recall-before-dispatch (the facts must be in the prompt before the call) and is a reasonable trade-off, but the operator-facing latency budget isn't documented anywhere: memory_recall_timeout_ms / memory_remember_timeout_ms are read from the config.platform dict but aren't surfaced in the memory_enabled field description, config.example.yml, or the docs. Worth documenting the latency budget (and perhaps a more aggressive recall default than 2s) so operators understand the cost of enabling it. Carried over from the prior review; non-blocking.

dpoulopoulos · 2026-06-22T08:06:08Z

+    # Memory recall (platform mode, best-effort). Inject recalled facts into the system
+    # message before dispatch so every downstream path (stream/non-stream, tool/no-tool)
+    # sees them. A failure or a disabled workspace yields no facts and never blocks chat.
+    memory_user_token = ctx.user_token if (config.memory_enabled and ctx.platform_mode) else None


[minor] No integration test exercises the wired memory flow in chat_completions. The unit tests cover the helpers, the platform client, and the streaming-capture wrapper in isolation, but the actual wiring here, gating on config.memory_enabled and ctx.platform_mode, recall then inject_memory_facts, the best-effort try/except, _schedule_memory_remember on the non-streaming path (line 474), and on_memory_settled on the streaming path (line 405), is untested end to end. An integration test in tests/integration/test_platform_mode_chat.py (memory off = no platform memory calls; memory on = recall injected and remember fired) would lock in this behavior and catch wiring regressions the unit tests can't see.

dpoulopoulos · 2026-06-22T08:06:08Z

+    out = list(messages)
+    if out and isinstance(out[0], dict) and out[0].get("role") == "system":
+        existing = out[0].get("content") or ""
+        out[0] = {**out[0], "content": f"{existing}\n\n{block}" if existing else block}


[nit] When the first message is a system message whose content is a list of content parts (a multimodal system prompt), f"{existing}\n\n{block}" stringifies the list via its Python repr, corrupting the system content. System content is almost always a plain string, and this exactly mirrors the existing inject_purpose_hints helper (which has the same edge case), so it's low-risk and consistent, just flagging the shared gap. Carried over from the prior review.

dpoulopoulos · 2026-06-22T08:06:08Z

+        completed = True
+    finally:
+        if completed and parts:
+            task = asyncio.create_task(on_settled_text("".join(parts)))


[nit] asyncio.create_task(...) keeps no strong reference; only add_done_callback is attached. Per the asyncio docs the event loop holds only a weak reference to tasks, so a detached task can in theory be garbage-collected before it finishes. In practice the remember runs on the next loop tick and this mirrors the existing _report_platform_usage create_task pattern, so it's consistent with the codebase and low-risk. The exception-leak concern from the prior review is fully resolved by _swallow_memory_task_exception; this is just the remaining GC subtlety. Optional: hold the task in a set and discard it in the done-callback.

A master switch (default off) for persistent memory in platform mode. When off, the gateway makes no memory calls and behavior is unchanged; per-workspace enablement is still controlled by the platform. Refs #189 Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

…pers _recall_platform_memory / _remember_platform_memory call the platform's /gateway/memory/* endpoints, modeled on the usage reporter: they swallow timeouts and errors so a slow or unavailable memory service never breaks a chat. inject_memory_facts prepends recalled facts to the system message (mirroring inject_purpose_hints); build_remember_messages assembles the minimal new turn to store. Helpers are unit-tested and not yet wired into the request path. Refs #189 Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

Gated by OTARI_MEMORY_ENABLED in platform mode: recall runs once before dispatch and injects facts into the system message (covering every path, streaming and not), and the new turn is stored fire-and-forget afterwards. Non-streaming uses a background task; streaming uses _stream_with_memory_capture to accumulate the assistant text and remember only on normal completion (skipped on disconnect). The capture wrapper is unit-tested. No behavior change when the flag is off. Closes #189 Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

dpoulopoulos · 2026-06-22T08:11:41Z

Pushed fixups for the second-round feedback (folded into their original commits):

Surfaced the memory latency budget. The recall/remember timeouts were only readable from the platform: dict with no operator-visible knob. Added PLATFORM_MEMORY_RECALL_TIMEOUT_MS / PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS env mappings (consistent with the existing PLATFORM_*_TIMEOUT_MS siblings) and expanded the memory_enabled field description to spell out that recall runs before dispatch (so it adds up to its timeout to TTFT) while remember is fire-and-forget, including the defaults (2000ms / 10000ms). Extended test_load_config_platform_env_overrides to lock in the two new mappings.
Added end-to-end integration coverage of the wiring. Two tests in test_platform_mode_chat.py: memory off makes zero /gateway/memory/* calls, and memory on recalls with the latest user text, injects the fact into the system message the provider sees, then stores the user+assistant turn after the non-streaming completion.

Left as-is (both flagged low-risk/by-design in review):

The multimodal-system-content repr edge case in inject_memory_facts mirrors the existing inject_purpose_hints helper; system content is effectively always a string, so changing only this helper would diverge from that pattern.
The streaming create_task GC subtlety mirrors the existing _report_platform_usage pattern; the exception-leak concern is already handled by the done-callback, and the task runs on the next loop tick.

ruff, mypy (strict), and the memory + config test suites are green.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/core/config.py`:
- Around line 410-415: The PLATFORM_MEMORY_RECALL_TIMEOUT_MS and
PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS configuration entries are registered with
int casting, which causes invalid environment values to raise an exception
during config load and prevent the best-effort fallback behavior in
_coerce_timeout_ms from working. Change the type casting for these two entries
from int to str to defer validation, allowing the configuration to load
successfully and let _coerce_timeout_ms handle the validation and fallback to
defaults when encountering non-numeric values.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17d7ddae-5fb9-4684-b2c5-6dfddb10ecd6

📥 Commits

Reviewing files that changed from the base of the PR and between 0aee46f and 7c5fb36.

📒 Files selected for processing (10)

src/gateway/api/routes/_helpers.py
src/gateway/api/routes/_pipeline.py
src/gateway/api/routes/_platform.py
src/gateway/api/routes/chat.py
src/gateway/core/config.py
tests/integration/test_config_env_loading.py
tests/integration/test_platform_mode_chat.py
tests/unit/test_memory_inject.py
tests/unit/test_memory_platform.py
tests/unit/test_stream_memory_capture.py

🚧 Files skipped from review as they are similar to previous changes (6)

tests/unit/test_memory_platform.py
tests/unit/test_memory_inject.py
src/gateway/api/routes/_helpers.py
src/gateway/api/routes/_platform.py
src/gateway/api/routes/chat.py
src/gateway/api/routes/_pipeline.py

coderabbitai · 2026-06-22T08:17:12Z

+        # Best-effort memory timeouts (platform mode). Recall is on the hot path
+        # (added to time-to-first-token); remember is fire-and-forget after the
+        # completion. Both fall back to their defaults on a missing or invalid
+        # value (see _coerce_timeout_ms in routes/_platform.py).
+        "PLATFORM_MEMORY_RECALL_TIMEOUT_MS": ("memory_recall_timeout_ms", int),
+        "PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS": ("memory_remember_timeout_ms", int),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle invalid PLATFORM_MEMORY_*_TIMEOUT_MS values without crashing startup.

These vars are registered with int casting here, so a non-numeric env value raises during config load (Line 431) and prevents startup. That bypasses the intended best-effort fallback behavior in _coerce_timeout_ms(...).

Suggested fix

@@ - for env_name, (field_name, caster) in env_mappings.items(): + memory_timeout_envs = { + "PLATFORM_MEMORY_RECALL_TIMEOUT_MS", + "PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS", + } + for env_name, (field_name, caster) in env_mappings.items(): value = os.getenv(env_name) if value is None or value == "": continue - platform[field_name] = caster(value) + try: + platform[field_name] = caster(value) + except ValueError: + if env_name in memory_timeout_envs: + # Let downstream _coerce_timeout_ms apply defaults. + platform[field_name] = value + continue + raise

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/gateway/core/config.py` around lines 410 - 415, The PLATFORM_MEMORY_RECALL_TIMEOUT_MS and PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS configuration entries are registered with int casting, which causes invalid environment values to raise an exception during config load and prevent the best-effort fallback behavior in _coerce_timeout_ms from working. Change the type casting for these two entries from int to str to defer validation, allowing the configuration to load successfully and let _coerce_timeout_ms handle the validation and fallback to defaults when encountering non-numeric values.

dpoulopoulos temporarily deployed to integration-tests June 22, 2026 07:08 — with GitHub Actions Inactive

github-actions Bot added the missing-template PR is missing required template sections label Jun 22, 2026

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread src/gateway/api/routes/_pipeline.py Outdated

Comment thread src/gateway/api/routes/_platform.py Outdated

Comment thread src/gateway/api/routes/chat.py Outdated

github-actions Bot removed the missing-template PR is missing required template sections label Jun 22, 2026

dpoulopoulos commented Jun 22, 2026

View reviewed changes

Comment thread src/gateway/api/routes/_platform.py Outdated

Comment thread src/gateway/api/routes/_pipeline.py Outdated

Comment thread src/gateway/api/routes/chat.py Outdated

Comment thread src/gateway/api/routes/_helpers.py

Comment thread tests/unit/test_memory_platform.py

dpoulopoulos force-pushed the feat/memory branch from 0aee46f to f98dbd5 Compare June 22, 2026 07:40

dpoulopoulos temporarily deployed to integration-tests June 22, 2026 07:40 — with GitHub Actions Inactive

dpoulopoulos commented Jun 22, 2026

View reviewed changes

dpoulopoulos added 3 commits June 22, 2026 08:10

dpoulopoulos force-pushed the feat/memory branch from f98dbd5 to 7c5fb36 Compare June 22, 2026 08:11

dpoulopoulos temporarily deployed to integration-tests June 22, 2026 08:11 — with GitHub Actions Inactive

coderabbitai Bot requested review from agpituk and tbille June 22, 2026 08:12

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Conversation

dpoulopoulos commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

PR Type

Relevant issues

Checklist

AI Usage

Overview

What Changed

Benefits

Testing

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dpoulopoulos left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dpoulopoulos commented Jun 22, 2026

Uh oh!

dpoulopoulos left a comment

Choose a reason for hiding this comment

Uh oh!

dpoulopoulos Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

dpoulopoulos Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

dpoulopoulos Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

dpoulopoulos Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

dpoulopoulos commented Jun 22, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dpoulopoulos commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading