Skip to content

feat(memory): recall and store memory around chat completions#190

Open
dpoulopoulos wants to merge 3 commits into
mainfrom
feat/memory
Open

feat(memory): recall and store memory around chat completions#190
dpoulopoulos wants to merge 3 commits into
mainfrom
feat/memory

Conversation

@dpoulopoulos

@dpoulopoulos dpoulopoulos commented Jun 22, 2026

Copy link
Copy Markdown
Member

Description

Wires persistent memory into chat completions in platform mode: the gateway recalls relevant remembered context before the model answers and stores durable new facts afterward. It is off by default (OTARI_MEMORY_ENABLED), best-effort, and covers both streaming and non-streaming completions.

The point is that assistants built on the gateway feel continuous across sessions instead of stateless. The platform owns configuration, storage, privacy, and accounting (companion PR mozilla-ai/otari-ai#1148); the gateway just brackets each completion with recall + remember.

What's in it:

  • OTARI_MEMORY_ENABLED master switch (default off → no behavior change for existing deployments).
  • A best-effort platform memory client + prompt-injection helpers that swallow errors/timeouts, so memory never breaks or noticeably slows a chat.
  • Recall before dispatch (covers every path) and a fire-and-forget remember afterward, for both non-streaming and streaming completions (the streaming path accumulates the assistant text and stores only on normal completion).

No memory behavior or added latency in standalone mode.

PR Type

  • New Feature
  • Bug Fix
  • Refactor
  • Documentation
  • Infrastructure / CI

Relevant issues

Closes #189

Checklist

  • I understand the code I am submitting.
  • I have added or updated tests that cover my change (tests/unit, tests/integration).
  • I ran the Definition of Done checks locally (make lint, make typecheck, make test).
  • Documentation was updated where necessary.
  • If the API contract changed, I regenerated the OpenAPI spec (uv run python scripts/generate_openapi.py). (No contract change: memory is internal, the spec is unchanged.)

AI Usage

  • No AI was used.
  • AI was used for drafting/refactoring.
  • This is fully AI-generated.

AI Model/Tool used: Claude Code (Claude Opus 4.8)

Any additional AI details you'd like to share:
The memory integration was implemented end-to-end with Claude Code; the human author directed the work and reviews the result.

  • I am an AI Agent filling out this form (check box if true)

Overview

This pull request adds persistent memory functionality to the gateway's chat completion endpoints when operating in platform mode. The feature enables assistants to recall relevant facts from prior conversations and store new information for future sessions, improving the user experience by eliminating the need to repeat preferences and context across separate conversations.

What Changed

New Capability: Persistent Conversation Memory

  • Assistants can now recall relevant facts from previous conversations before answering user queries
  • The gateway automatically captures and stores new facts after each conversation
  • Stored information persists across session breaks, enabling contextual awareness across multiple conversations

Configuration & Control

  • Added master switch OTARI_MEMORY_ENABLED to control the feature globally (defaults to off)
  • Added configurable timeout thresholds: PLATFORM_MEMORY_RECALL_TIMEOUT_MS (2000ms default) and PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS (10000ms default)
  • Memory is exclusively a platform-mode feature; standalone deployments are completely unaffected

Implementation Details

  • Recall operations execute before sending requests to the underlying model, adding up to the recall timeout to time-to-first-token only when memory service experiences degradation
  • Storage operations run asynchronously as fire-and-forget tasks for non-streaming completions
  • Streaming completions accumulate assistant text during transmission and store it upon normal completion
  • All memory operations are best-effort: timeouts, errors, and service unavailability are gracefully handled with fallback to stateless behavior

Code Changes

  • Memory injection helpers now integrate recalled facts into system prompts
  • Chat completion endpoints enhanced to recall facts before dispatch and store facts after completion
  • Streaming pipeline updated to capture assistant text and trigger memory storage upon completion
  • Configuration system extended with memory-related settings and timeout parsing

Benefits

  • Improved User Experience: Assistants retain context about user preferences and information across conversations without requiring repetition
  • Zero Impact on Existing Deployments: Feature is opt-in; existing systems see no changes, no latency impact, and no configuration overhead unless explicitly enabled
  • Robust Degradation: Memory failures never break chat completions or cause noticeable latency; the gateway operates identically to today's behavior when memory is unavailable
  • Comprehensive Coverage: Works with both streaming and non-streaming completion paths

Testing

Comprehensive unit and integration tests verify:

  • Memory helpers correctly inject and extract facts
  • Best-effort error handling swallows all failure modes without disrupting completions
  • Configuration timeouts are parsed robustly with sensible fallbacks
  • Integration tests confirm memory off produces zero memory API calls, while memory on correctly recalls, injects, and stores information

@dpoulopoulos dpoulopoulos temporarily deployed to integration-tests June 22, 2026 07:08 — with GitHub Actions Inactive
@github-actions github-actions Bot added the missing-template PR is missing required template sections label Jun 22, 2026
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

Adds opt-in persistent memory to the gateway's platform-mode chat completions path. A new memory_enabled config flag gates the feature. Message helpers inject recalled facts into prompts and format turns for storage. Two platform API clients handle best-effort recall and remember calls with configurable timeouts. A streaming wrapper captures assistant text across chunks and fires a background persist task on normal completion. Both streaming and non-streaming paths in chat_completions are wired to recall before dispatch and remember after, with comprehensive unit and integration test coverage.

Changes

Platform Memory Integration

Layer / File(s) Summary
memory_enabled configuration and environment overrides
src/gateway/core/config.py, tests/integration/test_config_env_loading.py
Adds GatewayConfig.memory_enabled boolean field (default=False) as the gateway-level master switch for recall/remember behavior in platform mode. Platform environment variables PLATFORM_MEMORY_RECALL_TIMEOUT_MS and PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS are parsed as integers and stored for timeout configuration. Integration tests verify both config fields and environment variable loading.
Message manipulation helpers
src/gateway/api/routes/_helpers.py, tests/unit/test_memory_inject.py
Adds MEMORY_FACTS_HEADER constant, inject_memory_facts (prepends/appends a memory facts block to the message list without mutating the input), and build_remember_messages (constructs a minimal user+assistant message pair for storage). Unit tests cover no-op behavior with empty facts, system message creation vs. append, mutation safety, and all combinations of user/assistant text presence.
Platform recall/remember API clients
src/gateway/api/routes/_platform.py, tests/unit/test_memory_platform.py
Adds _coerce_timeout_ms helper, _recall_platform_memory (POST to /gateway/memory/recall, returns filtered list[str] or [] on any error condition), and _remember_platform_memory (fire-and-forget POST to /gateway/memory/remember, silently swallows all errors). Unit tests cover successful recalls/remembers, early short-circuit conditions (missing base URL, empty inputs), non-200 responses, timeouts, malformed JSON, and all httpx error modes.
Streaming memory-capture wrapper
src/gateway/api/routes/_pipeline.py, tests/unit/test_stream_memory_capture.py
Extends run_streaming_with_fallback with extract_stream_text and on_memory_settled keyword callbacks. Adds _swallow_memory_task_exception helper and _stream_with_memory_capture, which accumulates extracted text across chunks, yields original chunks unchanged, and schedules the settled callback as a detached background task only on normal stream completion—not on mid-stream aclose() disconnection. Unit tests verify pass-through behavior, normal-completion callback invocation, early disconnection (no callback), all-None chunks (no callback), exception swallowing, and chunk text extraction rules.
Chat route memory integration
src/gateway/api/routes/chat.py, tests/integration/test_platform_mode_chat.py
Adds three internal helpers (_schedule_memory_remember, _chat_chunk_text, _streaming_memory_remember). Wires recall+inject before dispatch in platform mode (when config.memory_enabled and ctx.platform_mode are both true), passes capture callbacks into the streaming fallback for streamed completions, and schedules a background persist task after non-streaming completions. All memory operations are best-effort; failures are logged and do not affect completion flow. Integration tests verify memory is disabled by default (no platform memory calls) and that memory-enabled mode correctly recalls, injects, and remembers across streaming and non-streaming paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • mozilla-ai/otari#130: Both PRs modify src/gateway/api/routes/_pipeline.py's streaming/fallback infrastructure—specifically run_streaming_with_fallback(...) signature and behavior—so they share the same code seam and may need sequencing or conflict resolution.

Suggested reviewers

  • tbille
  • agpituk
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title uses a valid Conventional Commit prefix (feat), follows imperative mood, is under 70 characters (61 chars), and clearly summarizes the main change: adding memory recall and storage to chat completions.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all template sections including clear description, PR type selection, linked issue reference, completed checklist items, and AI usage disclosure.
Linked Issues check ✅ Passed All coding requirements from issue #189 are met: memory is opt-in with OTARI_MEMORY_ENABLED (off by default), best-effort with error handling throughout, covers both streaming and non-streaming paths, platform-mode only, and includes comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes directly support the memory feature scope: config additions for memory flags/timeouts, helpers for prompt injection and platform memory client calls, chat endpoint wiring, streaming wrapper, and comprehensive unit/integration tests—no unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/memory
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/memory

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/test_memory_platform.py (1)

25-80: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add regression tests for malformed timeout config values.

Please add tests where memory_recall_timeout_ms / memory_remember_timeout_ms are non-numeric, so best-effort semantics stay protected against config drift.

Also applies to: 86-120

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_memory_platform.py` around lines 25 - 80, Add new regression
test cases to verify that non-numeric values for `memory_recall_timeout_ms` and
`memory_remember_timeout_ms` configuration parameters are handled gracefully.
Create test functions that pass malformed timeout values (such as strings or
None) through the _config() helper function and verify that the
_recall_platform_memory function still executes successfully without raising
exceptions, following the same pattern as the existing test functions like
test_recall_non_200_yields_no_facts and test_recall_timeout_is_swallowed to
ensure best-effort semantics are protected against configuration drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/api/routes/_pipeline.py`:
- Around line 1461-1462: The asyncio.create_task call for on_settled_text is
fire-and-forget and lacks exception handling, which causes unhandled task
exceptions to leak runtime warnings. Add a done callback to the task created by
asyncio.create_task that suppresses any exceptions raised by on_settled_text,
ensuring memory persistence failures remain silent and non-disruptive. Reference
the asyncio.create_task line with on_settled_text and add a callback using
add_done_callback that catches and silently ignores any exceptions from the
completed task.

In `@src/gateway/api/routes/_platform.py`:
- Around line 602-603: The timeout parsing on line 602 and line 639 using
int(config.platform.get("memory_recall_timeout_ms", 2000)) can raise ValueError
exceptions if the config value is not a valid integer, breaking the "never block
chat" contract. For both occurrences where memory_recall_timeout_ms is parsed,
wrap the int() conversion in a try-except block to safely handle invalid input,
catching ValueError and returning the default fallback value of 2000 when
parsing fails. This ensures misconfiguration does not cause chat functionality
to break.

In `@src/gateway/api/routes/chat.py`:
- Around line 361-364: The recall operation in the memory check block lacks
error handling, which means any exception raised by _recall_platform_memory
(such as invalid timeout parsing in its config handling) will cause the entire
request to fail. Wrap the _recall_platform_memory call and the subsequent
inject_memory_facts call in a try-except block so that any exceptions are caught
and logged, allowing the request to proceed without memory injection rather than
breaking the completion entirely. This ensures memory recall failures remain
best-effort and do not disrupt provider dispatch.

---

Nitpick comments:
In `@tests/unit/test_memory_platform.py`:
- Around line 25-80: Add new regression test cases to verify that non-numeric
values for `memory_recall_timeout_ms` and `memory_remember_timeout_ms`
configuration parameters are handled gracefully. Create test functions that pass
malformed timeout values (such as strings or None) through the _config() helper
function and verify that the _recall_platform_memory function still executes
successfully without raising exceptions, following the same pattern as the
existing test functions like test_recall_non_200_yields_no_facts and
test_recall_timeout_is_swallowed to ensure best-effort semantics are protected
against configuration drift.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eee16f2c-1ddb-4b1b-a076-e50f57f07112

📥 Commits

Reviewing files that changed from the base of the PR and between a382c06 and 0aee46f.

📒 Files selected for processing (8)
  • src/gateway/api/routes/_helpers.py
  • src/gateway/api/routes/_pipeline.py
  • src/gateway/api/routes/_platform.py
  • src/gateway/api/routes/chat.py
  • src/gateway/core/config.py
  • tests/unit/test_memory_inject.py
  • tests/unit/test_memory_platform.py
  • tests/unit/test_stream_memory_capture.py

Comment thread src/gateway/api/routes/_pipeline.py Outdated
Comment thread src/gateway/api/routes/_platform.py Outdated
Comment thread src/gateway/api/routes/chat.py Outdated
@github-actions github-actions Bot removed the missing-template PR is missing required template sections label Jun 22, 2026

@dpoulopoulos dpoulopoulos left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds opt-in persistent memory to platform-mode chat completions: a new memory_enabled flag (off by default) gates a best-effort recall-before-dispatch + remember-after flow. Message helpers inject recalled facts into the system prompt and build the turn to store; two platform clients (_recall_platform_memory, _remember_platform_memory) talk to /gateway/memory/{recall,remember}; a streaming wrapper accumulates assistant text and fires a fire-and-forget remember on normal completion. Both streaming and non-streaming paths are wired. Standalone mode is unaffected.

Comment thread src/gateway/api/routes/_platform.py Outdated
Comment thread src/gateway/api/routes/_pipeline.py Outdated
Comment thread src/gateway/api/routes/chat.py Outdated
Comment thread src/gateway/api/routes/_helpers.py
Comment thread tests/unit/test_memory_platform.py
@dpoulopoulos

Copy link
Copy Markdown
Member Author

Thanks for the careful review. Pushed fixups addressing the best-effort gaps:

  • Timeout parsing no longer uses a bare int(...). Added a _coerce_timeout_ms helper that falls back to the default for missing, non-numeric, or non-positive values, used by both recall and remember.
  • Broadened the recall/remember error handling from (TimeoutException, NetworkError) to httpx.HTTPError, so the long tail (RemoteProtocolError, ProxyError, InvalidURL, etc.) can't turn a would-succeed chat into a 500.
  • Wrapped the recall call site in chat.py in a try/except that logs and proceeds without facts, as a belt-and-suspenders boundary.
  • The streaming remember task now keeps its reference and attaches a done-callback that retrieves and logs any exception, so a failed background write doesn't leak a "Task exception was never retrieved" warning.
  • Added regression tests: malformed timeout values (recall + remember) asserting the default fallback, a RemoteProtocolError swallowed on both paths, and a raising streamed settled-callback that doesn't propagate.

Two points I left as-is, with reasoning:

  • Recall-before-dispatch adds up to the recall timeout to TTFT under a degraded memory service. That's inherent (facts must be in the prompt before the call) and already bounded by a configurable timeout, so no change beyond the safer parsing above.
  • The multimodal-system-content repr edge case in inject_memory_facts mirrors the existing inject_purpose_hints helper; system content is effectively always a string, and changing only this helper would diverge from that pattern.

ruff, mypy (strict), and the full unit suite are green.

@dpoulopoulos dpoulopoulos left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds opt-in, best-effort persistent memory to platform-mode chat completions, gated by a new memory_enabled config flag (OTARI_MEMORY_ENABLED, default off). Before dispatch the gateway recalls facts from the platform (/gateway/memory/recall) and injects them into the system prompt; after a completion it stores the new turn fire-and-forget (/gateway/memory/remember) for both non-streaming (background task) and streaming (a capture wrapper that accumulates assistant text and persists only on normal completion). Standalone mode and the /messages and /responses endpoints are unaffected. This is a re-review after the author addressed the previous round of feedback.

Prior feedback status: resolved. All actionable items from the previous review (CodeRabbit's 3 plus the prior [major]/[minor] review's 5) are addressed in the current code: the narrow (TimeoutException, NetworkError) catch was broadened to httpx.HTTPError in both recall and remember; a _coerce_timeout_ms helper guards timeout parsing; the recall call site in chat.py is wrapped in a best-effort try/except; the streaming create_task now has an exception-swallowing done-callback; and regression tests for malformed timeouts, generic httpx errors, and a raising settled-callback were added. Two low-priority items the prior review explicitly flagged as by-design/low-risk remain open (recall latency, multimodal-system-content edge case). Verified: ruff, mypy (strict), and all 32 new unit tests pass. No web UI exists for this backend feature, so Playwright verification is not applicable.

memory_user_token = ctx.user_token if (config.memory_enabled and ctx.platform_mode) else None
if memory_user_token:
try:
recalled = await _recall_platform_memory(config, memory_user_token, latest_user_text(request.messages))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Recall is awaited synchronously before dispatch, so under a slow/degraded memory service it adds up to memory_recall_timeout_ms (default 2000ms) to time-to-first-token on every memory-enabled platform request. This is inherent to recall-before-dispatch (the facts must be in the prompt before the call) and is a reasonable trade-off, but the operator-facing latency budget isn't documented anywhere: memory_recall_timeout_ms / memory_remember_timeout_ms are read from the config.platform dict but aren't surfaced in the memory_enabled field description, config.example.yml, or the docs. Worth documenting the latency budget (and perhaps a more aggressive recall default than 2s) so operators understand the cost of enabling it. Carried over from the prior review; non-blocking.

# Memory recall (platform mode, best-effort). Inject recalled facts into the system
# message before dispatch so every downstream path (stream/non-stream, tool/no-tool)
# sees them. A failure or a disabled workspace yields no facts and never blocks chat.
memory_user_token = ctx.user_token if (config.memory_enabled and ctx.platform_mode) else None

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] No integration test exercises the wired memory flow in chat_completions. The unit tests cover the helpers, the platform client, and the streaming-capture wrapper in isolation, but the actual wiring here, gating on config.memory_enabled and ctx.platform_mode, recall then inject_memory_facts, the best-effort try/except, _schedule_memory_remember on the non-streaming path (line 474), and on_memory_settled on the streaming path (line 405), is untested end to end. An integration test in tests/integration/test_platform_mode_chat.py (memory off = no platform memory calls; memory on = recall injected and remember fired) would lock in this behavior and catch wiring regressions the unit tests can't see.

out = list(messages)
if out and isinstance(out[0], dict) and out[0].get("role") == "system":
existing = out[0].get("content") or ""
out[0] = {**out[0], "content": f"{existing}\n\n{block}" if existing else block}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] When the first message is a system message whose content is a list of content parts (a multimodal system prompt), f"{existing}\n\n{block}" stringifies the list via its Python repr, corrupting the system content. System content is almost always a plain string, and this exactly mirrors the existing inject_purpose_hints helper (which has the same edge case), so it's low-risk and consistent, just flagging the shared gap. Carried over from the prior review.

completed = True
finally:
if completed and parts:
task = asyncio.create_task(on_settled_text("".join(parts)))

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] asyncio.create_task(...) keeps no strong reference; only add_done_callback is attached. Per the asyncio docs the event loop holds only a weak reference to tasks, so a detached task can in theory be garbage-collected before it finishes. In practice the remember runs on the next loop tick and this mirrors the existing _report_platform_usage create_task pattern, so it's consistent with the codebase and low-risk. The exception-leak concern from the prior review is fully resolved by _swallow_memory_task_exception; this is just the remaining GC subtlety. Optional: hold the task in a set and discard it in the done-callback.

A master switch (default off) for persistent memory in platform mode. When off,
the gateway makes no memory calls and behavior is unchanged; per-workspace
enablement is still controlled by the platform.

Refs #189

Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
…pers

_recall_platform_memory / _remember_platform_memory call the platform's
/gateway/memory/* endpoints, modeled on the usage reporter: they swallow
timeouts and errors so a slow or unavailable memory service never breaks a chat.
inject_memory_facts prepends recalled facts to the system message (mirroring
inject_purpose_hints); build_remember_messages assembles the minimal new turn to
store. Helpers are unit-tested and not yet wired into the request path.

Refs #189

Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
Gated by OTARI_MEMORY_ENABLED in platform mode: recall runs once before dispatch
and injects facts into the system message (covering every path, streaming and
not), and the new turn is stored fire-and-forget afterwards. Non-streaming uses
a background task; streaming uses _stream_with_memory_capture to accumulate the
assistant text and remember only on normal completion (skipped on disconnect).
The capture wrapper is unit-tested. No behavior change when the flag is off.

Closes #189

Signed-off-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
@dpoulopoulos

Copy link
Copy Markdown
Member Author

Pushed fixups for the second-round feedback (folded into their original commits):

  • Surfaced the memory latency budget. The recall/remember timeouts were only readable from the platform: dict with no operator-visible knob. Added PLATFORM_MEMORY_RECALL_TIMEOUT_MS / PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS env mappings (consistent with the existing PLATFORM_*_TIMEOUT_MS siblings) and expanded the memory_enabled field description to spell out that recall runs before dispatch (so it adds up to its timeout to TTFT) while remember is fire-and-forget, including the defaults (2000ms / 10000ms). Extended test_load_config_platform_env_overrides to lock in the two new mappings.
  • Added end-to-end integration coverage of the wiring. Two tests in test_platform_mode_chat.py: memory off makes zero /gateway/memory/* calls, and memory on recalls with the latest user text, injects the fact into the system message the provider sees, then stores the user+assistant turn after the non-streaming completion.

Left as-is (both flagged low-risk/by-design in review):

  • The multimodal-system-content repr edge case in inject_memory_facts mirrors the existing inject_purpose_hints helper; system content is effectively always a string, so changing only this helper would diverge from that pattern.
  • The streaming create_task GC subtlety mirrors the existing _report_platform_usage pattern; the exception-leak concern is already handled by the done-callback, and the task runs on the next loop tick.

ruff, mypy (strict), and the memory + config test suites are green.

@coderabbitai coderabbitai Bot requested review from agpituk and tbille June 22, 2026 08:12

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/core/config.py`:
- Around line 410-415: The PLATFORM_MEMORY_RECALL_TIMEOUT_MS and
PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS configuration entries are registered with
int casting, which causes invalid environment values to raise an exception
during config load and prevent the best-effort fallback behavior in
_coerce_timeout_ms from working. Change the type casting for these two entries
from int to str to defer validation, allowing the configuration to load
successfully and let _coerce_timeout_ms handle the validation and fallback to
defaults when encountering non-numeric values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17d7ddae-5fb9-4684-b2c5-6dfddb10ecd6

📥 Commits

Reviewing files that changed from the base of the PR and between 0aee46f and 7c5fb36.

📒 Files selected for processing (10)
  • src/gateway/api/routes/_helpers.py
  • src/gateway/api/routes/_pipeline.py
  • src/gateway/api/routes/_platform.py
  • src/gateway/api/routes/chat.py
  • src/gateway/core/config.py
  • tests/integration/test_config_env_loading.py
  • tests/integration/test_platform_mode_chat.py
  • tests/unit/test_memory_inject.py
  • tests/unit/test_memory_platform.py
  • tests/unit/test_stream_memory_capture.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/unit/test_memory_platform.py
  • tests/unit/test_memory_inject.py
  • src/gateway/api/routes/_helpers.py
  • src/gateway/api/routes/_platform.py
  • src/gateway/api/routes/chat.py
  • src/gateway/api/routes/_pipeline.py

Comment on lines +410 to +415
# Best-effort memory timeouts (platform mode). Recall is on the hot path
# (added to time-to-first-token); remember is fire-and-forget after the
# completion. Both fall back to their defaults on a missing or invalid
# value (see _coerce_timeout_ms in routes/_platform.py).
"PLATFORM_MEMORY_RECALL_TIMEOUT_MS": ("memory_recall_timeout_ms", int),
"PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS": ("memory_remember_timeout_ms", int),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle invalid PLATFORM_MEMORY_*_TIMEOUT_MS values without crashing startup.

These vars are registered with int casting here, so a non-numeric env value raises during config load (Line 431) and prevents startup. That bypasses the intended best-effort fallback behavior in _coerce_timeout_ms(...).

Suggested fix
@@
-    for env_name, (field_name, caster) in env_mappings.items():
+    memory_timeout_envs = {
+        "PLATFORM_MEMORY_RECALL_TIMEOUT_MS",
+        "PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS",
+    }
+    for env_name, (field_name, caster) in env_mappings.items():
         value = os.getenv(env_name)
         if value is None or value == "":
             continue
-        platform[field_name] = caster(value)
+        try:
+            platform[field_name] = caster(value)
+        except ValueError:
+            if env_name in memory_timeout_envs:
+                # Let downstream _coerce_timeout_ms apply defaults.
+                platform[field_name] = value
+                continue
+            raise
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gateway/core/config.py` around lines 410 - 415, The
PLATFORM_MEMORY_RECALL_TIMEOUT_MS and PLATFORM_MEMORY_REMEMBER_TIMEOUT_MS
configuration entries are registered with int casting, which causes invalid
environment values to raise an exception during config load and prevent the
best-effort fallback behavior in _coerce_timeout_ms from working. Change the
type casting for these two entries from int to str to defer validation, allowing
the configuration to load successfully and let _coerce_timeout_ms handle the
validation and fallback to defaults when encountering non-numeric values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Persistent memory for chat completions (platform mode)

1 participant