feat(sandbox): consume the platform code-execution resolve (apply purpose hint + max iterations) by agpituk · Pull Request #184 · mozilla-ai/otari

agpituk · 2026-06-17T13:44:29Z

Description

Consumer half of the per-workspace code-execution resolve channel (the otari-ai side is mozilla-ai/otari-ai#991).

When a request opts into otari_code_execution in platform mode, prepare_gateway_tools now resolves the workspace's code-exec policy from the platform (POST /gateway/code-execution/resolve, mirroring the web-search resolve):

403 if the workspace has code execution disabled (gateway-side gate, before calling the model);
applies the workspace default_purpose_hint to the otari_code_execution tool surface (a per-request purpose_hint still wins);
bounds the tool loop by the workspace max_iterations (folded into the existing iteration cap).

This is what makes default_purpose_hint/max_iterations actually reach the model — previously they were configurable but never delivered. The platform clamps the soft limits to operator ceilings at resolve time, and the /v1/sandbox proxy still re-enforces enabled + the tools allow-list + the exec timeout.
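As a rough illustration of the three behaviours listed above, here is a minimal sketch. It is not the code in `prepare_gateway_tools`: `GatewayError`, `apply_code_execution_policy`, and the `existing_cap` parameter are hypothetical names, and "folded into the existing iteration cap" is assumed to mean "take the smaller of the two".

```python
from typing import Any, Dict, Optional, Tuple


class GatewayError(Exception):
    """Hypothetical stand-in for the HTTP error the gateway would raise."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def apply_code_execution_policy(
    policy: Dict[str, Any],
    request_purpose_hint: Optional[str],
    existing_cap: int,
) -> Tuple[Optional[str], int]:
    """Return (purpose_hint, iteration_cap) for one request."""
    # Gateway-side gate: refuse before the model is ever called.
    if policy.get("enabled") is not True:
        raise GatewayError(403, "code execution is not enabled for this workspace")

    # A per-request purpose_hint wins; otherwise use the workspace default.
    purpose_hint = request_purpose_hint
    if purpose_hint is None:
        purpose_hint = policy.get("default_purpose_hint")

    # Assumption: folding the workspace limit into the cap means min().
    iteration_cap = existing_cap
    workspace_max = policy.get("max_iterations")
    if workspace_max is not None:
        iteration_cap = min(existing_cap, workspace_max)

    return purpose_hint, iteration_cap
```

With a policy of `{"enabled": True, "default_purpose_hint": "data analysis", "max_iterations": 3}`, a request without its own hint would get the workspace default and a cap of at most 3.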

Adds _resolve_platform_code_execution (mirrors _resolve_platform_web_search: same base-url guard, timeout, headers, status-code handling).

PR Type

Relevant issues

Pairs with mozilla-ai/otari-ai#991 (the resolve endpoint + per-workspace config).

Checklist

I understand the code I am submitting.
I have added or updated tests that cover my change (tests/unit/test_code_execution_resolve.py).
I ran the Definition of Done checks locally (make lint, make typecheck, make test). — make lint + make typecheck pass; full tests/unit passes (441); container-backed integration deferred to CI.
Documentation was updated where necessary. — none required (internal platform-resolve call; no user-facing behavior beyond the policy now applying).
If the API contract changed, I regenerated the OpenAPI spec. — N/A, the gateway's own API contract is unchanged (this consumes a platform endpoint; it adds none).

AI Usage

No AI was used.
AI was used for drafting/refactoring.
This is fully AI-generated.

AI Model/Tool used:
Claude (Opus 4.8) via Claude Code.

Any additional AI details you'd like to share:
Implemented by mirroring the existing web-search resolve consumer, with unit tests mirroring test_web_search_resolve. Verified make lint/make typecheck and the full unit suite locally. Reviewer questions will be answered by a human.

I am an AI Agent filling out this form (check box if true)

Summary

This PR extends the gateway’s platform-mode support so workspaces can centrally configure whether (and how) code execution is allowed. Before the gateway proceeds with sandbox/tool execution, it now fetches the workspace’s code-execution policy from the platform and enforces it consistently.

What Changed

Gateway pipeline (src/gateway/api/routes/_pipeline.py):

Added a platform policy fetch step for otari_code_execution during tool preparation in platform mode
Rejects sandbox/code execution with 403 when the workspace policy says code execution is disabled
Applies a workspace-level default purpose_hint, while allowing a per-request purpose_hint to override it
Caps sandbox/tool loop iterations using the workspace-configured maximum
Improves “fail closed” validation: malformed policy values now cause the request to fail instead of being treated as disabled
Introduces shared URL scoping helper url_targets_platform (renaming the previous web-search-specific helper) and uses it for both sandbox and web-search platform credential forwarding checks
Adds new client-visible error detail constants for the sandbox-not-enabled and malformed-policy cases
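The fail-closed validation can be pictured with the sketch below. It assumes the policy is a plain dict with `enabled` and `max_iterations` keys; `MalformedPolicyError` and `validate_policy` are hypothetical names, and rejecting a bad `max_iterations` (rather than ignoring it) is an assumption of this sketch, not something the PR text confirms.

```python
from typing import Any, Dict, Optional, Tuple


class MalformedPolicyError(Exception):
    """Hypothetical stand-in for the gateway's 502 error."""

    status_code = 502


def validate_policy(policy: Dict[str, Any]) -> Tuple[bool, Optional[int]]:
    """Return (enabled, max_iterations) or raise if the payload is malformed."""
    enabled = policy.get("enabled")
    # A missing or non-bool value is a contract violation, not "disabled".
    if not isinstance(enabled, bool):
        raise MalformedPolicyError("'enabled' must be a boolean")

    max_iterations = policy.get("max_iterations")
    if max_iterations is not None:
        # bool is a subclass of int in Python, so JSON true would otherwise
        # pass an isinstance(..., int) check and act as a cap of 1.
        is_real_int = isinstance(max_iterations, int) and not isinstance(
            max_iterations, bool
        )
        if not is_real_int or max_iterations < 1:
            raise MalformedPolicyError("'max_iterations' must be a positive integer")

    return enabled, max_iterations
```

The explicit `bool` exclusion is the part that is easy to miss: `isinstance(True, int)` is `True` in Python.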

Platform resolver (src/gateway/api/routes/_platform.py):

Added _resolve_platform_code_execution() to call the platform endpoint (/gateway/code-execution/resolve) and map responses/timeouts to gateway errors consistently with existing platform resolvers

Tests:

Added unit tests for the platform resolver’s success path and error/status handling (tests/unit/test_code_execution_resolve.py)
Added integration tests for platform-mode sandbox behavior, including:
- 403 when disabled
- applying default_purpose_hint
- per-request purpose_hint precedence (tests/integration/test_platform_mode_chat.py)
Updated related web-search URL scoping tests to use url_targets_platform

Benefits

Centralized, per-workspace control over code execution in platform mode
Stronger safety: malformed platform policy no longer degrades into silently-allowed behavior
Clear defaults with predictable override behavior for purpose hints

coderabbitai · 2026-06-17T13:44:54Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 974a027c-859c-4af2-9211-e9eed3dcb026

📥 Commits

Reviewing files that changed from the base of the PR and between 221bc38 and 5fc00df.

📒 Files selected for processing (5)

src/gateway/api/routes/_pipeline.py
src/gateway/api/routes/_platform.py
tests/integration/test_platform_mode_chat.py
tests/unit/test_code_execution_resolve.py
tests/unit/test_web_search_url_scoping.py

Walkthrough

Adds _resolve_platform_code_execution, a new async helper that fetches workspace sandbox policy from the platform's /gateway/code-execution/resolve endpoint. Refactors platform-origin URL targeting logic into a shared url_targets_platform helper for both sandbox and web-search credential checks. prepare_gateway_tools is extended to call this resolver in platform mode, enforce the enabled flag, apply a default purpose_hint, and derive a sandbox_max_iterations cap that bounds the ToolContext tool-loop iterations.
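The PR does not show the body of `url_targets_platform`, so the following is only a guess at what an origin and path check for credential forwarding might look like. The function name `url_targets_platform_sketch`, its two-argument signature, and the exact rules (same scheme, host, and port, path under the base path, no `..` segments) are all assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Normalise a URL to (scheme, host, port), filling in default ports."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def url_targets_platform_sketch(url: str, platform_base_url: str) -> bool:
    """Hypothetical check: is it safe to forward platform credentials to url?"""
    try:
        target, base = _origin(url), _origin(platform_base_url)
    except ValueError:  # urlsplit(...).port raises on an invalid port
        return False
    if not base[1] or target != base:
        return False

    base_path = urlsplit(platform_base_url).path.rstrip("/")
    path = urlsplit(url).path
    # Refuse dot segments so "/api/../admin" cannot escape the base path.
    if ".." in path.split("/"):
        return False
    return path == base_path or path.startswith(base_path + "/")
```

Comparing parsed origins rather than string prefixes is what keeps a host such as `platform.example.evil.com` from matching `platform.example`.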

Changes

Platform Code-Execution Policy Resolution and Sandbox Tool Integration

| Layer / File(s) | Summary |
|---|---|
| Platform code-execution resolver implementation `src/gateway/api/routes/_platform.py`, `tests/unit/test_code_execution_resolve.py` | Adds `_resolve_platform_code_execution` async function that POSTs an empty body to `/gateway/code-execution/resolve`, returns JSON on 200, forwards `detail` for 401/402/403/404/429 (copying `Retry-After` on 429), and maps 422/5xx/timeouts/network errors to 502. Includes a cosmetic formatting fix to the existing `_post_platform` call. Unit tests cover the success path (correct URL/headers/body and returned policy dict), 403 pass-through, 429 with `Retry-After`, 503 and 422 mapped to 502, `httpx.NetworkError` mapped to 502, and missing `base_url` mapped to 500. |
| Shared URL-targeting helper for platform-origin checks `src/gateway/api/routes/_pipeline.py`, `tests/unit/test_web_search_url_scoping.py` | Refactors the platform-origin URL targeting logic into a new reusable `url_targets_platform` helper (replacing the prior web-search-specific `web_search_url_targets_platform`), centralizing origin/path-safe checks for credential forwarding. Web-search URL scoping tests are updated to call the new shared helper. |
| Sandbox policy enforcement and tool iteration wiring `src/gateway/api/routes/_pipeline.py`, `tests/integration/test_platform_mode_chat.py` | Imports `_resolve_platform_code_execution` and the new URL helper, adds `SANDBOX_NOT_ENABLED_DETAIL` constant, extends `prepare_gateway_tools` to resolve sandbox policy in platform mode (enforcing `enabled`, applying a default `purpose_hint`, extracting `sandbox_max_iterations`), threads `sandbox_max_iterations` into `ToolContext` as the iteration cap, and updates the web-search credential check to use the shared helper. Integration tests verify that requests are rejected with 403 when sandbox is disabled, the workspace `default_purpose_hint` is applied when the request omits `purpose_hint`, and per-request `purpose_hint` overrides the workspace default. |
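The status handling described in the resolver row can be sketched as a pure function. This is an illustration under assumptions, not the body of `_resolve_platform_code_execution`: `ResolveError`, `map_resolve_response`, and the fallback detail strings are hypothetical, and the real helper also maps timeouts and network errors to 502 and a missing `base_url` to 500 at the call site.

```python
from typing import Any, Dict, Mapping, Optional

# Statuses whose `detail` is forwarded to the client unchanged.
FORWARDED_STATUSES = frozenset({401, 402, 403, 404, 429})


class ResolveError(Exception):
    """Hypothetical stand-in for the HTTP error raised by the gateway."""

    def __init__(
        self,
        status_code: int,
        detail: str,
        headers: Optional[Dict[str, str]] = None,
    ) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail
        self.headers = headers or {}


def map_resolve_response(
    status_code: int, body: Any, headers: Mapping[str, str]
) -> Dict[str, Any]:
    """Return the policy dict on 200, otherwise raise a mapped error."""
    if status_code == 200:
        return body

    if status_code in FORWARDED_STATUSES:
        detail = body.get("detail") if isinstance(body, dict) else None
        out_headers: Dict[str, str] = {}
        retry_after = headers.get("Retry-After")
        # Only 429 carries Retry-After through to the client.
        if status_code == 429 and retry_after is not None:
            out_headers["Retry-After"] = retry_after
        raise ResolveError(status_code, detail or "platform rejected the request", out_headers)

    # 422, 5xx, and anything unexpected are treated as an upstream failure.
    raise ResolveError(502, "platform code-execution resolve failed")
```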

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

mozilla-ai/otari#144: Both PRs modify platform-mode web-search behavior in src/gateway/api/routes/_pipeline.py and _platform.py—this PR refactors shared url_targets_platform logic used for web-search credential forwarding, while the retrieved PR adds workspace web-search policy resolution.
mozilla-ai/otari#182: Both PRs modify prepare_gateway_tools in _pipeline.py for platform mode sandbox handling—this PR wires sandbox policy-derived limits and credential checks via ToolContext, while the retrieved PR adds ToolContext.sandbox_auth_token and updates SandboxBackend credential forwarding, directly adjacent code paths.

Suggested reviewers

njbrake
dpoulopoulos
tbille

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title follows the required Conventional Commit format with 'feat:' prefix and imperative mood, but exceeds the ~70 character guideline at 96 characters. | Consider shortening the title to under ~70 characters. For example: 'feat(sandbox): consume platform code-execution resolve' and move detailed notes about purpose hint and max iterations to the PR body. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description is comprehensive and covers all required template sections including objectives, PR type, checklist, and AI usage disclosure with appropriate detail. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/unit/test_code_execution_resolve.py (1)

117-129: ⚡ Quick win

Add a timeout-specific test for _resolve_platform_code_execution.

The resolver maps both httpx.NetworkError and httpx.TimeoutException to 502, but only the network branch is currently covered.

Suggested test addition

+@pytest.mark.asyncio
+async def test_resolve_timeout_maps_to_502(monkeypatch: pytest.MonkeyPatch) -> None:
+    async def fake_post(**kwargs: Any) -> httpx.Response:
+        raise httpx.TimeoutException("timed out")
+
+    monkeypatch.setattr(platform_module, "_post_platform", fake_post)
+
+    from fastapi import HTTPException
+
+    with pytest.raises(HTTPException) as ei:
+        await _resolve_platform_code_execution(_config(), "tk")
+    assert ei.value.status_code == 502

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_code_execution_resolve.py` around lines 117 - 129, Add a new
test function similar to test_resolve_network_error_maps_to_502 to cover the
timeout case that is currently missing. The new test should mock the
_post_platform method to raise httpx.TimeoutException instead of
httpx.NetworkError, then verify that calling _resolve_platform_code_execution
also maps this timeout exception to a 502 status code via HTTPException,
ensuring both error handling paths are tested.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/api/routes/_pipeline.py`:
- Around line 746-750: The iteration-cap fallback logic uses the `or` operator
with max_tool_iterations (line 747) and sandbox_max_iterations (line 750), which
incorrectly treats a value of 0 as falsy and falls back to the default values.
Replace the `or` logic with explicit `is None` checks to properly distinguish
between an unset value (None) and a legitimate zero value. This ensures that
when max_tool_iterations or sandbox_max_iterations is explicitly set to 0, that
value is used rather than being replaced by DEFAULT_MAX_TOOL_ITERATIONS or
MAX_TOOL_ITERATIONS_CAP.
- Around line 667-674: The code uses untyped values from the policy dict
returned by _resolve_platform_code_execution without validating their types,
which could allow malformed platform payloads to silently bypass security gates.
Specifically, the "enabled" field used in the policy.get("enabled") check and
the "max_iterations" field used in the policy.get("max_iterations") check should
be validated to ensure "enabled" is a boolean and "max_iterations" is a positive
integer before they are used. Add type validation for both fields immediately
after retrieving them from the policy dict, and raise an adapter error with a
502 status code if either field has an incorrect type, since this indicates a
cross-service contract violation.
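The distinction the first inline comment draws between `or` and an explicit `is None` check can be seen in isolation below. The constant's value is an assumption (the PR names `DEFAULT_MAX_TOOL_ITERATIONS` but does not state it), and both helper functions are hypothetical. The commit message in this PR notes that this finding was intentionally left as-is.

```python
from typing import Optional

DEFAULT_MAX_TOOL_ITERATIONS = 5  # assumed value, for illustration only


def cap_with_or(value: Optional[int]) -> int:
    # `or` falls back for every falsy value, so an explicit 0 is lost.
    return value or DEFAULT_MAX_TOOL_ITERATIONS


def cap_with_is_none(value: Optional[int]) -> int:
    # Only an unset value (None) falls back; 0 is preserved.
    return DEFAULT_MAX_TOOL_ITERATIONS if value is None else value
```

The two helpers agree for `None` and for positive integers and differ only when the value is `0`.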



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f11b149b-e470-424e-a56c-a1afdbcc086e

📥 Commits

Reviewing files that changed from the base of the PR and between 33e94ff and bb76d9f.

📒 Files selected for processing (3)

src/gateway/api/routes/_pipeline.py
src/gateway/api/routes/_platform.py
tests/unit/test_code_execution_resolve.py

dpoulopoulos

Adds the gateway-side consumer for the platform's per-workspace code-execution resolve channel. In platform mode, prepare_gateway_tools now POSTs to /gateway/code-execution/resolve, gates the request with a 403 when the workspace has code execution disabled, applies the workspace default_purpose_hint (a per-request purpose_hint still wins), and folds the workspace max_iterations into the tool-loop iteration cap. A new _resolve_platform_code_execution helper mirrors _resolve_platform_web_search, with unit tests covering the resolver's success and error paths.

- D1 (Dimitris, major): add route-level integration tests mirroring the web-search trio: sandbox 403-when-disabled, workspace default_purpose_hint applied, and per-request purpose_hint wins (tests/integration/test_platform_mode_chat.py).
- CR1 (CodeRabbit, major): fail closed on a malformed code-exec policy: a non-bool 'enabled' raises 502 (not a silent 'disabled'); exclude bool from the max_iterations int check so JSON true isn't read as a 1-iteration cap.
- D4 (Dimitris, nit): rename web_search_url_targets_platform -> url_targets_platform (it now also gates the sandbox token) + its callers and unit test.

CR2 (or vs is-None) and CR3 (redundant timeout test) intentionally left as-is; both are safe/redundant per Dimitris's replies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

When a request opts into otari_code_execution in platform mode, resolve the workspace's code-exec policy from the platform (POST /gateway/code-execution/resolve, mirroring web-search): 403 if the workspace has it disabled, otherwise apply the workspace default_purpose_hint to the tool surface (per-request value wins) and bound the tool loop by the workspace max_iterations.

This is the consumer half of otari-ai's resolve channel: it makes default_purpose_hint and max_iterations actually reach the model. The /v1/sandbox proxy still re-enforces enabled + tools + the exec timeout.

Adds _resolve_platform_code_execution + unit tests mirroring the web-search resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


agpituk temporarily deployed to integration-tests June 17, 2026 13:44 — with GitHub Actions Inactive

coderabbitai Bot requested review from dpoulopoulos, njbrake and tbille June 17, 2026 13:45

coderabbitai Bot reviewed Jun 17, 2026


Comment thread src/gateway/api/routes/_pipeline.py

Comment thread src/gateway/api/routes/_pipeline.py

dpoulopoulos reviewed Jun 17, 2026


Comment thread src/gateway/api/routes/_pipeline.py

Comment thread src/gateway/api/routes/_pipeline.py

Comment thread src/gateway/api/routes/_pipeline.py

Comment thread src/gateway/api/routes/_pipeline.py Outdated

Comment thread tests/unit/test_code_execution_resolve.py

njbrake removed their request for review June 17, 2026 14:47

agpituk temporarily deployed to integration-tests June 17, 2026 20:02 — with GitHub Actions Inactive

coderabbitai Bot requested review from dpoulopoulos and njbrake June 17, 2026 20:02

agpituk removed request for njbrake and tbille June 17, 2026 20:47

agpituk and others added 2 commits June 17, 2026 22:47

agpituk force-pushed the feat/consume-code-exec-resolve branch from 221bc38 to 5fc00df Compare June 17, 2026 20:50

agpituk temporarily deployed to integration-tests June 17, 2026 20:50 — with GitHub Actions Inactive

dpoulopoulos approved these changes Jun 18, 2026

agpituk wants to merge 2 commits into main from feat/consume-code-exec-resolve
