feat(sandbox): consume the platform code-execution resolve (apply purpose hint + max iterations) #184

Open
agpituk wants to merge 2 commits into main from feat/consume-code-exec-resolve

Conversation

@agpituk agpituk commented Jun 17, 2026

Member

Description

Consumer half of the per-workspace code-execution resolve channel (the otari-ai side is mozilla-ai/otari-ai#991).

When a request opts into otari_code_execution in platform mode, prepare_gateway_tools now resolves the workspace's code-exec policy from the platform (POST /gateway/code-execution/resolve, mirroring the web-search resolve):

  • 403 if the workspace has code execution disabled (gateway-side gate, before calling the model);
  • applies the workspace default_purpose_hint to the otari_code_execution tool surface (a per-request purpose_hint still wins);
  • bounds the tool loop by the workspace max_iterations (folded into the existing iteration cap).

This is what makes default_purpose_hint/max_iterations actually reach the model — previously they were configurable but never delivered. The platform clamps the soft limits to operator ceilings at resolve time, and the /v1/sandbox proxy still re-enforces enabled + the tools allow-list + the exec timeout.
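
The three effects listed above can be sketched roughly as follows. This is an illustrative sketch, not the code in `_pipeline.py`: `GatewayError`, `SandboxSettings`, and `apply_code_exec_policy` are hypothetical names, the 403 detail string is a placeholder, and reading "folded into the existing iteration cap" as taking the minimum of the two values is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class GatewayError(Exception):
    """Hypothetical stand-in for the gateway's HTTP error type."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class SandboxSettings:
    """Hypothetical container for the values that reach the tool surface."""

    purpose_hint: Optional[str]
    max_iterations: int


def apply_code_exec_policy(
    policy: Dict[str, Any],
    request_purpose_hint: Optional[str],
    iteration_cap: int,
) -> SandboxSettings:
    # Gateway-side gate: reject before the model is ever called.
    if not policy.get("enabled"):
        raise GatewayError(403, "code execution is not enabled for this workspace")

    # A per-request purpose_hint wins; the workspace default only fills a gap.
    if request_purpose_hint is not None:
        hint = request_purpose_hint
    else:
        hint = policy.get("default_purpose_hint")

    # Assumption: the workspace limit can only tighten the existing cap.
    workspace_max = policy.get("max_iterations")
    if workspace_max is not None:
        iteration_cap = min(iteration_cap, workspace_max)

    return SandboxSettings(purpose_hint=hint, max_iterations=iteration_cap)
```

With a policy of `{"enabled": True, "default_purpose_hint": "data analysis", "max_iterations": 3}` and an existing cap of 10, a request without its own `purpose_hint` would get the workspace hint and a cap of 3 under these assumptions.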

Adds _resolve_platform_code_execution (mirrors _resolve_platform_web_search: same base-url guard, timeout, headers, status-code handling).

PR Type

  • New Feature
  • Bug Fix
  • Refactor
  • Documentation
  • Infrastructure / CI

Relevant issues

Pairs with mozilla-ai/otari-ai#991 (the resolve endpoint + per-workspace config).

Checklist

  • I understand the code I am submitting.
  • I have added or updated tests that cover my change (tests/unit/test_code_execution_resolve.py).
  • I ran the Definition of Done checks locally (make lint, make typecheck, make test). — make lint + make typecheck pass; full tests/unit passes (441); container-backed integration deferred to CI.
  • Documentation was updated where necessary. — none required (internal platform-resolve call; no user-facing behavior beyond the policy now applying).
  • If the API contract changed, I regenerated the OpenAPI spec. — N/A, the gateway's own API contract is unchanged (this consumes a platform endpoint; it adds none).

AI Usage

  • No AI was used.
  • AI was used for drafting/refactoring.
  • This is fully AI-generated.

AI Model/Tool used:
Claude (Opus 4.8) via Claude Code.

Any additional AI details you'd like to share:
Implemented by mirroring the existing web-search resolve consumer, with unit tests mirroring test_web_search_resolve. Verified make lint/make typecheck and the full unit suite locally. Reviewer questions will be answered by a human.

  • I am an AI Agent filling out this form (check box if true)

Summary

This PR extends the gateway’s platform-mode support so workspaces can centrally configure whether (and how) code execution is allowed. Before the gateway proceeds with sandbox/tool execution, it now fetches the workspace’s code-execution policy from the platform and enforces it consistently.

What Changed

Gateway pipeline (src/gateway/api/routes/_pipeline.py):

  • Added a platform policy fetch step for otari_code_execution during tool preparation in platform mode
  • Rejects sandbox/code execution with 403 when the workspace policy says code execution is disabled
  • Applies a workspace-level default purpose_hint, while allowing a per-request purpose_hint to override it
  • Caps sandbox/tool loop iterations using the workspace-configured maximum
  • Improves “fail closed” validation: malformed policy values now cause the request to fail instead of being treated as disabled
  • Introduces shared URL scoping helper url_targets_platform (renaming the previous web-search-specific helper) and uses it for both sandbox and web-search platform credential forwarding checks
  • Adds new client-visible error detail constants for the sandbox-not-enabled and malformed-policy cases
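
To illustrate the kind of check a helper like url_targets_platform performs before credentials are forwarded, here is a minimal sketch. The real helper's signature and rules are not shown in this PR page, so everything below is an assumption: the function name `url_targets_platform_sketch`, its two-argument form, the same-origin comparison, and the path-prefix rule are illustrative only.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def url_targets_platform_sketch(url: str, platform_base_url: str) -> bool:
    """True only if url has the platform's origin and sits under its base path."""
    try:
        target, base = _origin(url), _origin(platform_base_url)
        path = urlsplit(url).path
        base_path = urlsplit(platform_base_url).path.rstrip("/")
    except ValueError:
        # Malformed URLs (bad port, bad IPv6 literal) never get credentials.
        return False

    if not base[1] or target != base:
        return False
    if ".." in path.split("/"):
        # Refuse dot-segments rather than trying to normalize them.
        return False
    # Match on a segment boundary so "/apix" does not pass for base "/api".
    return path == base_path or path.startswith(base_path + "/")
```

Comparing parsed components rather than using a plain string prefix is the design point here: a prefix test would accept a lookalike host such as `platform.example.evil.test` for a base of `https://platform.example`.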

Platform resolver (src/gateway/api/routes/_platform.py):

  • Added _resolve_platform_code_execution() to call the platform endpoint (/gateway/code-execution/resolve) and map responses/timeouts to gateway errors consistently with existing platform resolvers
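
The status mapping reported for this resolver in the review walkthrough (JSON returned on 200; detail forwarded for 401/402/403/404/429 with Retry-After copied on 429; everything else surfaced as 502) can be sketched as a pure function. This is only an approximation under stated assumptions: `GatewayError`, `map_resolve_response`, and the fallback detail strings are hypothetical, and the real helper works on an httpx response rather than on plain values.

```python
from typing import Any, Dict, Mapping, Optional

# Statuses whose detail is passed through to the client unchanged.
PASS_THROUGH_STATUSES = {401, 402, 403, 404, 429}


class GatewayError(Exception):
    """Hypothetical stand-in for the gateway's HTTP error type."""

    def __init__(
        self,
        status_code: int,
        detail: str,
        headers: Optional[Dict[str, str]] = None,
    ) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail
        self.headers = headers or {}


def map_resolve_response(
    status_code: int, body: Any, headers: Mapping[str, str]
) -> Dict[str, Any]:
    """Return the policy dict on 200, otherwise raise a mapped GatewayError."""
    if status_code == 200:
        return body

    if status_code in PASS_THROUGH_STATUSES:
        out_headers: Dict[str, str] = {}
        # Let clients honor the platform's back-off hint on rate limiting.
        if status_code == 429 and "Retry-After" in headers:
            out_headers["Retry-After"] = headers["Retry-After"]
        detail = body.get("detail") if isinstance(body, dict) else None
        raise GatewayError(
            status_code, detail or "platform rejected the request", out_headers
        )

    # 422, 5xx, and anything unexpected are treated as an upstream failure.
    raise GatewayError(502, "platform code-execution resolve failed")
```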

Tests:

  • Added unit tests for the platform resolver’s success path and error/status handling (tests/unit/test_code_execution_resolve.py)
  • Added integration tests for platform-mode sandbox behavior, including:
    • 403 when disabled
    • applying default_purpose_hint
    • per-request purpose_hint precedence (tests/integration/test_platform_mode_chat.py)
  • Updated related web-search URL scoping tests to use url_targets_platform

Benefits

  • Centralized, per-workspace control over code execution in platform mode
  • Stronger safety: malformed platform policy no longer degrades into silently-allowed behavior
  • Clear defaults with predictable override behavior for purpose hints

@agpituk agpituk temporarily deployed to integration-tests June 17, 2026 13:44 — with GitHub Actions Inactive
coderabbitai Bot commented Jun 17, 2026


Warning

Review limit reached

@agpituk, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 11 minutes and 55 seconds. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 974a027c-859c-4af2-9211-e9eed3dcb026

📥 Commits

Reviewing files that changed from the base of the PR and between 221bc38 and 5fc00df.

📒 Files selected for processing (5)
  • src/gateway/api/routes/_pipeline.py
  • src/gateway/api/routes/_platform.py
  • tests/integration/test_platform_mode_chat.py
  • tests/unit/test_code_execution_resolve.py
  • tests/unit/test_web_search_url_scoping.py

Walkthrough

Adds _resolve_platform_code_execution, a new async helper that fetches workspace sandbox policy from the platform's /gateway/code-execution/resolve endpoint. Refactors platform-origin URL targeting logic into a shared url_targets_platform helper for both sandbox and web-search credential checks. prepare_gateway_tools is extended to call this resolver in platform mode, enforce the enabled flag, apply a default purpose_hint, and derive a sandbox_max_iterations cap that bounds the ToolContext tool-loop iterations.

Changes

Platform Code-Execution Policy Resolution and Sandbox Tool Integration

| Layer / File(s) | Summary |
| --- | --- |
| Platform code-execution resolver implementation (src/gateway/api/routes/_platform.py, tests/unit/test_code_execution_resolve.py) | Adds _resolve_platform_code_execution async function that POSTs an empty body to /gateway/code-execution/resolve, returns JSON on 200, forwards detail for 401/402/403/404/429 (copying Retry-After on 429), and maps 422/5xx/timeouts/network errors to 502. Includes a cosmetic formatting fix to the existing _post_platform call. Unit tests cover the success path (correct URL/headers/body and returned policy dict), 403 pass-through, 429 with Retry-After, 503 and 422 mapped to 502, httpx.NetworkError mapped to 502, and missing base_url mapped to 500. |
| Shared URL-targeting helper for platform-origin checks (src/gateway/api/routes/_pipeline.py, tests/unit/test_web_search_url_scoping.py) | Refactors the platform-origin URL targeting logic into a new reusable url_targets_platform helper (replacing the prior web-search-specific web_search_url_targets_platform), centralizing origin/path-safe checks for credential forwarding. Web-search URL scoping tests are updated to call the new shared helper. |
| Sandbox policy enforcement and tool iteration wiring (src/gateway/api/routes/_pipeline.py, tests/integration/test_platform_mode_chat.py) | Imports _resolve_platform_code_execution and the new URL helper, adds SANDBOX_NOT_ENABLED_DETAIL constant, extends prepare_gateway_tools to resolve sandbox policy in platform mode (enforcing enabled, applying a default purpose_hint, extracting sandbox_max_iterations), threads sandbox_max_iterations into ToolContext as the iteration cap, and updates the web-search credential check to use the shared helper. Integration tests verify that requests are rejected with 403 when sandbox is disabled, the workspace default_purpose_hint is applied when the request omits purpose_hint, and per-request purpose_hint overrides the workspace default. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • mozilla-ai/otari#144: Both PRs modify platform-mode web-search behavior in src/gateway/api/routes/_pipeline.py and _platform.py—this PR refactors shared url_targets_platform logic used for web-search credential forwarding, while the retrieved PR adds workspace web-search policy resolution.
  • mozilla-ai/otari#182: Both PRs modify prepare_gateway_tools in _pipeline.py for platform mode sandbox handling—this PR wires sandbox policy-derived limits and credential checks via ToolContext, while the retrieved PR adds ToolContext.sandbox_auth_token and updates SandboxBackend credential forwarding, directly adjacent code paths.

Suggested reviewers

  • njbrake
  • dpoulopoulos
  • tbille
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title follows the required Conventional Commit format with 'feat:' prefix and imperative mood, but exceeds the ~70 character guideline at 96 characters. | Consider shortening the title to under ~70 characters. For example: 'feat(sandbox): consume platform code-execution resolve' and move detailed notes about purpose hint and max iterations to the PR body. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description is comprehensive and covers all required template sections including objectives, PR type, checklist, and AI usage disclosure with appropriate detail. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/test_code_execution_resolve.py (1)

117-129: ⚡ Quick win

Add a timeout-specific test for _resolve_platform_code_execution.

The resolver maps both httpx.NetworkError and httpx.TimeoutException to 502, but only the network branch is currently covered.

Suggested test addition
+@pytest.mark.asyncio
+async def test_resolve_timeout_maps_to_502(monkeypatch: pytest.MonkeyPatch) -> None:
+    async def fake_post(**kwargs: Any) -> httpx.Response:
+        raise httpx.TimeoutException("timed out")
+
+    monkeypatch.setattr(platform_module, "_post_platform", fake_post)
+
+    from fastapi import HTTPException
+
+    with pytest.raises(HTTPException) as ei:
+        await _resolve_platform_code_execution(_config(), "tk")
+    assert ei.value.status_code == 502
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_code_execution_resolve.py` around lines 117 - 129, Add a new
test function similar to test_resolve_network_error_maps_to_502 to cover the
timeout case that is currently missing. The new test should mock the
_post_platform method to raise httpx.TimeoutException instead of
httpx.NetworkError, then verify that calling _resolve_platform_code_execution
also maps this timeout exception to a 502 status code via HTTPException,
ensuring both error handling paths are tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gateway/api/routes/_pipeline.py`:
- Around line 746-750: The iteration-cap fallback logic uses the `or` operator
with max_tool_iterations (line 747) and sandbox_max_iterations (line 750), which
incorrectly treats a value of 0 as falsy and falls back to the default values.
Replace the `or` logic with explicit `is None` checks to properly distinguish
between an unset value (None) and a legitimate zero value. This ensures that
when max_tool_iterations or sandbox_max_iterations is explicitly set to 0, that
value is used rather than being replaced by DEFAULT_MAX_TOOL_ITERATIONS or
MAX_TOOL_ITERATIONS_CAP.
- Around line 667-674: The code uses untyped values from the policy dict
returned by _resolve_platform_code_execution without validating their types,
which could allow malformed platform payloads to silently bypass security gates.
Specifically, the "enabled" field used in the policy.get("enabled") check and
the "max_iterations" field used in the policy.get("max_iterations") check should
be validated to ensure "enabled" is a boolean and "max_iterations" is a positive
integer before they are used. Add type validation for both fields immediately
after retrieving them from the policy dict, and raise an adapter error with a
502 status code if either field has an incorrect type, since this indicates a
cross-service contract violation.
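
The first finding above turns on how Python's `or` treats zero. The sketch below shows the difference in isolation; the constant's value and both function names are placeholders chosen for illustration, and whether 0 can actually reach this code path in the gateway is not established by this page.

```python
from typing import Optional

# Placeholder value; the real DEFAULT_MAX_TOOL_ITERATIONS is not shown here.
DEFAULT_MAX_TOOL_ITERATIONS = 10


def pick_with_or(value: Optional[int]) -> int:
    # `or` falls back on ANY falsy value, so an explicit 0 is replaced.
    return value or DEFAULT_MAX_TOOL_ITERATIONS


def pick_with_is_none(value: Optional[int]) -> int:
    # Only a genuinely unset value falls back; an explicit 0 is preserved.
    return DEFAULT_MAX_TOOL_ITERATIONS if value is None else value
```

The two forms agree for `None` and for any positive integer and differ only for `0`, so the choice matters only if a zero-iteration setting is a legitimate input.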

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f11b149b-e470-424e-a56c-a1afdbcc086e

📥 Commits

Reviewing files that changed from the base of the PR and between 33e94ff and bb76d9f.

📒 Files selected for processing (3)
  • src/gateway/api/routes/_pipeline.py
  • src/gateway/api/routes/_platform.py
  • tests/unit/test_code_execution_resolve.py

Comment thread src/gateway/api/routes/_pipeline.py
Comment thread src/gateway/api/routes/_pipeline.py

@dpoulopoulos dpoulopoulos left a comment

Member


Adds the gateway-side consumer for the platform's per-workspace code-execution resolve channel. In platform mode, prepare_gateway_tools now POSTs to /gateway/code-execution/resolve, gates the request with a 403 when the workspace has code execution disabled, applies the workspace default_purpose_hint (a per-request purpose_hint still wins), and folds the workspace max_iterations into the tool-loop iteration cap. A new _resolve_platform_code_execution helper mirrors _resolve_platform_web_search, with unit tests covering the resolver's success and error paths.

Comment thread src/gateway/api/routes/_pipeline.py
Comment thread src/gateway/api/routes/_pipeline.py
Comment thread src/gateway/api/routes/_pipeline.py
Comment thread src/gateway/api/routes/_pipeline.py Outdated
Comment thread tests/unit/test_code_execution_resolve.py
@njbrake njbrake removed their request for review June 17, 2026 14:47
agpituk added a commit that referenced this pull request Jun 17, 2026
- D1 (Dimitris, major): add route-level integration tests mirroring the
  web-search trio — sandbox 403-when-disabled, workspace default_purpose_hint
  applied, and per-request purpose_hint wins (tests/integration/test_platform_mode_chat.py).
- CR1 (CodeRabbit, major): fail closed on a malformed code-exec policy — a
  non-bool 'enabled' raises 502 (not a silent 'disabled'); exclude bool from the
  max_iterations int check so JSON true isn't read as a 1-iteration cap.
- D4 (Dimitris, nit): rename web_search_url_targets_platform -> url_targets_platform
  (it now also gates the sandbox token) + its callers and unit test.

CR2 (or vs is-None) and CR3 (redundant timeout test) intentionally left as-is —
both are safe/redundant per Dimitris's replies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
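
The CR1 fix described in this commit message relies on a Python detail: `bool` is a subclass of `int`, so a plain `isinstance(value, int)` check accepts JSON `true` as the integer 1. A minimal sketch of fail-closed parsing follows. `GatewayError`, `read_enabled`, `read_max_iterations`, and the detail string are hypothetical names, and ignoring an invalid `max_iterations` (instead of rejecting the request) is an assumption that the real code may not share.

```python
from typing import Any, Dict, Optional


class GatewayError(Exception):
    """Hypothetical stand-in for the gateway's HTTP error type."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def read_enabled(policy: Dict[str, Any]) -> bool:
    enabled = policy.get("enabled")
    # Fail closed: a non-bool here is a contract violation between services,
    # not a quiet "disabled" (and certainly not a quiet "enabled").
    if not isinstance(enabled, bool):
        raise GatewayError(502, "malformed code-execution policy")
    return enabled


def read_max_iterations(policy: Dict[str, Any]) -> Optional[int]:
    value = policy.get("max_iterations")
    # bool is a subclass of int, so it must be excluded explicitly; otherwise
    # JSON true would be read as a 1-iteration cap.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        return None  # assumption: an unusable value means "no workspace cap"
    return value
```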
@agpituk agpituk temporarily deployed to integration-tests June 17, 2026 20:02 — with GitHub Actions Inactive
@coderabbitai coderabbitai Bot requested review from dpoulopoulos and njbrake June 17, 2026 20:02
@agpituk agpituk removed request for njbrake and tbille June 17, 2026 20:47
agpituk and others added 2 commits June 17, 2026 22:47
When a request opts into otari_code_execution in platform mode, resolve the
workspace's code-exec policy from the platform (POST /gateway/code-execution/
resolve, mirroring web-search): 403 if the workspace has it disabled, otherwise
apply the workspace default_purpose_hint to the tool surface (per-request value
wins) and bound the tool loop by the workspace max_iterations. This is the
consumer half of otari-ai's resolve channel — it makes default_purpose_hint and
max_iterations actually reach the model. The /v1/sandbox proxy still re-enforces
enabled + tools + the exec timeout.

Adds _resolve_platform_code_execution + unit tests mirroring the web-search resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- D1 (Dimitris, major): add route-level integration tests mirroring the
  web-search trio — sandbox 403-when-disabled, workspace default_purpose_hint
  applied, and per-request purpose_hint wins (tests/integration/test_platform_mode_chat.py).
- CR1 (CodeRabbit, major): fail closed on a malformed code-exec policy — a
  non-bool 'enabled' raises 502 (not a silent 'disabled'); exclude bool from the
  max_iterations int check so JSON true isn't read as a 1-iteration cap.
- D4 (Dimitris, nit): rename web_search_url_targets_platform -> url_targets_platform
  (it now also gates the sandbox token) + its callers and unit test.

CR2 (or vs is-None) and CR3 (redundant timeout test) intentionally left as-is —
both are safe/redundant per Dimitris's replies.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@agpituk agpituk force-pushed the feat/consume-code-exec-resolve branch from 221bc38 to 5fc00df Compare June 17, 2026 20:50
@agpituk agpituk temporarily deployed to integration-tests June 17, 2026 20:50 — with GitHub Actions Inactive