feat(sandbox): consume the platform code-execution resolve (apply purpose hint + max iterations)#184
feat(sandbox): consume the platform code-execution resolve (apply purpose hint + max iterations)#184agpituk wants to merge 2 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 11 minutes and 55 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughAdds ChangesPlatform Code-Execution Policy Resolution and Sandbox Tool Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit/test_code_execution_resolve.py (1)
117-129: ⚡ Quick winAdd a timeout-specific test for
_resolve_platform_code_execution.The resolver maps both
httpx.NetworkErrorandhttpx.TimeoutExceptionto 502, but only the network branch is currently covered.Suggested test addition
+@pytest.mark.asyncio +async def test_resolve_timeout_maps_to_502(monkeypatch: pytest.MonkeyPatch) -> None: + async def fake_post(**kwargs: Any) -> httpx.Response: + raise httpx.TimeoutException("timed out") + + monkeypatch.setattr(platform_module, "_post_platform", fake_post) + + from fastapi import HTTPException + + with pytest.raises(HTTPException) as ei: + await _resolve_platform_code_execution(_config(), "tk") + assert ei.value.status_code == 502🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_code_execution_resolve.py` around lines 117 - 129, Add a new test function similar to test_resolve_network_error_maps_to_502 to cover the timeout case that is currently missing. The new test should mock the _post_platform method to raise httpx.TimeoutException instead of httpx.NetworkError, then verify that calling _resolve_platform_code_execution also maps this timeout exception to a 502 status code via HTTPException, ensuring both error handling paths are tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/gateway/api/routes/_pipeline.py`:
- Around line 746-750: The iteration-cap fallback logic uses the `or` operator
with max_tool_iterations (line 747) and sandbox_max_iterations (line 750), which
incorrectly treats a value of 0 as falsy and falls back to the default values.
Replace the `or` logic with explicit `is None` checks to properly distinguish
between an unset value (None) and a legitimate zero value. This ensures that
when max_tool_iterations or sandbox_max_iterations is explicitly set to 0, that
value is used rather than being replaced by DEFAULT_MAX_TOOL_ITERATIONS or
MAX_TOOL_ITERATIONS_CAP.
- Around line 667-674: The code uses untyped values from the policy dict
returned by _resolve_platform_code_execution without validating their types,
which could allow malformed platform payloads to silently bypass security gates.
Specifically, the "enabled" field used in the policy.get("enabled") check and
the "max_iterations" field used in the policy.get("max_iterations") check should
be validated to ensure "enabled" is a boolean and "max_iterations" is a positive
integer before they are used. Add type validation for both fields immediately
after retrieving them from the policy dict, and raise an adapter error with a
502 status code if either field has an incorrect type, since this indicates a
cross-service contract violation.
---
Nitpick comments:
In `@tests/unit/test_code_execution_resolve.py`:
- Around line 117-129: Add a new test function similar to
test_resolve_network_error_maps_to_502 to cover the timeout case that is
currently missing. The new test should mock the _post_platform method to raise
httpx.TimeoutException instead of httpx.NetworkError, then verify that calling
_resolve_platform_code_execution also maps this timeout exception to a 502
status code via HTTPException, ensuring both error handling paths are tested.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f11b149b-e470-424e-a56c-a1afdbcc086e
📒 Files selected for processing (3)
src/gateway/api/routes/_pipeline.pysrc/gateway/api/routes/_platform.pytests/unit/test_code_execution_resolve.py
dpoulopoulos
left a comment
There was a problem hiding this comment.
Adds the gateway-side consumer for the platform's per-workspace code-execution resolve channel. In platform mode, prepare_gateway_tools now POSTs to /gateway/code-execution/resolve, gates the request with a 403 when the workspace has code execution disabled, applies the workspace default_purpose_hint (a per-request purpose_hint still wins), and folds the workspace max_iterations into the tool-loop iteration cap. A new _resolve_platform_code_execution helper mirrors _resolve_platform_web_search, with unit tests covering the resolver's success and error paths.
- D1 (Dimitris, major): add route-level integration tests mirroring the web-search trio — sandbox 403-when-disabled, workspace default_purpose_hint applied, and per-request purpose_hint wins (tests/integration/test_platform_mode_chat.py). - CR1 (CodeRabbit, major): fail closed on a malformed code-exec policy — a non-bool 'enabled' raises 502 (not a silent 'disabled'); exclude bool from the max_iterations int check so JSON true isn't read as a 1-iteration cap. - D4 (Dimitris, nit): rename web_search_url_targets_platform -> url_targets_platform (it now also gates the sandbox token) + its callers and unit test. CR2 (or vs is-None) and CR3 (redundant timeout test) intentionally left as-is — both are safe/redundant per Dimitris's replies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a request opts into otari_code_execution in platform mode, resolve the workspace's code-exec policy from the platform (POST /gateway/code-execution/ resolve, mirroring web-search): 403 if the workspace has it disabled, otherwise apply the workspace default_purpose_hint to the tool surface (per-request value wins) and bound the tool loop by the workspace max_iterations. This is the consumer half of otari-ai's resolve channel — it makes default_purpose_hint and max_iterations actually reach the model. The /v1/sandbox proxy still re-enforces enabled + tools + the exec timeout. Adds _resolve_platform_code_execution + unit tests mirroring the web-search resolve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- D1 (Dimitris, major): add route-level integration tests mirroring the web-search trio — sandbox 403-when-disabled, workspace default_purpose_hint applied, and per-request purpose_hint wins (tests/integration/test_platform_mode_chat.py). - CR1 (CodeRabbit, major): fail closed on a malformed code-exec policy — a non-bool 'enabled' raises 502 (not a silent 'disabled'); exclude bool from the max_iterations int check so JSON true isn't read as a 1-iteration cap. - D4 (Dimitris, nit): rename web_search_url_targets_platform -> url_targets_platform (it now also gates the sandbox token) + its callers and unit test. CR2 (or vs is-None) and CR3 (redundant timeout test) intentionally left as-is — both are safe/redundant per Dimitris's replies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
221bc38 to
5fc00df
Compare
Description
Consumer half of the per-workspace code-execution resolve channel (the otari-ai side is mozilla-ai/otari-ai#991).
When a request opts into
otari_code_executionin platform mode,prepare_gateway_toolsnow resolves the workspace's code-exec policy from the platform (POST /gateway/code-execution/resolve, mirroring the web-search resolve):default_purpose_hintto theotari_code_executiontool surface (a per-requestpurpose_hintstill wins);max_iterations(folded into the existing iteration cap).This is what makes
default_purpose_hint/max_iterationsactually reach the model — previously they were configurable but never delivered. The platform clamps the soft limits to operator ceilings at resolve time, and the/v1/sandboxproxy still re-enforcesenabled+ thetoolsallow-list + the exec timeout.Adds
_resolve_platform_code_execution(mirrors_resolve_platform_web_search: same base-url guard, timeout, headers, status-code handling).PR Type
Relevant issues
Pairs with mozilla-ai/otari-ai#991 (the resolve endpoint + per-workspace config).
Checklist
tests/unit/test_code_execution_resolve.py).make lint,make typecheck,make test). —make lint+make typecheckpass; fulltests/unitpasses (441); container-backed integration deferred to CI.AI Usage
AI Model/Tool used:
Claude (Opus 4.8) via Claude Code.
Any additional AI details you'd like to share:
Implemented by mirroring the existing web-search resolve consumer, with unit tests mirroring
test_web_search_resolve. Verifiedmake lint/make typecheckand the full unit suite locally. Reviewer questions will be answered by a human.Summary
This PR extends the gateway’s platform-mode support so workspaces can centrally configure whether (and how) code execution is allowed. Before the gateway proceeds with sandbox/tool execution, it now fetches the workspace’s code-execution policy from the platform and enforces it consistently.
What Changed
Gateway pipeline (
src/gateway/api/routes/_pipeline.py):otari_code_executionduring tool preparation in platform modepurpose_hint, while allowing a per-requestpurpose_hintto override iturl_targets_platform(renaming the previous web-search-specific helper) and uses it for both sandbox and web-search platform credential forwarding checksPlatform resolver (
src/gateway/api/routes/_platform.py):_resolve_platform_code_execution()to call the platform endpoint (/gateway/code-execution/resolve) and map responses/timeouts to gateway errors consistently with existing platform resolversTests:
tests/unit/test_code_execution_resolve.py)default_purpose_hintpurpose_hintprecedence (tests/integration/test_platform_mode_chat.py)url_targets_platformBenefits