
[RFC]: support web research and browser-grounded automation #112

Description

@hiqiancheng

RFC boundary update for PR #444

The implementation attached to this RFC expanded from a narrow CDP/browser-control proposal into a broader first-party web research capability while keeping the model-facing tool surface intentionally small.

Problem statement

TouchAI needs reliable material collection for research, industry reports, competitive analysis, daily-life lookup, technical investigation, and pages that block simple HTTP fetching. A browser-only design is too expensive and fragile for ordinary discovery, while fetch-only access fails on rendered, interactive, blocked, or login-dependent pages.

Proposed solution

Expose three compact first-party capabilities instead of many low-level tools:

  • web_search: search/discovery across enabled and configured providers.
  • web_fetch: read and extract known public URLs.
  • browser: rendered-page/browser control for interaction, screenshots, blocked pages, existing sessions, and verification.
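As a rough illustration of how small this surface could be, the three capabilities can be modeled as one discriminated union. Only the tool names web_search, web_fetch, and browser come from this RFC; every field name, the action list, and the describeCall helper are hypothetical and are not taken from PR #444.

```typescript
// Hypothetical shape of the model-facing tool calls.
// Only the three tool names are from the RFC; all fields are assumptions.
type WebSearchCall = { tool: "web_search"; query: string; provider?: string };
type WebFetchCall = { tool: "web_fetch"; url: string };
type BrowserCall = {
  tool: "browser";
  action: "navigate" | "click" | "screenshot" | "status"; // assumed action set
  description: string; // semantic description shown in review/history
  url?: string;
};
type WebToolCall = WebSearchCall | WebFetchCall | BrowserCall;

// One-line summary of a call, e.g. for an approval prompt or a history row.
function describeCall(call: WebToolCall): string {
  switch (call.tool) {
    case "web_search":
      return `search: ${call.query}`;
    case "web_fetch":
      return `fetch: ${call.url}`;
    case "browser":
      return `browser ${call.action}: ${call.description}`;
    default:
      // Unreachable while the union has exactly three members.
      throw new Error("unknown tool");
  }
}
```

Because the union is closed, adding a fourth tool would force every switch like this one to be revisited, which is one way to keep the surface deliberately small.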

Settings are split into dedicated tabs:

  • Search: pluggable provider configuration, default provider selection, quota/API-key hints, dynamic exposure based on enabled/configured state, and model-facing routing suggestions.
  • Browser Control: feature enablement, default browser resolution, custom browser executable, default homepage, browser data directory, permission mode, per-action permissions, allow/block domains, existing-session policy, default/headless mode, and advanced fingerprint simulation.
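To make the interaction between permission mode, per-action permissions, and allow/block domains concrete, here is a minimal sketch of one possible resolution order. The setting names, the three-valued decision, and the precedence (block list first, then mode, then per-action rule) are assumptions made for illustration; the RFC does not specify them.

```typescript
// Illustrative only: names and precedence are assumptions, not TouchAI's schema.
type PermissionMode = "always_allow" | "auto" | "reject";
type Decision = "allow" | "ask" | "deny";

interface BrowserPermissionSettings {
  enabled: boolean; // feature enablement
  mode: PermissionMode; // overall permission mode
  perAction: Partial<Record<string, Decision>>; // e.g. { screenshot: "allow" }
  allowDomains: string[];
  blockDomains: string[];
}

// True when host equals the domain or is a subdomain of it.
function matchesDomain(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

function decide(s: BrowserPermissionSettings, action: string, url: string): Decision {
  const host = new URL(url).hostname;
  if (!s.enabled) return "deny";
  // Assumed precedence: a blocked domain wins over every other setting.
  if (s.blockDomains.some((d) => matchesDomain(host, d))) return "deny";
  if (s.mode === "reject") return "deny";
  if (s.mode === "always_allow") return "allow";
  // "auto": use the per-action rule, asking when no rule exists.
  const rule = s.perAction[action] ?? "ask";
  // Assumed: an empty allow list means no domain restriction.
  const trusted =
    s.allowDomains.length === 0 || s.allowDomains.some((d) => matchesDomain(host, d));
  // An "allow" rule outside the allow list is downgraded to a prompt.
  return rule === "allow" && !trusted ? "ask" : rule;
}
```

Under these assumptions, a screenshot on a subdomain of an allow-listed domain is allowed silently, the same action on an unlisted domain prompts the user, and any action on a blocked domain is denied regardless of mode.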

The runtime and prompt boundary should guide the model to use search for discovery, fetch for known URLs, and browser control when rendering, interaction, verification, screenshots, login/session state, or anti-fetch behavior requires it. Major research tasks should first form a plan, prefer official and authoritative sources, use visual evidence when it directly explains the result, and provide reviewable references.
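The routing guidance above lives in the prompt rather than in code, but its decision order can be sketched as a small function. The Need fields and the chooseTool name are hypothetical; the sketch only mirrors the stated preference of search for discovery, fetch for known URLs, and browser control when something stronger is required.

```typescript
type Tool = "web_search" | "web_fetch" | "browser";

// Hypothetical description of what a research step requires.
interface Need {
  knownUrl?: string; // set when the target page is already known
  needsRendering?: boolean;
  needsInteraction?: boolean;
  needsScreenshot?: boolean;
  needsSession?: boolean; // login or existing session state
  fetchBlocked?: boolean; // a prior fetch was blocked or returned unusable content
}

function chooseTool(n: Need): Tool {
  const needsBrowser =
    n.needsRendering ||
    n.needsInteraction ||
    n.needsScreenshot ||
    n.needsSession ||
    n.fetchBlocked;
  // Escalate to the browser only when one of the listed conditions holds.
  if (needsBrowser) return "browser";
  // Otherwise prefer the cheaper paths: fetch for known URLs, search for discovery.
  return n.knownUrl ? "web_fetch" : "web_search";
}
```

The ordering reflects the cost argument in the problem statement: the browser is the most expensive path, so it is reached only by explicit need, never by default.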

Affected boundaries

  • AgentService or conversation runtime
  • tool execution or instruction loading
  • session persistence or context construction
  • database schema or migrations
  • settings UI
  • native browser runtime
  • MCP integration

Design details

  • Search providers are intentionally pluggable so providers such as AnySearch, SearXNG, Semantic Scholar, Brave, Tavily, Exa, Firecrawl, Wikipedia, GitHub, and OpenAlex can be added or removed without reshaping the Settings UI or the model-facing tool contract.
  • API-key providers must not be exposed as usable until configured; no-key/default providers preserve useful behavior without extra setup.
  • Browser control uses installed-browser discovery and default-browser resolution so the UI can show the browser that will actually be used while still allowing a custom executable path.
  • Browser data defaults live under the application data area so profiles are reusable and predictable instead of being hidden in temporary runtime folders.
  • Existing-session connection is policy-controlled: allow, reject, or ask/select when multiple sessions are available.
  • Browser permissions support an overall mode plus granular fields, so users can choose to always allow, apply automatic per-action rules, or reject.
  • Tool calls carry semantic descriptions for review/history display; routine status/current-tab style operations can use fixed concise text, while meaningful browser actions must explain the intent.
  • Fingerprint simulation/headless settings are best-effort compatibility controls, not a promise to bypass anti-bot systems or site policy.
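Two of the rules above, provider exposure and the existing-session policy, are simple enough to sketch directly. All type and function names here are hypothetical. The sample semantics are one reading of the bullets: a provider is exposed only when enabled and, if it needs a key, configured; "ask" prompts only when more than one session is available; "allow" connects to the first session found.

```typescript
// Hypothetical provider descriptor; field names are assumptions.
interface SearchProvider {
  id: string;
  enabled: boolean;
  requiresApiKey: boolean;
  apiKey?: string;
}

// A provider is usable when enabled and, if it needs a key, the key is non-empty.
function isUsable(p: SearchProvider): boolean {
  return p.enabled && (!p.requiresApiKey || (p.apiKey ?? "").trim().length > 0);
}

// Ids that may be exposed to the model.
function exposedProviders(all: SearchProvider[]): string[] {
  return all.filter(isUsable).map((p) => p.id);
}

// Use the preferred default if it is usable, otherwise the first usable provider.
function resolveDefault(all: SearchProvider[], preferredId?: string): string | undefined {
  const usable = exposedProviders(all);
  return preferredId !== undefined && usable.includes(preferredId) ? preferredId : usable[0];
}

type SessionPolicy = "allow" | "reject" | "ask";
type SessionChoice =
  | { kind: "connect"; sessionId: string }
  | { kind: "launch_new" }
  | { kind: "prompt_user"; candidates: string[] };

// Assumed reading: "reject" never attaches to an existing session.
function resolveSession(policy: SessionPolicy, sessions: string[]): SessionChoice {
  const first = sessions[0];
  if (policy === "reject" || first === undefined) return { kind: "launch_new" };
  if (policy === "ask" && sessions.length > 1) {
    return { kind: "prompt_user", candidates: sessions };
  }
  return { kind: "connect", sessionId: first };
}
```

Keeping exposure as a pure function of provider state means the Settings UI and the model-facing tool description can both derive from the same list, so a provider without a key cannot appear usable in one place and unusable in the other.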

Alternatives and trade-offs

  • Many raw provider/browser tools: more flexible, but noisy for the model and harder for users to audit.
  • Browser-only automation: useful for interactive pages, but too expensive and fragile as the default path for ordinary research discovery.
  • Paid third-party search only: strong quality for configured users, but conflicts with the goal of useful default behavior without extra setup.
  • Fetching search-result pages directly: avoids provider integration, but is brittle, anti-bot-prone, and often produces weaker source attribution.
  • Full anti-detect browser stack: stronger stealth potential, but high maintenance and compliance cost; this RFC keeps to transparent, user-controlled compatibility settings.

Upstream references

The implementation direction was informed by mainstream agent/search/browser patterns including OpenCode/OpenClaw-style search/fetch separation, Browser-use/Patchright/Camoufox-style browser control considerations, and provider-style search services such as Brave Search, Tavily, Exa, Firecrawl, SearXNG, OpenAlex, Semantic Scholar, GitHub, Wikipedia, and AnySearch.

Acceptance criteria

  • The model sees a small, understandable set of web research tools rather than many low-level provider/browser operations.
  • Search settings are managed in a dedicated Settings tab with provider enablement, default provider, quota labels, API-key handling, and dynamic model exposure.
  • Browser settings are managed in a dedicated Settings tab with browser discovery/defaults, custom executable path, profile data path, default homepage, permission controls, existing-session policy, launch mode, and fingerprint simulation.
  • Search/fetch/browser prompt guidance covers authoritative sourcing, deep research planning, visual evidence, access restrictions, and escalation to browser control when fetch is insufficient.
  • Browser tool calls include concise semantic descriptions suitable for approval UI and history display.
  • Native browser commands and browser tool behavior have automated coverage.
  • PR #444 (feat(browser): add web search and browser-grounded automation) has green required CI checks, including frontend, Rust, E2E smoke, CodeQL, site build, and PR-template validation.

Testing and rollout

Implementation PR: #444.

Current verification for PR #444:

  • All GitHub checks passed: CI Required, Conventional Commits, Frontend Quality, Frontend Tests, Rust Checks, Desktop E2E Smoke (Windows), E2E Required, CodeQL (javascript-typescript), CodeQL (rust), Site Build, and Validate PR template.
  • Local desktop checks passed for lint, format, typecheck, frontend coverage, Rust formatting, and Rust browser_commands integration tests.
  • A local check:rust wrapper previously hit a GitHub RTK download timeout, but the equivalent Rust validation completed in CI.

Rollout should keep the capability behind explicit settings/permissions, preserve no-extra-setup search defaults, and treat browser anti-bot handling as best-effort rather than guaranteed bypass.

Metadata

Labels

  • area:agent-service: AgentService and conversation runtime changes
  • area:frontend: Frontend UI or view-layer changes
  • area:mcp: MCP integration changes
  • enhancement: New feature or request
  • kind:rfc: Architecture or cross-cutting design discussion
