diff --git a/vision.md b/vision.md index a73c687dc..780169eec 100644 --- a/vision.md +++ b/vision.md @@ -1,60 +1,390 @@ -# Introduction +# Vision -> This document is human written, meant to communicate how the project works and what its goals are +> The product vision for Executor — what it is, the model it's built on, everything it +> should eventually do, and the discipline for getting there. Implementations churn; this is +> the destination they're heading toward, applied as incremental changes to the existing +> codebase, not a rewrite. -The vision of Executor is to be an open source layer for your integrations. +## What Executor is -It is not AI specific +Executor is an **open-source layer for your integrations** — a catalog of every operation +across a company's software, plus the auth, scoping, policy, history, and oversight to act on +it safely. -It is not code mode specific +- It is **not AI-specific.** Agents are a primary consumer, but the layer is a general way to + represent and interop between sources of capability. +- It is **not code-mode-specific.** Code mode is one way to call tools; it isn't the point. +- It is a category of **source** and a way to **interop between them** through one set of + concepts. -It is a category of source and a way to interop between them. +The bet: as work shifts toward agents acting through software, every company needs one layer +where anything a human could do is callable — auditable, across every account — and where new +capability is *composed* on a shared substrate, not rebuilt per integration. One primitive +unlocks the whole slice: catalog + auth + tools + scoping + UI + workflows + storage. -Executor exposes 4 core concepts: +It runs the same two ways from one codebase: an **in-process SDK** (no server, single-player, +instant) and a **hosted service** (multiplayer, cloud). Single-player must feel great; +multiplayer must be powerful; neither carries the other's weight. -1. Tool +## Core concepts -This is represented via an id, optionally an input schema, and optionally an output schema. +- **Tool** — an id, an optional input schema, and an optional output schema (JSON Schema today; + may grow richer types). The unit of capability. Addressed as + `...`. +- **Source** — contains tools. A source is produced by a plugin from some config (an OpenAPI + spec, a GraphQL endpoint, an MCP server, a CLI's install/run). The core never sees the raw + config — only a normalized manifest. +- **Connection** — a credential, born wired, identified by `(scope, source, name)`. The `name` + is required and load-bearing (the account: `work`, `personal`, `prod`). One source holds many + connections. +- **Secret** — never lives in Executor and never reaches the agent. A connection holds a + `SecretRef` (a pointer — `op://`, `keychain://`, `env://`, `vault://`); a provider resolves it + at call time, in trusted space, behind a proxy. +- **Scope** — placement and identity. An ordered, merged set (see *Scopes*). Every record is + placed in a scope. +- **Policy** — gates execution (allow / require-approval / block). Attached to sources and + connections (see *Policies*). +- **Plugin** — registers sources, tools, providers, storage, surfaces. The one open extension + seam (see *The model*). +- **Manager / Invoker** — the core runtime roles: a manager owns a source's tools and config; an + invoker executes a tool through the right connection and proxy. -Input and Output schemas are JSON schemas, this may evolve in the future to support more complex data types +## The model — core, first-party, one seam, surfaces -Input and Output schemas are +Executor is a **tiny core**, a set of **first-party capabilities** built on it, **one open +extension seam**, and **surfaces** that expose it. -3. Source +**The core (the floor):** `execute(path, args)` (the one verb); the **catalog** of tools; +**connections**; **scopes**; **policies/guardrails**; and `scope()` — narrow the executor to a +subset, the seam everything composes through. -A source contains tools +**First-party capabilities** are a fixed, known set, included or omitted as modules — *not* a +generic plugin registry: toolkits, the MCP host, generative UI, workflows, storage, skills, +audit/runs, the internal-apps catalog. They're separable but they don't go through an abstract +plugin contract, because nobody outside you needs to add a new *kind* of them. -4. Secret +**One open plugin seam: sources.** There are thousands of APIs; you'll never enumerate them; you +want others to add them. That is where an open contract (`resolveTools` / `invokeTool`) earns its +complexity. Secret **providers** are the one plausible second seam, for the same reason. -Tools and secrets +> The discipline: do not make "everything a plugin." Keep first-party capabilities first-party, +> and keep exactly one open seam (sources, + maybe providers). Reversible — extract a new seam +> only when several first-party capabilities demand the same one. -5. Manager +**Surfaces / hosts** serve the executor over a wire: MCP, HTTP, CLI, stdio, triggers, the web +app. The SDK works fully in-process with none of them. -6. Plugin +## Composition — artifacts × surfaces, and the axes -A plugin can register tools, sources, adapters +Everything you build is a point in a small space of orthogonal axes. New things become +coordinates, not bespoke subsystems — this is what keeps the breadth tractable. -7. Invokers, Managers, +- **kind** — tool / view (UI) / workflow / skill / store / connection +- **lifetime** — ephemeral (a turn/session) / durable (deployed, addressable, re-openable) +- **origin** — imported (a spec) / authored (code) / invoked-inline (produced by a call) +- **surface** — in-process / MCP / HTTP / CLI / trigger / web +- **delivery** — inline value / handle / embedded resource / deep link *(negotiated by client + capability)* -## Features to ship +Two rules fall out and dissolve most "how does X compose" confusion: -- Dynamic plugin support - Run plugins in a v8 isolate, let your agent write whatever it needs, ship instructions as part of executor -- Integrations registry -- Configure executor via executor -- MCP apps with dynamic UI - Use react flight / RSCs + code mode, enables -- Scope merging -- Workflows - Ideally built ontop of "use workflow" -- SDK -- Internal apps catalog -- Store custom UI snippets -- MCP channels support like how Claude Code over Discord works to enable talking back to the agent mid tool call -- Storage - Every chat gets a temporary KV, SQLite, Filesystem the agent can use to interact with. The agent can also create these on scopes to persist data between tool calls -- Scope merging - I should be able to add tools at a global, workspace, account level, override the secrets on them per one, override policies, create temporary scopes etc -- Configure executor via executor +1. **Artifacts vs surfaces.** What you *build* (kind) is separate from how it's *reached* + (surface). Build once; a surface projects it. A workflow defined once is reachable via cron, + MCP, or in-process. A view defined once renders standalone or returns over MCP. Not everything + fits every surface (a workflow isn't a slow tool; a UI isn't an HTTP handler). +2. **Rich outputs negotiate delivery.** Too big or too rich for the channel → return a + *reference* the client resolves per its capability. A large result → a **storage handle**. A + UI → an **embedded UI resource** if the client renders it (MCP apps), else a **deep link** + into the web app. UI is not special; it's a rich output under the same rules as everything. -The focus of Executor should be to ship the primitives that build an extendable product. Everything needs to be written with that in mind +## Scopes (scope merging) + +Placement is an **ordered, merged set of scopes**, outer (authority) → inner (actor). You add +tools, connections, and policies at a global, workspace, or account level; override secrets and +policies per scope; and create temporary scopes. Single-player is one scope; multiplayer is the +same model with more entries. + +Three fixed merge rules, nothing configurable: +- **visibility = union** (you see everything placed in any scope you hold); +- **guardrails/policy = deny-union, outer-wins** (an outer scope's `block` can't be weakened + inward; mandatory flows in); +- **scalars/config/secret-overrides = innermost-wins** (the nearest scope to the actor wins). + +`org | user` is just the **two-element case** of this. The current codebase collapsed scopes to +fixed org/user; the direction is to model it as an *ordered list that today has two entries*, so +the resolver is already the general one and inserting a "team" or "environment" level later is +additive, not a rewrite — added demand-driven, only when a real case needs the middle level. + +**Why scopes are a dimension, not indirection.** A past mistake (the credential-binding model: +credential / slot / binding as separate objects wired together) caused an overcorrection that +flattened scopes along with killing bindings. The lesson: **fuse indirection, keep dimensions.** +A binding is a layer whose only job is to *connect* two things — fuse it away (born-wired +connections did this). A scope is a property that varies independently and meaningfully — keep +it; flatten it and it leaks back as workarounds. Placement stays a **direct, inline property** on +each record (the way `scope` already sits on a connection); never reintroduce an object whose job +is to *attach* a record to a scope. + +## Connections, secrets & auth + +- **Connection = a credential, born wired** (secret + account fused; no slot/binding split), + identified by `(scope, source, name)` with `name` required. +- **Multiple accounts per source per scope** (distinct names). Per-member auth (each connects + their own) and team-level shared auth (one credential placed in an outer scope) are both just + placement. +- **Auth template vs credential.** A source declares *how* it authenticates (OAuth / bearer / + API key — a discriminated set); each connection fills that template. +- **Secrets never reach the agent.** Resolved by a provider at call time, in trusted space, + behind a **tool-proxy** — credentials never enter agent or sandbox code. **No escape hatch:** a + `SecretRef` never appears in a tool's I/O schema or any MCP/host response. +- **Heterogeneous backends per member** (1Password / keychain / env / vault), declared + separately. **Credential lifecycle** (refresh, rotation, revocation, expiry) handled below the + agent. + +## Sources & source kinds + +Each kind is added through the **one open seam** (`resolveTools` + `invokeTool`): **OpenAPI**, +**GraphQL**, **MCP servers**, **CLI** (install + run, no spec), **direct API / raw-fetch**, and +later **direct database** and **email**. The plugin owns the opaque config and raw spec; the core +reasons over a **normalized manifest** (schemas, side-effect class, auth template, data labels, +required capabilities, an LLM-authored description). **Spec auto-refresh** tracks upstream drift +and **must never auto-expand authority** — new operations appear *ungranted until reviewed*. + +An **integrations registry** lets people discover and add sources. + +## Policies + +Policies gate **execution** — `approve` / `require_approval` / `block` — and are distinct from +toolkits (which gate visibility). Resolution is *most-restrictive-wins*; **plugin defaults** let +a source ship a tool pre-marked `require_approval` (destructive detection), with `approve` as the +override when an import wrongly flags a safe action. + +How people actually use them today (real usage): glob patterns over the address, where *every* +rule targets a **source** and wildcards the scope/connection (`stripe_api.*.*.disputes.*`, +`pscale.*.*.execute_write_query`) — i.e. faking structural targeting because there's nowhere to +attach it. The direction: + +- **Kill the global glob list. Attach policy to the source and connection records** — the two + structural levels. **Connection overrides source** (innermost-wins), with **scope authority + layered on top** (outer/org mandatory, can't be weakened inward). +- Fixing a mis-flagged tool becomes `approve` *on that source/connection*, directly. +- **Toolkits carry no policy** — they're pure curation and inherit whatever source/connection + policies apply to the tools they expose. This dissolves the old global-vs-toolkit merge + conflict. + +## Toolkits & scoping + +A **toolkit is pure curation** — a named subset of tools (a `ToolSet`) that yields a **scoped +view** of the executor. Used for **per-agent capability control**: lock down exactly what an +added agent can reach (a background firewall-monitor agent gets *only* a monitor tool and an +update-firewall tool, and nothing that builds). **`toolkits.list` feeds a consent screen** — the +grantable slices an agent or OAuth flow requests. Single-player keeps toolkits too. + +## Authorization & capabilities + +One model over a **typed capability namespace**, three layers kept apart: **grants** (positive +authority; toolkits are grant bundles), **policy/guardrails** (deny/approval), and **runtime** +(object-capability handles only — no ambient executor, no raw secret, no unrestricted fetch). + +The tool axis stays a **predicate** (`ToolSet`); **meta-capability kinds are typed per-kind** — +`author-tools` (effects ceiling), `generate-UI`, `allocate-storage` (quota), `deploy`, +`trigger-register`, `egress`. **Capabilities compose transitively, enforced by the +`scope()`-narrowed executor as a membrane:** an authored tool runs against its author's captured +scope, so a toolkit-X agent's tool calling toolkit-Y fails because Y was never in scope. `scope()` +strictly intersects (never widens); delegation routes through a granter; one enforcement surface, +no advisory mode — the local isolate and the cloud worker enforce the same membrane. + +## Surfaces + +You build once; surfaces project. + +- **In-process** — `execute`, the SDK path. No server. +- **MCP** — a scoped endpoint (`/mcp?toolkit=...`) over a *scoped executor*; the host + authenticates → resolves the toolkit → `scope()` → hands the narrowed executor to the MCP + server, which never sees a toolkit. Default shape is **meta-tools** (`search` + `describe` + + `execute` + `run_code`) so a huge catalog doesn't blow context, with a direct-tools opt-out. + **Authoring tools** (`render`, `create_workflow`, `author_tool`, `skills.create`) appear only + when the scope grants the matching meta-capability. +- **MCP apps with dynamic UI** — React-flight / RSC + code mode for rich UI returned over MCP + (an embedded UI resource on capable clients, a deep link otherwise). +- **MCP channels** — talk back to the agent mid-tool-call (the "Claude Code over Discord" + pattern), for human-in-the-loop and elicitation. +- **HTTP** — any capability projected as a route, after policy/auth/versioning/CORS. +- **CLI** — the same operations from the terminal (see *Distribution*). +- **Triggers** — a schedule (cron) or an event that fires an artifact. +- **Web app** — the UI-capable surface where views render and deep links land. + +## What you build + +- **Custom tools & API gateway.** Write a tool as a function and deploy it; it joins the catalog. + **Account-parametric** — a custom function calls a catalog op without baking in an account; the + account is passed at invocation. Executor doubles as an **API gateway** with a **generated typed + SDK** for everything added. +- **Workflows.** Multi-step sequences of tool calls with **durable run semantics**, built ideally + on `use workflow` (or Cloudflare Workflows). User-defined entry points; **cron + event + triggers**; infrastructure-as-code. A step *is* a catalog tool call. +- **Generative UI.** A **view** governed by lifetime × delivery: ephemeral (the agent calls + `render`) or durable (an authored dashboard). Rich UI runs in a sandbox; it calls only the + tools its scope grants, proxied server-side, never holding credentials. +- **Skills.** Shared, code-authored knowledge — the company knowledge base as code — that agents + draw on, that **surfaces in tool search**, and can **run tools inline** to cut round-trips. + Served off MCP and from a `.well-known`. File-backed (see *Authoring model*). +- **Internal apps catalog & custom UI snippets.** Store and share custom UI and internal apps. +- **Configure Executor via Executor** — managing sources, connections, scopes, policies, + toolkits is itself done through tools. + +## Authoring model — two paths + +Every authored artifact (skill, custom tool, workflow, UI) has **two first-class authoring +paths**, matched to two contexts: + +- **File + deploy** (developer, with the CLI): `executor/skills/`, `executor/tools/`, … authored + in a repo, `executor deploy` pushes them. Git gives versioning, review, rollback. The + canonical, org-shared tier. +- **A tool over MCP** (an agent with no CLI, e.g. in a desktop client): `skills.create`, + `author_tool`, `create_workflow` write to the runtime directly. Zero friction, no filesystem. + +These aren't two sources of truth: each artifact has **one master** distinguished by origin and +**scope/tier** (a runtime-authored artifact is personal/inner-scope; a deployed one is +org/outer-scope, git-backed). **Promotion** is an explicit handoff — `executor pull` materializes +a good runtime artifact into files for review and merge, graduating it to canonical. With a +git-backed store (below), runtime authoring is literally a commit-from-a-worker and promotion is +a branch/namespace merge. + +## Storage — two substrates + +Split by lifetime/kind, not one confused store: + +- **Authored artifacts** (skills, custom tools, workflows, UI — durable, versioned, reviewed) → + **git-backed**: local Git, and in the cloud **Cloudflare Artifacts** (git-over-HTTPS, push + commits from a Worker, read at runtime via a Workers binding / ArtifactFS, namespaces ≈ scopes, + versioning/pinning/rollback from git for free). "Git stands in for Cloudflare Artifacts" is + literal — local == remote with no translation. Both authoring paths commit to one repo. +- **Agent state** (data — reactive, high-frequency) → **KV / SQLite / filesystem**. Every chat + gets a temporary KV + SQLite + filesystem the agent can use; the agent can also create stores + on scopes to persist between calls. Accessed **as ordinary tools** (no side-channel API). + Addressed by **handles** + a search-across-stores utility. **Reactive** (Convex-style: a + workflow writes, a UI/chat sees it live, with access rules). **Large results land in storage** + and return a handle — which is how the MCP context-overflow problem is actually solved. + +## The Run model + +Every execution, on any surface, produces or attaches to a **Run**: id, parent run, artifact +version, principal chain, capability set, policy version, input hash, output handle, logs, spans, +approvals, retries, cancellation, status. One record **collapses four features into views over +it**: audit log, human-in-the-loop approvals, workflow runs, and resumability/debugging. + +## Human-in-the-loop + +Pause a call and require a human before it proceeds: **MCP elicitation**, a **gated `resume` +tool** (the user simply doesn't always allow it), a **resume URL**, or an **MCP channel** +talk-back mid-call. A gated call returns a *pending* Run with a resume reference. + +## Sandbox & runtime + +Untrusted code — user plugins, agent-authored tools, generative-UI bundles — runs sandboxed: +locally a **V8 isolate** (and the quickjs / dynamic-worker / deno-subprocess runtimes), in the +cloud a **Cloudflare Dynamic Worker**. **Default-deny.** Both enforce the **same capability +membrane** (the scope-narrowed executor; `env` is the capability set; outbound closed except +through the tool-execution binding). Credentials never enter the sandbox. Let the agent write +whatever it needs; ship instructions (skills) as part of Executor. + +## Distribution, the CLI & remotes + +The `executor` CLI is **three things at once**: the **local runtime** (run a full instance and +start using it), the **control plane** (local *and* remote, same commands), and an +**agent-facing surface**. Distribution surfaces are all "local": `executor` (CLI), `executor +service install` (daemon), `executor web` (web app) — one instance, delivered differently. + +- **Remotes ("cores").** Add remote instances; one shared sign-in; call them identically. + Execution **runs where the credential lives** — invoke from machine B, it runs on A with A's + credential; nothing crosses machines. +- **Merge local + remote** (the "SSH for tools" / OpenTunnel model) — a tool on one machine is + callable from anywhere. +- **Deploy & dev.** `deploy` (with `--check`) pushes authored artifacts; git-backed `sync`; + hot-reload `dev`; local filesystem access. **Configure Executor via Executor** from the CLI. + +## Cloud & user-generated extensions + +The hosted product runs on a **Cloudflare substrate** and lets users **write plugins and load +them dynamically, sandboxed** — each in a Worker isolate with a **declared capability manifest** +(a plugin asking for specific scopes can do exactly that and nothing else). User plugins ride the +**source seam** and the capability membrane, opened **last and curated**. Local == remote via +**substrate substitution**: a local V8 isolate stands in for the dynamic worker, Git for +Cloudflare Artifacts, local SQLite for hosted storage. + +## Cross-cutting concerns + +- **Data provenance / tainting** — label data handles (source, connection, scope, sensitivity, + retention, allowed readers/egress) so policy can reason as data flows tool → storage → UI → + chat. +- **Egress / allowed-hosts** — prevents **data** exfiltration (credentials are already safe via + the proxy), not credential leakage. A guardrail facet. +- **Simulation / emulation** — run tools against the `@executor-js/emulate` emulators + (wire-level, stateful, real OAuth + credentials + request ledger) for side-effect-free testing. +- **Multiple language targets** — JS now, Python later (data/analysis work). + +## Architecture + +Built on **Effect**. The same SDK is in-memory *and* client/server; SDK-only users need no +server; serving is a standard Web `Request → Response` handler. The discipline that keeps breadth +cheap is the **seam discipline**: the platform owns generic concerns (auth, caching, retries, +schema, rendering, transport, the Run); a plugin owns only the source-specific resolve/invoke. + +## Principles + +- **One unifying primitive** — everything reduces to "invoke tool T → typed output." +- **Tiny core; compose the rest.** Ship the primitives that build an extendable product. +- **No silent defaults; explicit addressing.** The account is always in the path. +- **Type the human surface, not the agent surface.** +- **Secrets never reach the agent; authority only narrows.** +- **Visibility ≠ permission** — curation (toolkits) is separate from governance (policy). +- **Share the substrate aggressively; split the surfaces deliberately.** The failure mode of a + broad product is too many features in one undifferentiated human surface — keep that surface + minimal. +- **Fuse indirection; keep dimensions.** (Born-wired connections, not credential bindings; + stacked scopes, not flattened ones.) +- **One master per record** — file-backed or runtime-authored, never two live masters. + +## How we build it + +- **Don't rewrite — evolve the existing codebase.** This vision is a map you steer toward with + incremental, git-shaped changes, not a greenfield rebuild. The wedge already exists. +- **The wedge:** be best-in-class at the catalog of imported tools + connections/secrets, exposed + through one invocation interface — with zero-token-exposure, the Run/audit record, human + approval, the capability membrane, and code-mode/meta-tool search baked in. Then sequence + nearest-to-substrate first: MCP → storage → workflows → generative UI → user plugins (curated, + last). Three substrate invariants keep composition cheap: one typed invocation primitive; + authorization inherited from the substrate, never re-implemented; a hard seam between + substrate-generic and source-specific code. +- **Vision mode vs build mode.** This document is vision mode — generativity ("and then you'd + want X") is the point. In **build mode, every "and then you'd want X" is a YAGNI to defer**, not + a dependency to satisfy. Build the smallest thing useful on its own; let the next be *pulled* by + real pain, not *pushed* by imagined composition. The tell that you've slipped back to vision + mode: the next step needs two other things built first. (Example: v1 skills = read a folder, + expose `skills.search` + `skills.get`. No scopes, toolkits, environments, or git required.) + +## North-star scenarios + +- **The shared-data loop.** "Every morning at 9am, load my GitHub issues into SQLite": a + scheduled workflow writes to storage; a generative-UI front end reads it live; you chat with the + agent over the same data. One substrate, three readers. +- **The scoped agent.** "An agent I can message that can *only* update my calendar." A toolkit + exposes a one-tool slice; the agent connects to a scoped MCP endpoint; it physically cannot + reach anything else. + +## Positioning + +The open-source alternative to closed integration layers; interops with adjacent projects; one +primitive serving both backend-for-customers and internal-team use. The model is to nail a wedge, +then grow many composing capabilities on a shared substrate — without letting the human-facing +surface bloat with it. + +## Open questions + +- Data provenance/tainting — the label model and how policy consumes it. +- Skills ↔ custom functions ↔ specs — where the boundaries sit and how the capability gate + follows references. +- Typegen ownership — leaning "don't own it; expose as a utility." +- Reconciliation UX for merged local+remote catalogs; filesystem overlay vs CLI-routed. +- Storage allocation default — which class is the default and how handles are named. +- Raw-fetch manifest — how an untyped tool kind declares its capabilities.