docs: agent shell consolidation — one shell, AgentProfile-as-contract#64

Merged: drewstone merged 1 commit into main from docs/agent-shell-consolidation on Jun 15, 2026
Conversation

@drewstone

Contributor

Design spec (review-only, no code) for making all 5 agent products the same shell. Drew chose plan-first; this is that plan.

The finding

agent-app already owns most of the shell (chat tool-loop, model-config resolution, capability auth, billing, hub, SSO, side-channel tools, defineAgentApp config seam). Five core-loop concerns are still hand-rolled in every product (~3,000 lines copied 5×, already drifting):

  1. Skill registry + ~/.claude/skills mount
  2. AgentProfile assembly (sandbox path)
  3. Sandbox provisioning (ensureWorkspaceSandbox)
  4. Per-turn model resolution
  5. System-prompt assembly

The key architectural decision (Drew's call)

The shell's input contract is the sandbox SDK's AgentProfile type, not a newly invented seam. AgentProfile was verified to already carry every field needed (prompt/model/permissions/tools/mcp/subagents/resources.files/hooks/extensions), so skills + knowledge map to resources.files, specialists map to subagents, and model hints map to model. The shell becomes shell(profile: AgentProfile, runtimeConfig). A product collapses to one defineAgentProfile({...}) plus a thin ShellRuntimeConfig (~10 lines + config).
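The product shape described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual definitions: the `AgentProfile` interface here is a trimmed local stand-in whose field names come from the description but whose field shapes are assumptions, `ShellRuntimeConfig`'s fields are hypothetical, and `defineAgentProfile` is assumed to be a typed identity helper.

```typescript
// Trimmed, hypothetical stand-in for the sandbox SDK's AgentProfile.
// Field names follow the PR description; the shapes are assumptions.
interface AgentProfile {
  prompt: string;
  model?: string;
  subagents?: Record<string, { prompt: string }>;
  resources?: { files?: Record<string, string> };
}

// Hypothetical runtime config; the real ShellRuntimeConfig fields are not given in the PR.
interface ShellRuntimeConfig {
  productId: string;
  sandboxEnabled: boolean;
}

// Assumed to be a typed identity helper that only provides type checking.
function defineAgentProfile(profile: AgentProfile): AgentProfile {
  return profile;
}

// Stand-in for the shell entry point: here it only summarizes what it was given.
function shell(profile: AgentProfile, runtimeConfig: ShellRuntimeConfig): string {
  const fileCount = Object.keys(profile.resources?.files ?? {}).length;
  const subagentCount = Object.keys(profile.subagents ?? {}).length;
  const model = profile.model ?? "default";
  return `${runtimeConfig.productId}: model=${model}, files=${fileCount}, subagents=${subagentCount}`;
}

// A product file: skills + knowledge go to resources.files, specialists to subagents,
// and the model hint to model. All values below are made up for illustration.
const taxProfile = defineAgentProfile({
  prompt: "You are a tax assistant.",
  model: "example-model",
  subagents: { reviewer: { prompt: "Review the draft return." } },
  resources: {
    files: {
      "skills/filing/SKILL.md": "# Filing skill",
      "knowledge/deductions.md": "# Deductions",
    },
  },
});
```

A product would then call `shell(taxProfile, { productId: "tax", sandboxEnabled: true })`, keeping everything product-specific inside the profile object.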

Evidence the doc surfaced (drift is real, not theoretical)

  • tax: the nightly eval grades packages/api-worker's profile, which diverged 58 lines from the deployed apps/web copy — evals score a profile users never get.
  • creative: sandbox/index.ts is 1,474 lines and the app-tool layer is 2,345 lines (12.8× gtm), purely because it is behind on the lift, not because it is structurally heavier.
  • model-resolution.ts is byte-identical-duplicated within tax; the dual-path corpus loader appears 12+ times fleet-wide.
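Byte-identical copies like the ones listed above can be found mechanically by hashing file contents. The sketch below is one way to do that; `groupByContentHash` and the example paths are hypothetical and not part of the PR. It catches only exact duplicates, so copies that have already drifted (such as the 58-line tax divergence) would need a diff instead.

```typescript
import { createHash } from "node:crypto";

// Group file paths whose contents are byte-identical.
// Input: path -> file content. Output: only the groups with more than one path.
function groupByContentHash(files: Record<string, string>): string[][] {
  const byHash = new Map<string, string[]>();
  for (const [path, content] of Object.entries(files)) {
    const digest = createHash("sha256").update(content).digest("hex");
    const group = byHash.get(digest) ?? [];
    group.push(path);
    byHash.set(digest, group);
  }
  return Array.from(byHash.values()).filter((paths) => paths.length > 1);
}

// Hypothetical example input: two identical copies and one unrelated file.
const duplicateGroups = groupByContentHash({
  "apps/web/model-resolution.ts": "export const pick = () => 'm';",
  "packages/api-worker/model-resolution.ts": "export const pick = () => 'm';",
  "apps/web/prompt.ts": "export const prompt = 'hi';",
});
```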

What's in it

Grounded in a 6-repo surface map (agent-app + 5 products). Sections: problem + measured duplication → already-provided vs the gap → target architecture (AgentProfile contract + the ~10-line product shape) → per-concern lift plan in dependency order (skill-mount first, tax monorepo last) → per-product migration notes + outliers (gtm specialists/Intelligence, creative design-canvas, tax monorepo, insurance market-pack corpus) → additive flag-gated rollout → risks/out-of-scope → decisions to confirm.

Review the doc, mark up §8 (decisions to confirm) and the lift order, and I'll turn it into the substrate-release + per-product migration work.

…le-as-contract

Design spec to lift the 5 still-hand-rolled agent-shell concerns (skill registry +
~/.claude/skills mount, AgentProfile assembly, sandbox provisioning, per-turn model
resolution, system-prompt assembly) into agent-app, so each product (gtm/creative/
tax/legal/insurance) collapses to one defineAgentProfile({...}) + a thin
ShellRuntimeConfig. The shell's input contract IS the sandbox SDK AgentProfile type
(not a new invented seam). Grounded in a 6-repo surface map; documents measured
duplication (~3000 lines x5) and confirmed drift (tax evals grade a 58-line-stale
profile; creative app-tool layer 12.8x gtm). Staged additive rollout, flag-gated.

@tangletools tangletools left a comment


✅ Auto-approved PR — 25d39ef0

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-15T10:37:38Z

@tangletools tangletools left a comment


🟠 Value Audit — better-approach-exists

Verdict: better-approach-exists
Concerns: 4 (1 medium-concern, 3 weak-concern)
Heuristic: 0.0s
Duplication: 0.0s
Interrogation: 180.4s (2 bridge agents)
Total: 180.4s

💰 Value — better-approach-exists

Adds a well-grounded design spec to lift 5 duplicated agent-shell concerns into agent-app using the sandbox SDK's AgentProfile as the contract; the plan is coherent but does not reconcile with the existing AgentAppConfig product contract, so a human should decide the single source of truth before im

  • What it does: This PR adds docs/agent-shell-consolidation.md (265 lines), a staff architecture proposal. It identifies ~3,000 lines of duplicated plumbing across gtm/creative/tax/legal/insurance (skill registry + ~/.claude/skills mounts, AgentProfile assembly, sandbox provisioning, per-turn model resolution, system-prompt assembly) and proposes lifting each concern into @tangle-network/agent-app as additi
  • Goals it achieves: 1) Eliminate measured duplication and drift (e.g., tax eval scoring a 58-line stale profile, creative's 2,345-line hand-rolled app-tool layer, byte-identical model-resolution.ts copies). 2) Make the shell's input contract the existing sandbox SDK AgentProfile type rather than inventing a new seam. 3) Enable incremental, additive, non-breaking rollout so products migrate one concern at a time b
  • Assessment: The analysis is coherent and grounded in real repo evidence: it correctly maps the existing lifted pieces (/runtime tool loop/model catalog, /tools capability auth, /delegation MCP, /config data contract, etc.) and accurately identifies the gaps (no skill mount, no AgentProfile composer, no sandbox provisioning, no per-turn model picker, no prompt assembler). It respects the engine/shell l
  • Better / existing approach: The codebase already has a product config contract: AgentAppConfig defined in src/config/index.ts:164-200 and exported via defineAgentApp (src/config/index.ts:198-200), with tests in tests/config.test.ts and a scaffolder template in create-agent-app/template/agent.config.ts. The proposed design makes products author a separate defineAgentProfile({...}) + ShellRuntimeConfig instead

🎯 Usefulness — sound-with-nits

A coherent, well-grounded design proposal that identifies real fleet-wide duplication and fits agent-app's additive-subpath/structural style; only minor concerns about reconciling it with the existing AgentAppConfig surface and declaring the optional sandbox peer dependency.

  • Integration: This PR is docs-only: it adds only docs/agent-shell-consolidation.md and no code. The proposed subpaths (@tangle-network/agent-app/skills, /profile, /sandbox, /model-resolution, /prompt) do not exist in package.json exports (package.json:34-184), tsup.config.ts entries (tsup.config.ts:4-35), or src/index.ts re-exports (src/index.ts:9-36). Searches in src/ for the proposed key symbols (skillRegistr
  • Fit with existing patterns: The design aligns with how agent-app is built. Existing modules already avoid direct @tangle-network/sandbox imports by staying structural (src/delegation/index.ts:11-14; src/tools/mcp.ts:15-17 reference AgentProfileMcpServer structurally), and the codebase already exports additive subpaths for runtime, tools, delegation, etc. The gaps named in the doc are real: there is no skill-mount loader, no
  • Real-world viability: The proposal explicitly addresses realistic failure modes: fail-closed model allowlisting and catalog validation, severed-stream detection lifted from creative, workspace- vs user-bound capability tokens, and a hard ordering constraint to repoint tax evals before deleting the dead api-worker package. It also proposes additive subpaths so non-sandbox consumers are not forced to adopt the sandbox pe
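The fail-closed model allowlisting mentioned above can be illustrated with a small sketch. The function name, parameters, and model IDs are hypothetical; the only point shown is that a disallowed model raises an error rather than silently falling back to another model.

```typescript
// Resolve the model for one turn, failing closed:
// if the chosen model is not on the allowlist, throw instead of substituting another model.
function resolveModelForTurn(
  requested: string | undefined,
  defaultModel: string,
  allowlist: ReadonlySet<string>,
): string {
  const candidate = requested ?? defaultModel;
  if (!allowlist.has(candidate)) {
    throw new Error(`model "${candidate}" is not in the allowlist`);
  }
  return candidate;
}

// Hypothetical allowlist for illustration.
const allowedModels: ReadonlySet<string> = new Set(["model-a", "model-b"]);
const chosenModel = resolveModelForTurn("model-b", "model-a", allowedModels);
```

Note that the default model is checked against the allowlist too, so a misconfigured default fails loudly instead of bypassing the list.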

🎯 Usefulness Audit

🟡 Design does not reconcile with the existing AgentAppConfig surface [problem-fit]

agent-app already ships AgentAppConfig / defineAgentApp (src/config/index.ts:164-200) as the canonical product declaration, used by create-agent-app/template/agent.config.ts:15-17 and tests/config.test.ts. The doc’s example product collapses to defineAgentProfile + ShellRuntimeConfig without showing how it relates to AgentAppConfig. Confirm before implementation whether AgentAppConfig becomes the source of truth that a composer turns into AgentProfile, or whether the greenfield template should s
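One of the options raised here, a composer that turns the product config into a profile, might look roughly like the sketch below. Both interfaces are hypothetical stand-ins (named `AppConfigLike` and `AgentProfileLike` to make clear they are not the real `AgentAppConfig` or `AgentProfile` definitions), and the field mapping is an assumption for illustration only.

```typescript
// Hypothetical stand-in for a product config; NOT the real AgentAppConfig fields.
interface AppConfigLike {
  name: string;
  systemPrompt: string;
  defaultModel?: string;
  knowledge?: Record<string, string>;
}

// Hypothetical stand-in for the profile the shell consumes.
interface AgentProfileLike {
  prompt: string;
  model?: string;
  resources: { files: Record<string, string> };
}

// Composer: the config stays the single source of truth and the profile is derived from it.
function composeAgentProfile(config: AppConfigLike): AgentProfileLike {
  return {
    prompt: config.systemPrompt,
    // Only set `model` when the config actually names one.
    ...(config.defaultModel !== undefined ? { model: config.defaultModel } : {}),
    resources: { files: { ...(config.knowledge ?? {}) } },
  };
}

const composedProfile = composeAgentProfile({
  name: "example-product",
  systemPrompt: "You are a helpful agent.",
  defaultModel: "example-model",
  knowledge: { "knowledge/faq.md": "# FAQ" },
});
```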

🟡 Sandbox subpath needs an optional peer-dependency declaration [integration]

The doc plans an agent-app/sandbox subpath that imports @tangle-network/sandbox, but package.json:216-249 does not list @tangle-network/sandbox in peerDependencies or peerDependenciesMeta. To keep agent-app usable without the sandbox SDK for edge/browser paths (as src/runtime/agent.ts:24-26 stays substrate-free), add it as an optional peer dependency when that subpath lands.

🟡 Vite import.meta.glob loader needs a testable Node fallback [robustness]

The proposed loadMarkdownCorpus must preserve the literal import.meta.glob string for Vite static analysis (docs/agent-shell-consolidation.md:143). agent-app currently has no build-time glob pattern; ensure the Node fs fallback is exercised in vitest so the same source runs in tests and non-Vite consumers without a runtime fork per environment.
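One way to keep a single source path for both environments is to have the loader accept an already-evaluated glob map and fall back to the filesystem otherwise. The sketch below assumes that shape; the `source` parameter, its field names, and the key normalization are hypothetical and not taken from the design doc. The literal `import.meta.glob` call would stay at the product call site, where Vite can statically analyze it.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Corpus = Record<string, string>; // relative path -> markdown text

// Node fallback: recursively read every .md file under rootDir, keyed by POSIX-style relative path.
function readMarkdownFromDisk(rootDir: string, dir: string = rootDir, out: Corpus = {}): Corpus {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      readMarkdownFromDisk(rootDir, full, out);
    } else if (entry.name.endsWith(".md")) {
      const key = path.relative(rootDir, full).split(path.sep).join("/");
      out[key] = fs.readFileSync(full, "utf8");
    }
  }
  return out;
}

// Hypothetical loader: prefer a pre-evaluated glob map, else read from disk.
// In a Vite build the caller would pass something like (Vite 5+ option names, an assumption):
//   import.meta.glob("./**/*.md", { query: "?raw", import: "default", eager: true })
function loadMarkdownCorpus(source: { globbed?: Corpus; fsRoot?: string }): Corpus {
  if (source.globbed) {
    const out: Corpus = {};
    for (const [key, text] of Object.entries(source.globbed)) {
      out[key.replace(/^\.\//, "")] = text; // strip the leading "./" that glob keys carry
    }
    return out;
  }
  if (source.fsRoot) return readMarkdownFromDisk(source.fsRoot);
  throw new Error("loadMarkdownCorpus: provide either `globbed` or `fsRoot`");
}

// Usage under Node (for example inside a vitest case): write a throwaway corpus and read it back.
const tmpRoot = fs.mkdtempSync(path.join(os.tmpdir(), "corpus-"));
fs.mkdirSync(path.join(tmpRoot, "nested"));
fs.writeFileSync(path.join(tmpRoot, "a.md"), "# A");
fs.writeFileSync(path.join(tmpRoot, "nested", "b.md"), "# B");
fs.writeFileSync(path.join(tmpRoot, "notes.txt"), "ignored");
const fromDisk = loadMarkdownCorpus({ fsRoot: tmpRoot });
const fromGlob = loadMarkdownCorpus({ globbed: { "./a.md": "# A", "./nested/b.md": "# B" } });
```

Because both paths return the same key shape, a test can assert that the filesystem result matches what the glob path would produce.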

💰 Value Audit

🟠 Proposed AgentProfile contract is parallel to existing AgentAppConfig product surface [better-architecture]

The repo already ships AgentAppConfig/defineAgentApp (src/config/index.ts:164-200) as the declarative product contract, validated by tests/config.test.ts and the create-agent-app scaffolder (create-agent-app/template/agent.config.ts). The design doc proposes defineAgentProfile({...}) + ShellRuntimeConfig as the new product surface without reconciling these two contracts. This risks two config languages, divergent scaffolder guidance, and confused future agents. A better approach


What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

| Pass | What it asks |
| --- | --- |
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |

Findings are concerns, not blocks — the human reviewer decides what to do with them.

value-audit · 20260615T104235Z

@tangletools


✅ No Blockers — 25d39ef0

Readiness 89/100 · Confidence 65/100 · 3 findings (3 low)

| | deepseek | glm | aggregate |
| --- | --- | --- | --- |
| Readiness | 89 | 92 | 89 |
| Confidence | 65 | 65 | 65 |
| Correctness | 89 | 92 | 89 |
| Security | 89 | 92 | 89 |
| Testing | 89 | 92 | 89 |
| Architecture | 89 | 92 | 89 |

Full multi-shot audit completed 1/1 planned shots over 1 changed file. Global verifier still owns final merge decision.

🟡 LOW Hashed dist filenames will rot on next build — docs/agent-shell-consolidation.md

Multiple references cite tsup-generated hashed dist filenames (e.g. 'dist/sandbox-Dyf07Ckv.d.ts:190' at line 58, 'dist/model-CKzniMMr.d.ts:108' at line 38). These hashes change on every build and will be stale within one release. For a proposal doc this is harmless, but if the doc is expected to persist as architectural reference, consider citing source file paths or stable export names instead.

🟡 LOW No trailing newline at EOF — docs/agent-shell-consolidation.md

The file ends with \ No newline at end of file (git diff line 265). POSIX convention and most editors/linters expect a final newline. No prettier/markdownlint config exists in this repo to enforce it, so non-blocking. Fix: append a single newline. One-character change.

🟡 LOW Terminology conflict: 'shell default' vs 'opt-in' for severed-stream classifier — docs/agent-shell-consolidation.md

§4.3 (line 169) says 'lift creative's severed-stream + model-call-failure classifiers ... as a shell default' (implying opt-out / always-on), but §7 (lines 244-245) says 'ship it opt-in via ShellRuntimeConfig.streamFailureClassifier defaulting to creative's implementation.' The words 'opt-in' and 'default-on' conflict — if the field defaults to creative's implementation, it's opt-out, not opt-in. The mechanism described (a configurable ShellRuntimeConfig field) is clear enough, but stakeholders reading this doc will interpret
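The mechanism this finding describes, a config field that defaults to a built-in classifier, behaves as default-on with an opt-out. A minimal sketch follows. Only the field name `streamFailureClassifier` comes from the finding; the classifier signature, the `null`-disables convention, and the placeholder default are assumptions, and the default is not creative's actual implementation.

```typescript
type StreamFailure = "severed-stream" | "model-call-failure" | "unknown";
type StreamFailureClassifier = (error: Error) => StreamFailure;

// Placeholder default; stands in for creative's classifier, whose logic is not shown in the PR.
const defaultStreamFailureClassifier: StreamFailureClassifier = (error) =>
  /socket hang up|ECONNRESET|aborted/i.test(error.message) ? "severed-stream" : "unknown";

interface ShellRuntimeConfig {
  // Omitted (undefined): use the default, i.e. default-on.
  // null: explicitly disabled, i.e. the product opts out.
  // A function: the product supplies its own classifier.
  streamFailureClassifier?: StreamFailureClassifier | null;
}

function resolveStreamFailureClassifier(config: ShellRuntimeConfig): StreamFailureClassifier | null {
  return config.streamFailureClassifier === undefined
    ? defaultStreamFailureClassifier
    : config.streamFailureClassifier;
}

const implicitClassifier = resolveStreamFailureClassifier({});
const disabledClassifier = resolveStreamFailureClassifier({ streamFailureClassifier: null });
```

Under this reading a product that sets nothing still gets the classifier, which is why "opt-out" describes the behavior more accurately than "opt-in".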


tangletools · 2026-06-15T10:42:38Z · trace

@drewstone drewstone merged commit af36a67 into main Jun 15, 2026
1 check passed