A terminal AI coding agent that validates every change through a TypeScript language server before writing it to disk.
npm install -g anvil-agent
anvilai "Rename the User type to Account across all files" ./my-project
Most coding agents write files and hope for the best. Anvil runs every proposed edit through typescript-language-server in a shadow copy of your project first. If the edit introduces type errors, the agent reads the diagnostics, self-corrects, and retries. Your real files are never touched until the change is clean.
The context layer is agentic rather than one-shot. Instead of loading the whole codebase into the prompt, Anvil uses AST queries (tree-sitter), LSP symbol lookup, and an embedding-backed semantic_search to find exactly what it needs — the definition site of a type, the files that import it, or the region of code that best matches a fuzzy question like "where is the retry logic". A cross-file rename typically takes 6–8 targeted reads, not a full directory dump. The semantic index is built at session start (Voyage-3 when VOYAGE_API_KEY is set, TF-IDF cosine similarity otherwise — no external service required).
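To make the TF-IDF fallback concrete, here is a minimal sketch of cosine-similarity ranking over TF-IDF vectors. It is an illustration under stated assumptions, not Anvil's EmbeddingService: the names `Doc`, `buildIndex`, `cosine`, and `rankBySimilarity`, the tokenizer, and the smoothed IDF formula are all hypothetical choices.

```typescript
// Hypothetical sketch of a TF-IDF fallback index. None of these names come from Anvil.
type Doc = { id: string; text: string };
type Vector = Map<string, number>;

// Assumed tokenizer: lowercase runs of letters, digits, and underscores.
const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];

function buildIndex(docs: Doc[]) {
  const tokenized = docs.map((d) => tokenize(d.text));

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  // Smoothed IDF, so unseen query terms and ubiquitous terms both get finite, positive weights.
  const idf = (term: string): number =>
    Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;

  const vectorize = (tokens: string[]): Vector => {
    const counts = new Map<string, number>();
    for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
    const vec: Vector = new Map();
    counts.forEach((tf, term) => vec.set(term, tf * idf(term)));
    return vec;
  };

  return { docs, vectors: tokenized.map(vectorize), vectorize };
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}

// Returns the k documents most similar to the query, best match first.
function rankBySimilarity(index: ReturnType<typeof buildIndex>, query: string, k = 3) {
  const q = index.vectorize(tokenize(query));
  return index.docs
    .map((doc, i) => ({ id: doc.id, score: cosine(q, index.vectors[i]) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A fuzzy question such as "where is the retry logic" scores highest against chunks that share its rarer terms (`retry`, `logic`), which is why a lexical fallback can still be useful without an embedding API key.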
For multi-file tasks, Anvil runs a Planner subagent first. You see the full plan — which files change, in what order, and why — before any writes happen. Approve, reject, or revise before a single line changes.
After execution, the ValidationEngine runs a four-phase sweep: type check, lint, tests, and any custom VALIDATE: commands you've declared in .anvil/rules.md. Failures come back as a structured fix plan that the executor uses for up to two auto-fix rounds. Each session also gets its own git branch, with one commit per file, so every change is reversible with a single command.
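As a rough illustration of how custom rules could be picked up, the sketch below extracts `VALIDATE:` commands from the text of a rules file. The helper name `parseValidateCommands` is hypothetical, and the exact line grammar Anvil accepts (for example, whether a leading list bullet is allowed) is an assumption.

```typescript
// Hypothetical helper: collects the shell commands declared as "VALIDATE: <cmd>".
// Assumption: one command per line, optionally preceded by a markdown list bullet.
function parseValidateCommands(rulesMarkdown: string): string[] {
  return rulesMarkdown
    .split(/\r?\n/)
    .map((line) => line.trim().replace(/^[-*]\s+/, "")) // tolerate "- " or "* " bullets
    .filter((line) => line.startsWith("VALIDATE:"))
    .map((line) => line.slice("VALIDATE:".length).trim())
    .filter((cmd) => cmd.length > 0); // ignore a bare "VALIDATE:" with no command
}

// Example rules text; the commands shown are placeholders, not Anvil defaults.
const exampleRules = [
  "# Project rules",
  "- Prefer named exports",
  "VALIDATE: npm run check:licenses",
  "- VALIDATE: ./scripts/smoke.sh --fast",
].join("\n");

const customCommands = parseValidateCommands(exampleRules);
```

Each extracted command would then run as the fourth phase of the sweep, after type check, lint, and tests.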
You can watch the full session progress through a 10-phase workflow indicator in the TUI: Initializing → Exploring → Planning → Awaiting approval → Branching → Executing → Verifying → Fixing → Committing → Complete.
Three paths — pick one:
npm (recommended)
npm install -g anvil-agent
Requires Node 18+. TypeScript language server is bundled.
Compiled binary (no Node required)
# macOS arm64
curl -L https://github.com/arpjw/anvil/releases/latest/download/anvilai-darwin-arm64 -o anvilai && chmod +x anvilai && sudo mv anvilai /usr/local/bin/
# macOS x64 / Linux x64 / Windows: swap the filename above
Built with bun build --compile. Zero runtime dependencies; the binary boots faster than the Node CLI.
From source
git clone https://github.com/arpjw/anvil.git && cd anvil
npm install
npx tsx src/index.ts "<request>" <path/to/workdir>
Verify the install:
anvilai --version
anvilai doctor
Set your API key. Anvil supports Claude, GPT, Gemini, and Moonshot. On first run, an interactive picker lets you select a model and tells you which environment variable to set.
export ANTHROPIC_API_KEY=... # Claude Sonnet 4.6 (default), Opus 4.8 / 4.7 / 4.6, Haiku 4.5
export OPENAI_API_KEY=... # GPT-4o, GPT-4o mini, o3, o4-mini
export GEMINI_API_KEY=... # Gemini 2.5 Pro, Gemini 2.5 Flash
export MOONSHOT_API_KEY=... # Moonshot v1 (8K / 32K / 128K context)
Initialize a project
cd your-project
anvilai init # interactive setup: languages, ignore dirs, test command, style rules
anvilai doctor # verify configuration and tool availability
Run a task
anvilai "<request>" [path/to/workdir]
anvilai "Add JSDoc to all exported functions in src/auth.ts" ./my-project
anvilai "Rename the User type to Account across all files" ./my-project
anvilai "Extract the validation logic in submitOrder into a pure function" ./my-project
For simple single-file tasks, Anvil skips the planner and executes directly. For complex multi-file requests, it runs the Planner first and shows the full plan before prompting y / n / revise.
Slash commands
After anvilai init, three starter commands are available in .anvil/commands/. These are plain .md files — edit them or add your own.
anvilai /review . # scan codebase for bugs and type issues
anvilai /document src/auth.ts # add JSDoc to exported functions
anvilai /test . # write unit tests for uncovered functions
anvilai --commands # list all available slash commands
Flags
--model <id> Select model directly, skip interactive picker
--dry-run Plan only — print the plan, do not execute
--no-verify Skip the post-execution verification pass
--headless No TUI — outputs JSON result to stdout (for CI)
--image <filepath> Attach an image as context (PNG/JPG/WebP/GIF)
--resume <sessionId> Resume a previously interrupted session
--rollback <sessionId> Revert all file changes from a session
--commands List available slash commands
--version Print the installed anvil-agent version
Config
anvilai config list # show all settings
anvilai config set model claude-opus-4-8 # switch model
anvilai config set autoBranch false # disable per-session git branching
anvilai config set autoVerify false # disable verification pass
anvilai config get model # read a single value
1. Classify. The Orchestrator decides whether the request is simple (single-file, single concept) or complex (multi-file, cross-cutting). Simple tasks skip the planner and execute immediately.
2. Plan. For complex tasks, the Planner uses read-only tools — ast_search, find_symbol, semantic_search, text_search, read_file — to map the codebase and produce a structured plan: which files to touch, in what order, and what each change accomplishes. semantic_search is backed by a session-local embedding index kicked off at plan start (Voyage-3 if VOYAGE_API_KEY is set, TF-IDF fallback otherwise).
3. Approve. The plan is displayed in the TUI. Type y to proceed, n to cancel, or r to revise with a follow-up instruction.
4. Branch. If autoBranch is enabled (default), Anvil creates an anvil/<sessionId> git branch before any writes. Each file is committed individually as it's completed.
5. Execute. The Executor works through the plan. Every write_file call goes through the shadow workspace:
propose edit
→ copy file to /tmp/anvil/<session>/shadow/
→ send textDocument/didChange to typescript-language-server
→ wait for publishDiagnostics
→ clean? commit to real file : send diagnostics back to agent, retry
Each shadow cycle is logged to /tmp/anvil/<session>/shadow.log as newline-delimited JSON.
6. Verify. The ValidationEngine runs a four-phase sweep against the working tree: type check (tsc --noEmit / mypy / cargo check), lint (ESLint or pylint), test suite (auto-detected — jest, vitest, pytest, cargo, go), and any custom shell commands declared in .anvil/rules.md as VALIDATE: <cmd>. Failures are grouped by phase and turned into a fix plan that the Executor consumes for up to two auto-fix rounds. Clean verification is required for the session to land in memory and generate a PR description.
7. Memory. A summary of what changed is appended to .anvil/memory.md so future sessions have context on what was done and why.
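The shadow cycle in step 5 can be sketched as a small validate-then-commit loop. This is a simplified illustration, not the code in src/shadow/workspace.ts: the `ShadowDeps` interface, the `shadowWrite` function, and the `maxAttempts` limit are hypothetical (the source does not state a retry cap), and the language-server round trip is hidden behind an injected function.

```typescript
// Hypothetical types. The real gate exchanges textDocument/didChange and
// publishDiagnostics messages with typescript-language-server.
type Diagnostic = { file: string; line: number; message: string };
type Edit = { file: string; content: string };

interface ShadowDeps {
  // Applies the edit to the shadow copy and resolves with its diagnostics (empty means clean).
  validateInShadow(edit: Edit): Promise<Diagnostic[]>;
  // Hands the diagnostics back to the agent and resolves with a revised edit.
  revise(edit: Edit, diagnostics: Diagnostic[]): Promise<Edit>;
  // Writes the accepted edit to the real project file.
  commit(edit: Edit): Promise<void>;
}

async function shadowWrite(
  edit: Edit,
  deps: ShadowDeps,
  maxAttempts = 3, // assumed limit, chosen only for this sketch
): Promise<{ committed: boolean; attempts: number; diagnostics: Diagnostic[] }> {
  let current = edit;
  let diagnostics: Diagnostic[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    diagnostics = await deps.validateInShadow(current);
    if (diagnostics.length === 0) {
      // Only a clean edit ever reaches the real file.
      await deps.commit(current);
      return { committed: true, attempts: attempt, diagnostics };
    }
    if (attempt < maxAttempts) current = await deps.revise(current, diagnostics);
  }
  // Attempts exhausted: the real file was never written.
  return { committed: false, attempts: maxAttempts, diagnostics };
}
```

The property this structure is meant to show is that `commit` is unreachable unless the most recent validation returned no diagnostics.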
Rollback. If a session goes wrong, --rollback <sessionId> uses git to restore every file the session touched.
Phase visibility. Every checkpoint above emits a phase_transition UIEvent, and the TUI shows a live Phase N/10: Label indicator. In headless / cloud runs, these events forward to whatever's consuming the stream — the CLI, the JSON output, or the hosted dashboard.
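A minimal sketch of how the phase indicator could be driven is shown below. The phase names are taken from the workflow list in this README; the `PhaseTransitionEvent` fields and the `WorkflowTracker` class are hypothetical, since the source names the `phase_transition` event but does not describe its payload.

```typescript
const PHASES = [
  "Initializing", "Exploring", "Planning", "Awaiting approval", "Branching",
  "Executing", "Verifying", "Fixing", "Committing", "Complete",
] as const;
type Phase = (typeof PHASES)[number];

// Assumed event shape, for illustration only.
type PhaseTransitionEvent = {
  type: "phase_transition";
  from: Phase | null;
  to: Phase;
  label: string; // e.g. "Phase 6/10: Executing"
};

class WorkflowTracker {
  private current: Phase | null = null;
  private readonly emit: (event: PhaseTransitionEvent) => void;

  constructor(emit: (event: PhaseTransitionEvent) => void) {
    this.emit = emit;
  }

  // Moves to the given phase and notifies the consumer (TUI, JSON stream, dashboard).
  // Phases may be skipped, e.g. simple tasks that bypass planning and approval.
  enter(to: Phase): void {
    const position = PHASES.indexOf(to) + 1;
    const event: PhaseTransitionEvent = {
      type: "phase_transition",
      from: this.current,
      to,
      label: `Phase ${position}/${PHASES.length}: ${to}`,
    };
    this.current = to;
    this.emit(event);
  }
}
```

Because the tracker only takes an `emit` callback, the same transitions can feed an Ink component locally or be serialized for a headless consumer.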
┌─────────────────────────────────────────────┐
│ TUI │
│ Ink · event stream · 10-phase workflow bar │
├─────────────────────────────────────────────┤
│ Orchestrator │
│ complexity classifier · workflow phases │
├──────────────────────┬──────────────────────┤
│ Planner │ Executor │
│ read-only tools │ shadow-mediated │
├──────────────────────┴──────────────────────┤
│ Shadow Workspace │
│ propose → LSP validate → commit │
├──────────────────────┬──────────────────────┤
│ EmbeddingService │ ValidationEngine │
│ session index │ typecheck · lint │
│ voyage-3 / TF-IDF │ · tests · custom │
├──────────────────────┴──────────────────────┤
│ Context Engine │
│ read_file · ast_search · find_symbol │
│ · semantic_search · text_search · git_* │
└─────────────────────────────────────────────┘
| Component | Source | Role |
|---|---|---|
| Orchestrator | src/agents/orchestrator.ts | Classifies requests, coordinates subagents, drives the 10-phase workflow |
| Planner | src/agents/planner.ts | Read-only exploration, produces plan.json; kicks off embedding indexing |
| Executor | src/agents/executor.ts | Applies plan, all writes shadow-mediated |
| Shadow Workspace | src/shadow/workspace.ts | LSP validation gate before disk commit |
| Context Engine | src/tools/, src/lsp/, src/treesitter/ | AST queries, symbol lookup, semantic + text search, git tools |
| EmbeddingService | src/services/embedding/ | Session-scoped semantic index (Voyage-3 or TF-IDF fallback) powering semantic_search |
| ValidationEngine | src/services/validation/ | Sequential typecheck → lint → tests → custom rules; produces structured fix plans |
| Verifier | src/execution/verifier.ts | Runs the ValidationEngine and coordinates auto-fix rounds |
| WorkflowPhase | src/agents/workflow.ts | 10-phase state machine + phase_transition events |
| TUI | src/ui/ | Ink/React interface, plan approval gate, diff review, phase indicator |
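The division of labor between the ValidationEngine and the Verifier in the table above can be sketched as a sequential sweep wrapped in a bounded fix loop. The function names, the `Failure` shape, and the choice to keep running later phases after an earlier one fails are assumptions made for illustration; only the phase order and the two-round limit come from this README.

```typescript
type PhaseName = "typecheck" | "lint" | "tests" | "custom";
type Failure = { phase: PhaseName; message: string };
// Each check resolves with its failures; an empty array means the phase passed.
type Check = () => Promise<Failure[]>;

// Runs the four phases in order and collects every failure, tagged by phase.
async function runSweep(checks: Record<PhaseName, Check>): Promise<Failure[]> {
  const order: PhaseName[] = ["typecheck", "lint", "tests", "custom"];
  const failures: Failure[] = [];
  for (const phase of order) failures.push(...(await checks[phase]()));
  return failures;
}

// Re-runs the sweep after each fix round, stopping when clean or out of rounds.
async function verifyWithFixes(
  checks: Record<PhaseName, Check>,
  applyFixPlan: (failures: Failure[]) => Promise<void>, // stands in for the Executor
  maxFixRounds = 2,
): Promise<{ clean: boolean; fixRounds: number; remaining: Failure[] }> {
  let failures = await runSweep(checks);
  let fixRounds = 0;
  while (failures.length > 0 && fixRounds < maxFixRounds) {
    await applyFixPlan(failures);
    fixRounds++;
    failures = await runSweep(checks);
  }
  return { clean: failures.length === 0, fixRounds, remaining: failures };
}
```

In this sketch a session that is still failing after the second round is reported with `clean: false` and its remaining failures, rather than retried indefinitely.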
git clone https://github.com/arpjw/anvil.git
cd anvil
npm install
export ANTHROPIC_API_KEY=your_key_here
npx tsx src/index.ts "<request>" <path/to/workdir>
Optional env vars are documented in .env.example, most notably VOYAGE_API_KEY to enable Voyage-3 embeddings for semantic_search (TF-IDF fallback otherwise).
The same agent runs in a hosted control plane under cloud/:
- Control plane (cloud/control-plane/): Bun + Hono API, Postgres via Drizzle, Redis pub/sub for SSE, Clerk auth, Stripe metered billing, Octokit-based GitHub integration.
- VM runtime (cloud/vm/): Docker / Firecracker driver, in-VM agent that forwards every UIEvent (including phase_transition) back to the control plane over a websocket.
- Dashboard (cloud/dashboard/): Next.js 16 + Clerk 6 + Tailwind v4. Live SSE session view, plan approval, per-file diff review, one-click "Open PR", billing / usage page, and a waitlist landing.
See cloud/README.md for local dev + deploy setup and cloud/docs/api.md for the full API reference.
A deep dive into the shadow workspace implementation, why agentic retrieval outperforms one-shot RAG on cross-file tasks, and how Cursor's architecture maps to what Anvil does at the filesystem level: [coming soon].