fix: merge main into dev to resolve conflict for release PR #1093 (#1094)
* docs: overhaul MCP reference with Mermaid diagrams and typed tool schemas (#1030)
Rewrite docs/reference/mcp.md as the single authoritative MCP reference:
- Add 4 Mermaid diagrams (architecture, tool dispatch, resource resolution, agent role surfaces)
- Add fully typed input/output schemas with example JSON-RPC for all 12 tools
- Generalize language from Cursor-specific to client-agnostic
Delete the mcp-tools.md stub (17 lines pointing to mcp.md).
Clean up docs/guides/mcp.md:
- Remove duplicate stdio config block (lines 77-97 were a copy of lines 55-75)
- Generalize remaining Cursor-specific language (keep Cursor as example client)
- Fix stale tool name reference (build_spawn_child -> build_spawn_adhoc_child)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: harden agent runtime — path sandbox, secret redaction, denylist, prompt injection guardrails, wall-clock timeout
Five concrete hardening measures from the MCP/agent runtime security audit:
1. File-tool path sandbox (_is_safe_read_path / _is_safe_write_path in
agent_loop.py): reads restricted to worktree + repo root; writes
restricted to worktree only. Symlinks resolved before check. All 10
file/directory tools enforced.
2. Shell command secret redaction (_redact_secrets in shell_tools.py):
all stdout/stderr from run_command is stripped of ANTHROPIC_API_KEY,
GITHUB_TOKEN, DATABASE_URL, AC_API_KEY, ghp_ PATs, sk-ant- keys, and
Bearer tokens before the output reaches the agent or the DB.
3. Expanded shell denylist: adds rm -rf /app, rm -rf /worktrees, nc -e,
/dev/tcp/, and /dev/udp/ to the existing _BLOCKED_PATTERNS.
4. Prompt injection security contract (_RUNTIME_ENV_NOTE in agent_loop.py):
every agent system prompt now includes an explicit instruction to treat
all repository content as untrusted external data, never to exfiltrate
credentials, and never to make unauthorized outbound HTTP requests.
5. Wall-clock timeout (asyncio.timeout in agent_loop.py): wraps the entire
agent loop with AGENT_MAX_WALL_SECONDS (default 7200 s / 2 h); timeout
cancels the run gracefully and transitions it to 'cancelled'.
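
As an illustration of measure 1, a path sandbox of this kind can be sketched as below. This is a minimal sketch, not the code in agent_loop.py: the function names are hypothetical stand-ins for _is_safe_read_path / _is_safe_write_path, and the only behavior taken from the description above is "resolve symlinks first, then check containment".

```python
from pathlib import Path


def is_safe_path(candidate: str, allowed_roots: list[Path]) -> bool:
    """Return True if candidate, after symlink resolution, lies inside an allowed root."""
    # resolve() follows symlinks and collapses "..", so the check sees the real target.
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)


def is_safe_read_path(candidate: str, worktree: Path, repo_root: Path) -> bool:
    # Reads: worktree or repo root (hypothetical helper name).
    return is_safe_path(candidate, [worktree, repo_root])


def is_safe_write_path(candidate: str, worktree: Path) -> bool:
    # Writes: worktree only (hypothetical helper name).
    return is_safe_path(candidate, [worktree])
```

Resolving before comparing is what defeats both `../` traversal and a symlink inside the worktree that points elsewhere.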
Docker: adds no-new-privileges, cap_drop ALL, and minimal cap_add to
docker-compose.yml.
Docs: security.md rewritten to document all five hardening layers plus
the threat model update.
Tests: 131 pass (mypy clean, zero Any).
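
For measure 2, the redaction step could look roughly like the following. The regexes and the `[REDACTED]` marker are assumptions made for illustration; the actual patterns in _redact_secrets may differ.

```python
import re

# Illustrative token shapes only; the real pattern list is not shown in this log.
_TOKEN_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]+"),              # GitHub personal access tokens
    re.compile(r"sk-ant-[A-Za-z0-9_\-]+"),        # sk-ant- prefixed keys
    re.compile(r"(?<=Bearer )[A-Za-z0-9._\-]+"),  # value following "Bearer "
]

_SECRET_ENV_NAMES = ("ANTHROPIC_API_KEY", "GITHUB_TOKEN", "DATABASE_URL", "AC_API_KEY")


def redact_secrets(text: str, env: dict[str, str]) -> str:
    """Strip known secret values and token-shaped strings from command output."""
    # First remove the literal values of the named environment variables.
    for name in _SECRET_ENV_NAMES:
        value = env.get(name)
        if value:
            text = text.replace(value, "[REDACTED]")
    # Then remove anything that merely looks like a token.
    for pattern in _TOKEN_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```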
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: non-root container user, narrow bind mounts, egress allowlist proxy (phase 2)
Three additional hardening measures completing the phase-2 security audit:
1. Non-root process user (Dockerfile + scripts/entrypoint.sh)
- Creates agentception user UID/GID 1001 in Dockerfile
- Installs gosu (purpose-built privilege-drop helper) alongside git/curl/ripgrep
- scripts/entrypoint.sh performs a two-phase startup:
* Root phase: write /etc/resolv.conf, compile SCSS/JS, run Alembic
migrations, chown /worktrees and model cache to agentception:agentception
* Unprivileged phase: exec gosu agentception "$@" → uvicorn runs as PID 1
with UID 1001 and no capabilities for acquiring additional privileges
- git system config adds user.email/user.name and http.proxy for worktree ops
- Verified: /proc/1/status shows Uid=1001 in the running container
2. Narrow bind mounts (docker-compose.yml)
- Replaces ./:/app with explicit per-directory mounts: agentception/,
.agentception/, pyproject.toml, org-presets.yaml, scripts/, tests/, tools/
- .env, docker-compose*.yml, .git/, Dockerfile, and all other sensitive or
unnecessary files are intentionally excluded
- Verified: /app/.env is absent inside the running container
- docker-compose.ci.yml updated with matching narrow mounts + no-op proxy
service for CI compatibility
3. Egress allowlist proxy (scripts/tinyproxy/)
- scripts/tinyproxy/Dockerfile.proxy: builds a tinyproxy image from
debian:bookworm-slim (no third-party base images)
- scripts/tinyproxy/tinyproxy.conf: FilterDefaultDeny Yes + FilterURLs On;
any domain NOT in the allowlist is TCP-rejected at the CONNECT stage
- scripts/tinyproxy/filter: allowlist covers api.anthropic.com, github.com
family, npm registry, HuggingFace, PyPI, Cloudflare DNS
- agentception service routes all HTTP/HTTPS through http://proxy:8888 via
HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy environment variables +
git http.proxy system config
- NO_PROXY exempts internal services: postgres, qdrant, host.docker.internal
- Verified: httpx ProxyError for malicious-exfil-server.io; 200 for
api.github.com; api.github.com Zen quote returned through proxy
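
The default-deny semantics of the proxy can be modelled in a few lines of Python. This is only a conceptual sketch: it is not tinyproxy's own matching logic, and the domain tuple is a hypothetical subset of the real filter file.

```python
# Hypothetical subset of the allowlist, for illustration only.
ALLOWED_DOMAINS = ("api.anthropic.com", "github.com", "pypi.org")


def is_allowed_host(host: str, allowed: tuple[str, ...] = ALLOWED_DOMAINS) -> bool:
    """Default deny: a host passes only if it equals or is a subdomain of an allowed domain."""
    host = host.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Matching on a leading dot keeps look-alike hosts such as `evilgithub.com` from passing as part of the github.com family.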
docs/guides/security.md fully updated with implementation details, threat model
updates, and documented residual risks.
Tests: 131/131 pass, mypy clean, zero Any, generate.py no drift.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: commit package-lock.json — every CI Docker build was failing
package-lock.json was listed in .gitignore, so it never existed in the
GitHub Actions checkout. The Dockerfile has:
COPY package.json package-lock.json tsconfig.json /app/
Without the lockfile in the build context, every CI run failed at this
step with:
ERROR: "/package-lock.json": not found
npm ci (correct for Docker builds) requires the lockfile to guarantee
reproducible, deterministic package installs. The fix is to remove
package-lock.json from .gitignore and track it. This is standard
practice for applications (as opposed to libraries, which should not
commit lockfiles).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: uncomment A/B metrics panel div in build.html
The div was accidentally wrapped in an HTML comment, causing
test_ab_panel_polling_div_in_build_html to fail: HTMLParser skips
comment content so parser.found stayed {}, failing the id assertion.
The route (/api/metrics/ab), router registration, and partial template
(_ab_metrics.html) were all correctly wired — only the HTML comment
needed removal.
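
The failure mode is easy to reproduce with the standard library: html.parser reports a comment through handle_comment and never calls handle_starttag for markup inside it. The collector class and the element id below are hypothetical, not the ones used in the test suite.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the tag name of every element that carries an id attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.found[value] = tag


def collect_ids(html: str) -> dict[str, str]:
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

Feeding the same div with and without the surrounding `<!-- -->` shows the difference: the commented version yields an empty dict.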
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* ci: fold smoke test into pytest job — eliminate redundant image rebuild
The smoke job ran on a fresh GitHub Actions runner and rebuilt the
Docker image (~2-3 min) just to run two curl calls. The image was
already built in the same pipeline run by the mypy and test jobs.
Fix: remove the separate smoke job and append its health probe steps
to the test job. The image is already built and postgres is already
running on that same runner, so the smoke check adds only ~30s
(container start + entrypoint + healthcheck poll) instead of 4+ min.
Pipeline shape is unchanged: generated-files → typecheck → {test, typing-ceiling}
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1A screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1B screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Ship screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Agent Org screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: MCP docs and tests — api ref, README, auth docstring, integrate.md, edge-case tests (#1041)
- API reference: add POST /api/mcp to URL taxonomy and new MCP section
with auth note and link to Security guide and MCP reference
- docs/README: fix directory structure — MCP HTTP route is routes/api/mcp.py,
not mcp/http_server.py; list mcp.py under routes/api/
- routes/api/mcp.py: docstring now states endpoint is protected by
ApiKeyMiddleware when AC_API_KEY is set; point to Security guide
- integrate.md: clarify what MCP provides (invoke tools, read resources,
fetch prompts) in opening paragraph
- test_mcp_http: add test_batch_one_invalid_item_one_valid_returns_mixed_results,
test_resources_read_invalid_uri_via_http, test_tools_call_missing_required_arguments_returns_error
- test_mcp_resources: add test_read_resource_batch_tree_malformed_empty_batch_id_returns_not_found
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: full Cursor decoupling — taxonomy and implementation (#1043)
- A: Remove TaskRunnerChoice.cursor, keep anthropic only; drop CI .cursor volume
- B: Rename cursor_project_id → ide_project_id; roles summary and RoleMeta
- C: _scan_cursor_docs → _scan_agentception_docs; reword config/readers/db/docker
- D: Docs generalized to MCP client (e.g. Cursor); legacy note on cursor-agent-spawning
- E: .cursor/ in .gitignore; remove stale Cursor projects dir from settings UI
See docs/reference/cursor-decoupling-taxonomy.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* Merge main into dev — resolve docker-compose.ci.yml conflict (#1045)
* chore: clean up all Maestro, Stori DAW, and Storpheus monorepo bleed (#57)
* Remove all machine-specific paths; fix .env for AgentCeption
- Replace every /Users/gabriel and $HOME/dev/tellurstori/* reference with
portable <repo-root> placeholder or $HOME-relative paths throughout all
role prompts, templates, docs, and scripts
- Strip Maestro/Stori legacy vars from .env; keep only AC_* vars needed by
AgentCeption (DB_PASSWORD, AC_GITHUB_TOKEN, AC_GH_REPO, AC_OPENROUTER_API_KEY)
- Set AC_GH_REPO=cgcardona/agentception (was incorrectly tellurstori/agentception)
- Update agent-tree-protocol.md examples to use <repo-root> placeholder with
explanatory note that AgentCeption fills in the real AC_REPO_DIR path
- Fix docstring example in build.py (tellurstori/maestro → cgcardona/agentception)
- Fix test fixture cursor_project_id to use generic example
* Pass GITHUB_TOKEN into container from AC_GITHUB_TOKEN
macOS stores gh CLI tokens in Keychain, not in the config file, so the
read-only ~/.config/gh volume mount doesn't carry auth into Docker.
Passing AC_GITHUB_TOKEN as GITHUB_TOKEN lets gh CLI authenticate without
a service restart or interactive login inside the container.
* Fix pipeline-config.json repo_dir: /repo → /app
The project entry had repo_dir=/repo but the container mounts the repo at
/app (WORKDIR). This caused settings.repo_dir to be overridden to /repo at
startup, making _CONFIG_PATH resolve to /repo/.agentception/pipeline-config.json
which doesn't exist — so the API fell back to empty defaults and the project
switcher showed no projects.
* Drop AC_ prefix from all environment variables
Now that AgentCeption is its own standalone repo there is no namespace
collision risk with a parent monorepo. Unprefixed names are cleaner for
operators and consistent with standard tooling conventions.
Renamed throughout config, compose files, .env, docs, role prompts, and
templates:
AC_DATABASE_URL → DATABASE_URL
AC_GH_REPO → GH_REPO
AC_REPO_DIR → REPO_DIR
AC_WORKTREES_DIR → WORKTREES_DIR
AC_HOST_WORKTREES_DIR → HOST_WORKTREES_DIR
AC_OPENROUTER_API_KEY → OPENROUTER_API_KEY
AC_GITHUB_TOKEN → GITHUB_TOKEN
AC_PORT → PORT
AC_HOST → HOST
AC_LOG_LEVEL → LOG_LEVEL
AC_POLL_INTERVAL_SECONDS → POLL_INTERVAL_SECONDS
AC_GITHUB_CACHE_SECONDS → GITHUB_CACHE_SECONDS
AC_CURSOR_PROJECTS_DIR → CURSOR_PROJECTS_DIR
Also fixes app.py default port 7777 → 10003 (was stale from monorepo era).
AC_URL in .agent-task files is retained — it is a task-file field, not a
system env var.
* Remove all Maestro, Stori DAW, and Storpheus references from codebase
AgentCeption is now fully standalone with zero bleed from the old monorepo domain.
What changed:
- All `maestro` repo/path/container references → `agentception` equivalents
- All `Stori DAW` product references → removed or replaced with generic `Muse client`
- All `Storpheus` service references → removed or replaced with `Muse` (the protocol)
- Boundary constraint comments (`zero imports from maestro, muse, kly, storpheus`) →
`zero imports from external packages`
- Monorepo dual-container dispatch scripts → single agentception container
- Role personas rewritten: ios-developer, mobile-developer, vp-mobile, vp-ml,
data-scientist, site-reliability-engineer, devops-engineer updated to be generic
(Stori DAW product details stripped; Muse protocol kept intact)
- `muse-specialist` role kept — Muse is our music VCS protocol (like Git for music)
- `scripts/gen_prompts/config.yaml`: removed dead `maestro` codebase entry
- `test_agentception_extraction.py`: import guard updated (still checks for legacy
`import maestro` statements — intentional protection)
- `tools/typing_audit.py`: default dirs now `agentception/ tests/`
- MIDI/Muse vocabulary preserved throughout (beats, Variation, Phrase, NoteChange)
* fix: resolve 21 mypy errors blocking CI (#58)
All errors were pre-existing type issues exposed by mypy's unused-ignore
detection and a stricter unreachable-code check:
- mcp/server.py: fix unreachable branch — type request_id as object first,
narrow to int|str|None with isinstance instead of annotating and guarding
- routes/ui/org_chart.py: remove unused type: ignore[import-untyped] (yaml
now has stubs) and unused type: ignore[assignment]
- routes/ui/plan_ui.py: remove unused type: ignore[attr-defined] on
_strip_fences and _YAML_SYSTEM_PROMPT imports (both now fully typed)
- routes/ui/agents.py: remove 5 unused type: ignore[arg-type] on db_run.get()
- tests/test_issue_creator.py: remove 4 unused type: ignore[assignment]
- tests/test_agentception_mcp_plan.py: remove unused ignores; fix real type
errors by adding isinstance narrowing before subscripting list[object] and
dict[object,object] values returned from dict[str,object] payloads
- tests/test_pipeline_panel.py: remove unused type: ignore[index]
- tests/test_agentception_analyze_partial.py: remove 3 unused type: ignore[arg-type]
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* chore: set typing ceiling to zero Any, rename from ratchet to ceiling (#59)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct dispatcher prompt path from .cursor/ to .agentception/ (#65)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct role-taxonomy.yaml path in models package and remove dead models.py (#66)
The package __init__.py computed the path as parent×2 from its own location
(agentception/models/__init__.py), resolving to agentception/ rather than the
repo root. Corrected to parent×3 so the path reaches scripts/gen_prompts/.
Also deleted the shadowed agentception/models.py — Python always resolves
agentception.models to the package directory, making the flat file permanently
dead code and causing a mypy "duplicate module" error.
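
The off-by-one is visible with pathlib alone. The sketch assumes a hypothetical checkout at /repo and that role-taxonomy.yaml sits directly under scripts/gen_prompts/.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location of the package __init__.py.
init_file = PurePosixPath("/repo/agentception/models/__init__.py")

two_up = init_file.parent.parent            # stops at the agentception/ package
three_up = init_file.parent.parent.parent   # reaches the repo root

wrong = two_up / "scripts" / "gen_prompts" / "role-taxonomy.yaml"
right = three_up / "scripts" / "gen_prompts" / "role-taxonomy.yaml"
```

The first `.parent` only strips the file name, which is why a module inside a package needs one more hop than a flat module at the same depth.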
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: implement MCP initialize/initialized handshake so Cursor can connect (#67)
Cursor sends an `initialize` request the moment it opens the stdio transport.
Our server returned Method-not-found, which caused Cursor to close the
connection immediately.
Changes:
- server.py: handle `initialize` — respond with protocolVersion, capabilities,
and serverInfo per MCP spec 2024-11-05.
- server.py: handle `initialized` — it is a JSON-RPC notification (no id),
so return None to signal the caller must not write anything to the wire.
- server.py: update `handle_request` return type to `dict[str, object] | None`.
- stdio_server.py: skip writing when `handle_request` returns None.
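
A minimal sketch of the request/notification split described above, assuming the method names used in this commit message. The serverInfo values and the serve_line helper are hypothetical; only the protocol version string and the "return None for a notification" contract come from the text.

```python
import json
from typing import Optional

PROTOCOL_VERSION = "2024-11-05"


def handle_request(request: dict[str, object]) -> Optional[dict[str, object]]:
    method = request.get("method")
    if method == "initialized":
        # JSON-RPC notification (no id): nothing may be written back.
        return None
    request_id = request.get("id")
    if method == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "serverInfo": {"name": "example-server", "version": "0.0.0"},
            },
        }
    # -32601 is the JSON-RPC 2.0 "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": "Method not found"},
    }


def serve_line(line: str) -> Optional[str]:
    """Transport side: return the reply to write, or None to stay silent."""
    response = handle_request(json.loads(line))
    return None if response is None else json.dumps(response)
```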
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add developer-workflow guide (closes #60) (#69)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add contributing guide (closes #61) (#70)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: add scripts/dev.sh convenience wrapper (closes #62) (#68)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add CHANGELOG.md with v0.1.0, v0.2.0, v0.3.0 seed entries (closes #63) (#71)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: update README Quick Reference with dev tools, guides, changelog (closes #64) (#72)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: dispatcher pending-launch guard and async MCP stdio (#73)
Prevents the poller from sweeping `pending_launch` runs to `unknown`
before the Dispatcher ever reads them, and wires up async tool execution
in the stdio MCP transport so build tools (and plan tools) are awaited
correctly instead of returning an error.
Changes:
- db/persist.py: exclude `pending_launch` from the orphan-sweep active
statuses; add detailed warning-level logging to persist_agent_run_dispatch;
protect pending_launch rows from being clobbered by the poller
- db/queries.py: surface host_worktree_path in get_pending_launches results
- db/models.py: minor model alignment
- mcp/server.py: add handle_request_async that awaits call_tool_async,
fixing all async tools in the stdio transport
- mcp/stdio_server.py: own the event loop with asyncio.run; call
init_db on startup; use handle_request_async; add structured logging
- mcp/build_tools.py: add warning-level diagnostic logging to
build_get_pending_launches so Dispatcher runs are traceable
- routes/api/build.py: use settings.ac_url (removes getattr fallback);
add warning-level tracing around dispatch_label_agent DB write
- config.py: add ac_url setting (default http://localhost:10003) with
AC_URL env-var override
- .agentception/dispatcher.md: update dispatcher prompt
- .agentception/roles/devops-engineer.md: minor role update
- .gitignore, Dockerfile, docker-compose.yml, README.md: housekeeping
- tests/test_persist_pending_launch_guard.py: regression tests for the
pending_launch guard (queue not drained by poller sweep)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: narrow handle_request return type in MCP plan tests (#74)
handle_request now returns dict[str, object] | None (None for
notifications). The existing tests assumed a non-None dict, causing 24
mypy errors in CI. Add a local _unwrap() helper that asserts the
response is not None and narrows the type, then thread it through all
handle_request call sites in the test file.
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: eliminate all 47 Any patterns — zero typing audit violations (#75)
* fix: eliminate all 47 Any patterns — zero typing audit violations
Replaces every dict[str, Any], list[Any], and # type: ignore with
proper TypedDicts and isinstance narrowing across all 7 offending files.
Category 1 — structured return shapes (39 occurrences):
- db/queries.py: 20 TypedDicts (BoardIssueRow, AgentRunRow, AgentRunDetail,
PipelineTrendRow, IssueDetailRow, PRDetailRow, WaveRow, PendingLaunchRow,
AgentEventRow, AgentThoughtRow, and 10 more). AgentEventRow.payload is
now a raw JSON string; build_ui.py updated to json.loads it on use.
- routes/ui/org_chart.py: 7 TypedDicts (RoleEntry, AnnotatedRoleEntry,
PipelineConfig, OrgPreset, TierEntry, BuilderContext). _read_pipeline_config
parses keys explicitly instead of PipelineConfig(**raw).
Category 2 — # type: ignore on dict narrowing (4 occurrences):
- api_reference.py: 2×assignment — isinstance checks before subscript.
- agents.py: 1×assignment — removed; AgentRunDetail now typed correctly.
- docs.py: 1×return-value — widened return to Response instead of HTMLResponse.
Category 3 — import-untyped (1 occurrence):
- _shared.py: added markdown to pyproject.toml mypy.overrides instead of
# type: ignore[import-untyped].
Category 4 — list[Any]/AsyncIterator[Any] (2 occurrences):
- tests/test_issue_creator.py: TypeGuard narrowers (_is_start, _is_label,
_is_issue, _is_done, _is_error) replace bare Any in _collect helper and
enable typed event discrimination throughout the test suite.
Cascading caller fixes:
- agents.py: 4 new TypedDicts (AgentEnrichedRow, EnrichedAgentRunRow,
BatchRow, RoleGroupRow); all dict[str, object] annotations updated.
- build_ui.py: EnrichedIssueRow + EnrichedPhaseGroupRow; mutating
PhasedIssueRow replaced with explicit TypedDict construction.
- telemetry.py: list[dict[str, object]] → list[PipelineTrendRow].
Result: mypy strict — 0 errors (143 files); typing audit — 0 Any patterns.
* docs: update CI threshold and add DB query TypedDict reference
Update typing-ratchet --max-any from 10 to 0 in ci.md to reflect the
enforced zero-Any ceiling. Add a DB Query TypedDicts section to
type-contracts.md documenting all 20+ named row types introduced during
the Any-elimination refactor.
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* Local model integration (#986)
* feat: wire enrich_plan_with_codebase_context into plan_ui.py and issue_creator.py (#912)
- Add import of `enrich_plan_with_codebase_context` to both call sites
- plan_ui.py: call enricher after `PlanSpec.model_validate(parsed)`, before `spec.to_yaml()`
- issue_creator.py: call enricher at top of `file_issues()`, before `repo = _cfg.gh_repo`
- Both call sites use try/except with logger.warning on failure — enrichment is best-effort
- Add integration tests in `agentception/tests/test_plan_enricher_integration.py`:
- test_filed_issue_body_contains_codebase_locations: verifies enriched bodies flow through
- test_enrichment_failure_does_not_block_filing: verifies RuntimeError is swallowed
Closes #870
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: mechanically enforce STOP after pytest exits 0 (#913)
After any run_command call containing "pytest" returns exit_code=0, the
agent loop arms a hard-stop interrupt. On every subsequent iteration,
_PYTEST_STOP_BLOCKED_TOOLS (read_file, search_text, list_directory, etc.)
are intercepted and returned as synthetic errors — the same mechanism as
the loop guard — so the agent cannot enter a post-test audit loop.
The stop is armed per-iteration in extra_system_blocks with _PYTEST_STOP_OVERRIDE
and enforced mechanically in a second interception pass (Pass 2) after the
existing loop guard. It is disarmed automatically when:
- The agent writes new code (file-mutating tools) in a later iteration, because
the new code is untested and must be re-verified.
- A subsequent pytest invocation fails, so the agent can fix the regression.
Two regression tests added to TestPytestHardStop:
- test_pytest_stop_blocks_reads_after_clean_exit: verifies HARD STOP appears
in extra_system_blocks and read_file is never dispatched after pytest passes.
- test_pytest_stop_disarmed_by_file_write: verifies read_file reaches the
real dispatcher once a file write disarms the stop.
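
The arm/disarm rules above amount to a small state machine. The sketch below is an assumption about its shape, not the agent loop's code: the class name is hypothetical, and the mutating-tool set lists only example names (write_file is invented for illustration).

```python
from dataclasses import dataclass

# Read-only tools named in the description; the real set is longer ("etc.").
BLOCKED_AFTER_PASS = frozenset({"read_file", "search_text", "list_directory"})
# Example file-mutating tools; write_file is a hypothetical name.
FILE_MUTATING = frozenset({"write_file", "replace_in_file"})


@dataclass
class PytestStop:
    armed: bool = False

    def observe(self, tool: str, command: str = "", exit_code: int = 0) -> None:
        """Update the stop after a tool call has completed."""
        if tool == "run_command" and "pytest" in command:
            # A clean pytest run arms the stop; a failing one disarms it.
            self.armed = exit_code == 0
        elif tool in FILE_MUTATING:
            # New code is untested, so the agent must be free to re-verify.
            self.armed = False

    def is_blocked(self, tool: str) -> bool:
        return self.armed and tool in BLOCKED_AFTER_PASS
```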
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: tool surface audit — 13 bugs, dead code, schema mismatches (#914)
Critical
- build_complete_run inputSchema now declares grade and reviewer_feedback
(both optional strings); additionalProperties: False was silently stripping
them, making the reviewer grade pathway unreachable through any MCP client
- build_complete_run now returns isError=not bool(result.get("ok", True))
instead of hardcoded False; every other build_* tool already did this
High
- search_text and find_call_sites: remove --max-count from rg args; add
_truncate_rg_output() which counts only numbered match lines and stops at
n_results total across all files — the previous per-file limit could return
N×num_files lines despite the schema saying "max N total"
- definitions.py: update both n_results descriptions to "total … across all files"
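
A total (rather than per-file) cap can be sketched like this. It assumes rg output in heading mode with line numbers, where match lines start with `N:`; the function body is an illustration, not the actual _truncate_rg_output.

```python
import re

# Match lines look like "12:..." (context lines would use "12-").
_MATCH_LINE = re.compile(r"^\d+:")


def truncate_rg_output(output: str, n_results: int) -> str:
    """Keep output up to and including the n_results-th match across all files."""
    if n_results <= 0:
        return ""
    kept: list[str] = []
    matches = 0
    for line in output.splitlines():
        kept.append(line)
        if _MATCH_LINE.match(line):
            matches += 1
            if matches >= n_results:
                break  # stop right after the last allowed match
    return "\n".join(kept)
```

Stopping immediately after the final counted match avoids leaving a dangling file heading with no matches beneath it.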
Medium
- _PYTEST_STOP_OVERRIDE and synthetic error message now say run_command
(git add/commit/push) as the commit path, with git_commit_and_push listed
as a secondary option — developer agents don't have git_commit_and_push
- replace_in_file dispatch: allow_multiple coercion widened from
isinstance(allow_raw, bool) to isinstance(allow_raw, (bool, int)) so an
integer 1 from the model is not silently treated as False
Dead code (no-legacy rule)
- _READ_ONLY_TOOL_NAMES = frozenset() legacy alias deleted from agent_loop.py
- build_spawn_child_run (106-line orphaned function) deleted from
build_commands.py; its spawn_child/SpawnChildError/Tier/ScopeType imports
removed; module docstring updated; runs.py and docs/guides/mcp.md updated
to reference the live build_spawn_adhoc_child MCP tool
- plan_get_labels, plan_get_cognitive_figures dead imports removed from server.py
- "create_directory" ghost entry removed from _KEY_ARG log-hint dict
Style / non-idiomatic
- _CLASS_DEF_RE moved from inside insert_after_in_file body to module-level
constant (was re-compiled on every call)
- f-string without format args removed in shell_tools.py
- search_codebase collection description: "Omit (or leave null)" → "Omit"
(type is "string" with no null union; "leave null" could cause API rejection)
- read_file docstring: path resolution note updated to reflect dispatcher reality
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: address three deferred tool-surface audit items
_GUARD_PERMITTED_TOOL_NAMES: add run_command and git_commit_and_push to the
loop-guard allowlist. Without them a guarded agent writes code it can't verify
(no mypy/pytest) and can't deliver (no git push), trapping it in an infinite
guard loop. Regression guard: test_guard_allowlist_includes_shell_tools asserts
both tools are present so the allowlist can never silently regress.
find_call_sites regex: expand the pattern to cover from-import lines
(from x import symbol) and type-annotation contexts (symbol: / symbol[).
Previously only call sites (`symbol(`) and bare import lines were matched.
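
One way to express the widened pattern is shown below. The regex is a guess at the intent described here (calls, bare imports, from-imports, and `symbol:` / `symbol[` contexts), not the pattern shipped in find_call_sites.

```python
import re


def call_site_pattern(symbol: str) -> re.Pattern[str]:
    """Build an illustrative regex covering the four contexts listed above."""
    name = re.escape(symbol)
    return re.compile(
        rf"\b{name}\s*\("                              # call site: symbol(
        rf"|^\s*import\s+.*\b{name}\b"                 # bare import line
        rf"|^\s*from\s+\S+\s+import\s+.*\b{name}\b"    # from x import symbol
        rf"|\b{name}\s*[:\[]",                         # symbol: / symbol[
        re.MULTILINE,
    )
```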
read_symbol heuristic: document that the indentation-based fallback path does
NOT work for brace-delimited languages (TypeScript, JavaScript). If a TS agent
role is added the heuristic must be replaced with a tree-sitter extractor.
* feat: suppress implementing/reviewing status badge in active-lane cards (#916)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Plan enrichment section to docs/reference/api.md (#872)
* fix: two reviewer merge bugs — self-approval failure and unmerged-PR teardown
Bug 1: reviewer prompt instructed pull_request_review_write(event="APPROVE")
before merge_pull_request. GitHub rejects self-reviews (author and reviewer are
the same account) with 422, wasting an iteration and confusing the model into
giving up on the merge. Fix: remove the APPROVE step entirely from the
grade-A flow. The merge stands alone; GitHub doesn't require an approved
review to merge unless branch protection rules enforce it.
Bug 2: when merge_pull_request failed (branch behind dev), the reviewer
called build_complete_run(grade="A") without merging. build_complete_run
responded by scheduling teardown_agent_worktree, which deleted the remote
branch, which caused GitHub to auto-close the unmerged PR.
Fix: build_complete_run now calls _is_pr_merged (GitHub REST API 204 check)
before scheduling teardown for reviewer grade-A/B completions. If the PR
is not yet merged the call returns an error, forcing the reviewer to call
merge_pull_request first.
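
The gate can be sketched as below. GitHub's "check if a pull request has been merged" endpoint (GET /repos/{owner}/{repo}/pulls/{number}/merge) answers 204 when merged and 404 otherwise; everything else here is hypothetical, including the function signatures, the injected get_status callable, and the result dict shape.

```python
from typing import Callable


def is_pr_merged(repo: str, pr_number: int, get_status: Callable[[str], int]) -> bool:
    """get_status is an injected callable returning the HTTP status for an API path."""
    return get_status(f"/repos/{repo}/pulls/{pr_number}/merge") == 204


def complete_run(
    grade: str,
    repo: str,
    pr_number: int,
    get_status: Callable[[str], int],
) -> dict[str, object]:
    # Reviewer grade A/B completions must not tear down an unmerged PR's branch.
    if grade in ("A", "B") and not is_pr_merged(repo, pr_number, get_status):
        return {"ok": False, "error": "PR is not merged; call merge_pull_request first"}
    return {"ok": True, "teardown_scheduled": True}
```

Injecting the status lookup keeps the check testable without a live GitHub call.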
Also strengthens the "branch behind dev" section in reviewer.md.j2 to
make the mandatory rebase+retry path explicit and unambiguous.
Regression tests:
- test_build_complete_run_blocks_grade_a_when_pr_not_merged
- test_build_complete_run_allows_grade_a_when_pr_merged
* refactor(scss): split _build.scss into per-feature partials (#873)
_build.scss was 2069 lines. Every frontend ticket appended to it, causing
rebase conflicts between parallel agents. Extracted five component-scoped
partials and one layout partial; replaced _build.scss with a @use barrel.
New files:
_inspector-layout.scss — all inspector/build layout rules (lines 1-1892)
_thought-block.scss — .thought-block (58 lines)
_file-edit-card.scss — .file-edit-card + .diff-add/remove/context (32 lines)
_assistant-bubble.scss — .assistant-bubble (12 lines)
_tool-call-card.scss — .tool-call-card (42 lines)
_event-card.scss — .event-card (27 lines)
Compiled app.css is byte-identical to pre-change artifact:
sha256: 17e18481c378787f472aee951c3a4ce218f2e9d5b06721b0570779156e50d875
.gitattributes: merge=union added for all six new partials and the barrel.
* feat: convert agentception/db/queries.py into a package (#874) (#921)
Move queries.py → queries/_monolith.py unchanged (no query logic modified).
Create queries/types.py with all 46 TypedDict definitions extracted from the
monolith. Wire queries/__init__.py with explicit re-exports (import X as X)
so that every existing caller continues to work without change.
mypy passes on all 278 files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: split query monolith into 5 domain submodules (#876) (#922)
Create agentception/db/queries/{board,runs,messages,events,metrics}.py.
Each file imports TypedDicts from agentception.db.queries.types (not
from the monolith), carries only functions (no TypedDict definitions),
and is importable in isolation.
_monolith.py and __init__.py are unchanged — callers are not affected.
The next issue will wire __init__.py re-exports to these files and
delete _monolith.py.
mypy passes on all 283 source files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: wire queries/__init__.py from domain submodules, delete _monolith.py (#875) (#923)
Replace the monolith re-export in __init__.py with explicit per-domain
imports (board, runs, messages, events, metrics, types). All 102 public
and test-required private symbols are re-exported via the `X as X` pattern
so every existing `from agentception.db.queries import X` call-site
continues to work without modification.
Delete _monolith.py — the 3250-line file is fully replaced by the six
focused submodules.
Add merge=union .gitattributes entries for all six new domain files.
mypy passes on 282 source files; zero new test failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: audit imports, fix broken tests, document new module structure (#877) (#924)
Audit: zero direct-path queries.py imports in the test suite (all three
grep checks pass clean).
Fix 4 previously broken tests by updating stale mock.patch targets to
point at the domain submodule where each dependency is locally bound:
- test_get_daily_metrics_returns_zeros_on_db_error:
agentception.db.queries.get_session
→ agentception.db.queries.metrics.get_session
- test_get_issues_grouped_by_phase_* (3 tests):
agentception.db.queries.get_session
→ agentception.db.queries.board.get_session
agentception.db.queries.get_initiative_phase_meta
→ agentception.db.queries.board.get_initiative_phase_meta
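
Why the patch target had to move can be shown with two throwaway in-memory modules (names are hypothetical). mock.patch replaces a name in the namespace it is given, so it must point at the module where the function under test looks the name up.

```python
import sys
import types
from unittest import mock

# Module that defines the dependency.
source = types.ModuleType("source_mod")
exec("def get_session():\n    return 'real'\n", source.__dict__)
sys.modules["source_mod"] = source

# Module that binds the dependency locally via "from ... import".
consumer = types.ModuleType("consumer_mod")
exec(
    "from source_mod import get_session\n"
    "def load():\n"
    "    return get_session()\n",
    consumer.__dict__,
)
sys.modules["consumer_mod"] = consumer


def load_with_patch(target: str) -> str:
    """Call consumer.load() while target is patched to return 'fake'."""
    with mock.patch(target, return_value="fake"):
        return consumer.load()


stale = load_with_patch("source_mod.get_session")     # patch misses the local binding
fixed = load_with_patch("consumer_mod.get_session")   # patch hits the name load() uses
```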
Docs: add "## Query module structure" and "## SCSS partial structure"
sections to docs/architecture.md describing the six query submodules
and six SCSS partials, their domain ownership, and the merge=union
.gitattributes strategy.
mypy clean on 282 files; 104 tests pass with zero failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: log per-turn token counts and track last_input_tokens in agent loop (#925)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: token-aware _prune_history with last_input_tokens budget guard (#926)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: add missing recorded_at to ACAgentEvent inserts in persist.py (#927)
Two ACAgentEvent creation sites were missing the required recorded_at
field — complete_agent_run (build_complete_run event) and the orphan
sweep (orphan_failed event). The NOT NULL constraint caused a DB
exception that rolled back the whole transaction, so the agent run
status was never updated from implementing to completed/done.
The result was that every successfully-completed agent run left its DB
row stuck on status=implementing despite the PR being merged. The
orphan sweep then had the same bug but was harder to notice because
it fires infrequently.
Fix: pass recorded_at=_now() (complete_agent_run) and recorded_at=now
(orphan sweep, which already has `now` in scope).
Also manually corrected issue-884's stuck row to status=done.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: replace dict[str, Any] with DailyMetrics TypedDict in test_metrics_api.py (#928)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: inject _CONTEXT_PRESSURE_WARNING into extra_blocks when input tokens exceed threshold (#929)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: fetch origin/dev before creating developer worktree (#931)
Developer dispatches set worktree_base = "origin/dev" without first
fetching, so the container's ref tracker could lag behind GitHub by one
or more merged PRs. The worktree was created from a stale tip, and the
agent's first push immediately diverged from origin/dev — causing an
unnecessary rebase conflict on every back-to-back dispatch.
Reviewers and continuations already fetched their branch before creating
the worktree (lines 718-724). Mirror that pattern for developers in the
else-branch: run git fetch origin dev before setting worktree_base.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add _summarise_history and context checkpoint injection to _prune_history (#930)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: update stale patch paths and add _is_pr_merged mock; add context window management docs (#932)
- Update patch targets in test_phase_grouping.py, test_get_initiatives.py,
and test_persist_regression.py from agentception.db.queries.get_session
to agentception.db.queries.board.get_session (post-#876 module split)
- Add _is_pr_merged mock to test_build_complete_run_reviewer_does_not_redispatch_reviewer
so the test does not make a live GitHub API call
- Add ## Context window management section to docs/architecture.md
Closes #888
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add prompt_variant to ACAgentRun, AgentTaskSpec, and Alembic migration 0011 (#933)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: orphan sweep grace period to avoid race with dispatch (#934)
Do not mark a run failed when last_activity_at is within 60s. The poller
builds live_ids from list_active_runs() at tick start; dispatch can commit
acknowledge_agent_run after that, so the run is in the DB as implementing
but not yet in live_ids. Without the grace period the orphan sweep would
immediately mark it failed.
- Add _ORPHAN_GRACE_SECONDS and skip orphan when last_activity_at recent
- Add test_orphan_grace_period_skips_recently_acknowledged_run
Co-authored-by: AgentCeption Bot <agent@agentception.io>
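The grace-period rule from #934 can be sketched as below. `_ORPHAN_GRACE_SECONDS`, `live_ids`, and `last_activity_at` are named in the commit message; the helper name `should_mark_orphan_failed`, its signature, and the handling of a missing timestamp are assumptions for illustration, not the project's actual code.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Name taken from the commit message; 60 s is the window stated there.
_ORPHAN_GRACE_SECONDS = 60


def should_mark_orphan_failed(
    run_id: str,
    live_ids: set[str],
    last_activity_at: datetime | None,
    now: datetime,
) -> bool:
    """Hypothetical helper: True only for runs that are absent from live_ids
    and have shown no activity within the grace window."""
    if run_id in live_ids:
        return False  # still tracked by the poller, so not an orphan
    if last_activity_at is None:
        return True  # assumption: no recorded activity counts as stale
    age = now - last_activity_at
    # A run acknowledged after the tick's snapshot is recent, so it is skipped.
    return age > timedelta(seconds=_ORPHAN_GRACE_SECONDS)
```

A run acknowledged a few seconds after the poller built `live_ids` is therefore left alone until a later tick.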
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (#935)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: add Any type parameter to captured_kwargs in test_dispatch_variant
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: remove Any from test_dispatch_variant (typing ratchet) (#937)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of list[dict[str, Any]]. Satisfies zero-Any rule.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add activity_events module with persist helper and TypedDict payload shapes (#945)
* feat: add activity_events module with persist helper and TypedDict payload shapes
* fix: correct db_session fixture return type to Generator[Session, None, None]
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add planning prompts and tight-ticket standards guide (#946)
- What human prompts 1A receives (brain dump) and what 1A/1B produce
- PlanSpec issue body format (seven sections, order)
- Informal rules for brain dumps that yield agent-ready tickets:
one deliverable, cap read-heavy work, specific locations, clear done
criteria, minimal phases, format when needed, doc-only vs implement
- References to llm_phase_planner, issue_creator, plan-spec
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* tools: steer agent to search_text over run_command(grep) for code search (#947)
- search_text: add 'PREFER this over run_command(grep/ripgrep) for
searching the codebase' so the model uses the dedicated ripgrep tool
instead of shelling out to grep.
- run_command: add 'For searching the codebase for a string or regex,
use search_text instead of grep.' to avoid defaulting to grep.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: clear worktree_path in DB after reaper releases worktree (#948)
Reaper was re-finding the same terminal runs every pass because we never
cleared worktree_path after release_worktree(). Now we call
clear_run_worktree_path(run_id) only when release_worktree returns True,
so the next reaper pass (or startup) does not re-process the same runs.
release_worktree now returns bool for success/failure.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
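A minimal sketch of the reaper rule in #948, assuming `release_worktree` and `clear_run_worktree_path` can be treated as plain callables keyed by run ID. The real signatures are not shown in the commit message, and `reap_run` is a hypothetical wrapper.

```python
from typing import Callable


def reap_run(
    run_id: str,
    release_worktree: Callable[[str], bool],
    clear_run_worktree_path: Callable[[str], None],
) -> bool:
    """Release the worktree and clear the stored path only on success."""
    released = release_worktree(run_id)
    if released:
        # Clearing the path stops the next reaper pass from re-finding this run.
        clear_run_worktree_path(run_id)
    return released
```

Leaving the path in place on failure means a later pass retries the release instead of silently forgetting the worktree.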
* feat: agent_loop path/lines + search_text logging, watch_run display, block grep in run_command (#949)
- agent_loop: log path + line range for read_file_lines, pattern + directory for search_text
- watch_run: parse dispatch_tool args and show read path lines X–Y and search_text pattern/dir
- shell_tools: block run_command(grep), direct agent to search_text (ripgrep, .gitignore-aware)
- test_shell_tools: test_grep_blocked_use_search_text
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add watch-run-log-map.md mapping every log pattern to emission site (#950)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: preemptive nudge before loop guard; only dispatch reviewer after worktree release (#951)
- agent_loop: inject ONE TURN BEFORE READ-ONLY LOCK nudge when
iterations_since_write == threshold-1 so model calls write in that response
- build_commands: only call auto_dispatch_reviewer when release_worktree
returns True; return error to agent when release fails
- tests: test_preemptive_nudge_injected_one_turn_before_loop_guard,
test_implementer_completion_fails_when_release_worktree_returns_false
Co-authored-by: AgentCeption Bot <agent@agentception.io>
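The preemptive nudge condition from #951 can be sketched as follows. `iterations_since_write` and the threshold comparison come from the commit message; the function name and the full nudge wording are assumptions.

```python
from __future__ import annotations

# Hypothetical wording; only the leading phrase appears in the commit message.
_PREEMPTIVE_NUDGE = (
    "ONE TURN BEFORE READ-ONLY LOCK: call a write tool in this response."
)


def maybe_preemptive_nudge(iterations_since_write: int, threshold: int) -> str | None:
    """Return the nudge exactly one turn before the loop guard would trigger."""
    if iterations_since_write == threshold - 1:
        return _PREEMPTIVE_NUDGE
    return None
```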
* fix: clear run events/messages on dispatch; dedupe read_file_lines in watch_run (#952)
- persist: delete ACAgentEvent and ACAgentMessage for run_id at start of
persist_agent_run_dispatch so re-dispatches show a clean timeline
- watch_run: do not render dispatch_tool line for read_file_lines; show
only the file_tools result line (path + lines + total) to avoid duplicates
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(activity-events): emit tool_invoked, delay, llm_* from agent_loop (#940) (#953)
- agent_loop: persist tool_invoked after dispatch_tool log; persist delay in
_enforce_turn_delay; thread session/run_id through loop, _dispatch_tool_calls,
_dispatch_single_tool (session optional for debug_loop).
- llm: add session, run_id, iteration to call_anthropic_with_tools; persist
llm_iter, llm_usage, llm_reply, llm_done after existing logger calls.
- activity_events: persist_activity_event accepts AsyncSession (caller flushes).
- tests: test_agent_loop_activity_events.py (tool_invoked, delay); update
TestEnforceTurnDelay and TestDispatchToolCalls for new session/run_id args.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(agent_loop): continue when stop_reason=tool_calls with empty list (#954)
Anthropic can return stop_reason='tool_calls' with tool_calls=[] (e.g.
truncation or API quirk). We previously fell through to 'unexpected' and
cancelled the run. Now we inject a nudge and continue the loop instead of
cancelling.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
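A sketch of the branching described in #954, assuming a reply object with `stop_reason` and `tool_calls` fields. `LLMReply` and `classify_reply` are hypothetical stand-ins, and every other stop reason is lumped into a single "other" outcome here.

```python
from dataclasses import dataclass, field


@dataclass
class LLMReply:
    """Hypothetical stand-in for the loop's per-turn LLM response."""

    stop_reason: str
    tool_calls: list[dict[str, object]] = field(default_factory=list)


def classify_reply(reply: LLMReply) -> str:
    """Decide what the loop does next for a tool_calls stop reason."""
    if reply.stop_reason == "tool_calls":
        if not reply.tool_calls:
            # Empty list: inject a nudge and keep looping instead of cancelling.
            return "nudge_and_continue"
        return "dispatch"
    return "other"  # handled by the rest of the loop, not sketched here
```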
* chore: ship resync UX, poll docs, and dispatch script (#955)
- Mission Control: Reload button spins in place (no separate loading spinner),
triggers immediate board refresh via HX-Trigger after resync. Button is its
own HTMX indicator; aria-label/title 'Refresh from GitHub'.
- resync.py: Return HX-Trigger: refreshBoard on HTMX success so board refetches
immediately after resync.
- .env.example + docs/reference/poller.md: Document 5s poll default (safe for
GitHub rate limits); align example and poller doc with config default.
- tests: test_build_page_structure resync assertion; test_build_ui aria-label.
- scripts/dispatch_issue_941.py: One-off script to dispatch issue 941 via
POST /api/dispatch/issue (fetch issue from GitHub, then POST).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: emit activity events from file_tools (read, replace, insert, write) (#956)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(build_complete_run): release derived worktree path when DB has none (#957)
When get_agent_run_teardown returns None or worktree_path is null, we skipped
release_worktree but still dispatched the reviewer, causing 'branch already
used by worktree' on reviewer dispatch. Now we derive the path as
worktrees_dir/run_id and attempt release before dispatching; only run rebase
when the path exists. Tests updated to expect release_worktree with derived path.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
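The path derivation in #957 amounts to a fallback, sketched here with `pathlib`. The `worktrees_dir/run_id` layout comes from the commit message; the function name and parameters are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def resolve_worktree_path(
    worktrees_dir: Path, run_id: str, db_worktree_path: str | None
) -> Path:
    """Prefer the path stored in the DB; otherwise derive worktrees_dir/run_id."""
    if db_worktree_path:
        return Path(db_worktree_path)
    return worktrees_dir / run_id
```

The caller would then attempt the release on the returned path, and run the rebase only when `path.exists()` is true.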
* feat(942): emit activity events from shell_tools, git helper, and GitHub MCP (#958)
- run_command: emit shell_start (cmd_preview, cwd) and shell_done (exit_code, stdout_bytes, stderr_bytes)
- git_commit_and_push: emit git_push (branch) after successful push
- agent_loop: emit github_tool (tool_name, arg_preview) when dispatching GitHub MCP tools
- Optional run_id/session on run_command and git_commit_and_push; persist only when both set
- Wrap all persist calls in try/except so DB failures never propagate
- Add tests/tools/test_shell_github_activity_events.py (shell_start/done, github_tool, persist failure, git_push)
- Fix mypy: test_build_commands_rebase await_args can be None
Co-authored-by: AgentCeption Bot <agent@agentception.io>
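The "DB failures never propagate" rule from #958 can be sketched as a guard around the persist call. This is a synchronous simplification, since the real helper presumably takes an async session; `emit_activity_safely` and the `persist` callable are hypothetical.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger(__name__)


def emit_activity_safely(
    persist: Callable[[str, Mapping[str, object]], None],
    subtype: str,
    payload: Mapping[str, object],
) -> bool:
    """Persist an activity event; log and swallow any failure."""
    try:
        persist(subtype, payload)
    except Exception:
        # Observability must never break the tool call that triggered it.
        logger.warning("activity event %s not persisted", subtype, exc_info=True)
        return False
    return True
```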
* feat(943): extend inspector SSE to stream all activity events in chronological order (#959)
- Add get_all_events_tail(run_id, after_id) in db/queries/events.py; get_agent_events_tail delegates to it
- _inspector_sse: use get_all_events_tail, merge events and thoughts by recorded_at, emit in order
- Emit activity events as {"t": "activity", "subtype", "payload", "recorded_at", "id"}; other events keep {"t": "event", ...} with id
- Add tests/routes/test_inspector_sse.py: activity events in stream, cursor advances, ordered by id
- Update existing tests to patch get_all_events_tail
- Document GET /ship/runs/{run_id}/stream and activity subtypes in docs/reference/api.md
Co-authored-by: AgentCeption Bot <agent@agentception.io>
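Merging events and thoughts by `recorded_at` (#959) can be sketched with `heapq.merge`, assuming both inputs are already sorted by that field. The `Row` shape and the function name are hypothetical; only `recorded_at` and `id` are named in the commit message.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, TypedDict


class Row(TypedDict):
    """Hypothetical minimal row shape shared by events and thoughts."""

    id: int
    recorded_at: datetime
    kind: str


def merge_by_recorded_at(events: Iterable[Row], thoughts: Iterable[Row]) -> Iterator[Row]:
    """Lazily interleave two streams that are each sorted by recorded_at.

    heapq.merge is stable, so on equal timestamps rows from `events`
    are emitted before rows from `thoughts`.
    """
    return heapq.merge(events, thoughts, key=lambda row: row["recorded_at"])
```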
* feat(944): consume SSE activity messages and append live DOM rows to inspector feed (#960)
- Add activity_feed.ts: ActivityMessage type, formatActivitySummary (per-subtype text), appendActivityRow (div.activity-feed__row with data-subtype, icon placeholder, summary, time), attachActivityFeedHandler
- Wire attachActivityFeedHandler in build.ts _openStream
- Add _activity-feed.scss for .activity-feed__row, __icon, __summary, __ts
- Unit tests: formatActivitySummary, appendActivityRow, attachActivityFeedHandler
- All payload text via textContent/setAttribute; no innerHTML
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (fixes mypy) (#936)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: type annotations in test_dispatch_variant (mypy)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of bare list[dict]. No Any.
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(891): add developer-streamlined prompt variant for A/B testing (#961)
- Add scripts/gen_prompts/templates/roles/developer-streamlined.md.j2 (slimmed variant:
'item' not 'AC item', Step 3 = Ship and open PR combined; mypy + tests + Hard rules kept)
- Run generate.py → .agentception/roles/developer-streamlined.md
- Add tests/test_developer_streamlined_prompt.py (exists, contains mypy, excludes AC item/code smell)
- Default developer.md unchanged; variant used only when prompt_variant=streamlined
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(892): add GET /api/metrics/ab for per-variant A/B metrics (#962)
- New read-only endpoint: GET /api/metrics/ab?days=N (default 7, clamp 1–90)
- ABVariantMetrics: variant (COALESCE prompt_variant, 'control'), role, runs,
avg_iterations, avg_input_tokens, total_tokens, pass_rate, passed, failed
- Query joins agent_runs to agent_events (step_start count, done event grade)
- Returns 200 with empty variants: [] when no data or on DB error
- Register ab_metrics router in api __init__
- Tests: response shape (control + streamlined), empty DB, days param, 422 for days=0
- Docs: GET /api/metrics/ab in api.md with query params and response schema
- No change to dispatch, prompts, or existing routes
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(893): add HTMX A/B metrics panel to build page (#963)
- build.html: polling div #ab-metrics-panel with hx-get=/api/metrics/ab, every 30s
- partials/_ab_metrics.html: table.ab-metrics (Variant, Runs, Avg Iters, Pass Rate, Avg Tokens)
- ab_metrics.py: when HX-Request: true return HTML partial, else JSON (response_model=None)
- _ab_metrics.scss: table styles + .loading-placeholder; import in pages/_build.scss
- test_ab_metrics_panel.py: polling div in build.html, HTMX returns HTML, JSON unaffected
- api.md: A/B Metrics Panel note (HX-Request returns HTML)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs(964): add local LLM MLX guide for Qwen 3.5 35B on Apple Silicon (#968)
- docs/guides/local-llm-mlx.md: install (mlx-lm / mlx-vlm), model choice
(mlx-community/Qwen3.5-35B-A3B-4bit), run options (generate, chat, server),
powermetrics/Activity Monitor for CPU/GPU/ANE, checklist for #964
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(local-llm): full pipeline for local Qwen 3.5 35B, doc and flow
- Config: use_local_llm, local_llm_base_url, chat path, model, context caps
- LLM: call_local_with_tools (same contract as Anthropic), call_local_completion
- Agent loop: developer uses local LLM with full system prompt + cognitive arch;
only endpoint and context caps differ; comment and doc clarify same pipeline
- API: GET /api/local-llm/hello, POST /api/local-llm/hello-agent
- Docker: extra_hosts for host.docker.internal
- Watch script: local LLM indicator in ITER line, heartbeat shows local vs LLM
- Docs: local-llm-mlx.md — 48 GB runbook, mlx-openai-server, mlx-vlm >= 0.3.12,
two-step install, torch/torchvision for processor, same pipeline as Anthropic
* feat: plan-scoped integration branch (#974)
- Create plan branch on first dispatch (from origin/dev); reuse for later issues.
- Persist plan_id and plan_branch on runs; persist plan_issues when file_issues completes.
- Use plan branch as worktree base and PR base when plan_id is set; inject PR base into briefing.
- Rebase implementer branch onto plan branch (not dev) before dispatching reviewer when plan-scoped.
- When last issue in plan is merged into plan branch: rebase plan onto dev, open plan→dev PR, dispatch reviewer.
New tables: plan_issues, plan_branches. Migration 0012.
Docs: architecture/plan-scoped-integration-branch.md updated; link from architecture.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add LLM provider abstraction architecture doc (#975)
- Add docs/architecture/llm-provider-abstraction.md (planning status).
- Link from docs/architecture.md Further Reading.
- Remove unneeded scripts/curl_local_plan.sh (was untracked).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(llm): add provider-agnostic public API (step 1)
Add completion(), completion_stream(), completion_with_tools() as the
single contract. They delegate to call_anthropic* / call_local_with_tools
(use_local_llm for tools). No caller changes yet; next step will switch
plan_ui, llm_phase_planner, agent_loop to use these entry points.
* fix(agent_loop): init tc_to_dispatch/tool_results each iteration; test: mock get_session and use_local_llm
- Initialize tc_to_dispatch and tool_results at start of each loop iteration
so bookkeeping (pytest arm, etc.) never hits UnboundLocalError when
stop_reason is 'length' or 'tool_calls' with empty tool_calls.
- Tests: add _mock_get_session() and patch get_session in all run_agent_loop
tests so they run without init_db(); set use_local_llm=False on mock
settings; add session.flush = AsyncMock() so dispatch path is awaitable.
* feat(llm): wire callers to provider-agnostic API (phase two)
- llm_phase_planner: use completion() instead of call_anthropic
- plan_ui: use completion_stream() instead of call_anthropic_stream
- agent_loop: use completion() for recon and _summarise_history;
use completion_with_tools() for main loop (replaces call_local_with_tools
and call_anthropic_with_tools). Provider selection remains in llm layer.
- Tests: patch completion/completion_with_tools/completion_stream at
call sites; plan_ui tests patch completion_stream; agent_loop tests
patch completion and completion_with_tools. LLM retry tests still
exercise call_anthropic_with_tools in llm module.
- Docstring: llm_phase_planner describes completion() as entry point.
* fix(typing): resolve typing audit — TypedDict payloads, no type: ignore
- activity_events: convert FileReadPayload, FileReplacedPayload,
FileInsertedPayload, FileWrittenPayload from dict subclass to TypedDict
so dict literals type-check. Accept Mapping[str, object] in
persist_activity_event so TypedDict payloads are accepted.
- file_tools: restore named payload types and remove all four
type: ignore[assignment]. _emit_activity accepts Mapping[str, object].
Typing audit: 0 Any, 0 type_ignore; mypy clean.
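A sketch of why the TypedDict conversion removes the need for `type: ignore`: a dict literal type-checks against a TypedDict, and a TypedDict is accepted wherever `Mapping[str, object]` is expected. The payload fields and the return value below are assumptions; only the type and function names come from the commit message.

```python
from typing import Mapping, TypedDict


class FileReadPayload(TypedDict):
    """Field names are hypothetical; the real payload shape is not shown."""

    path: str
    start_line: int
    end_line: int


def persist_activity_event(subtype: str, payload: Mapping[str, object]) -> dict[str, object]:
    """Stand-in for the real helper: returns the row it would store."""
    return {"subtype": subtype, "payload": dict(payload)}


# A plain dict literal satisfies the TypedDict, so no cast or ignore is needed.
payload: FileReadPayload = {"path": "a.py", "start_line": 1, "end_line": 20}
row = persist_activity_event("file_read", payload)
```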
* feat(llm): phase three — LLM_PROVIDER config and single adapter selection
- Add LLMProviderChoice (anthropic | local) and llm_provider config (LLM_PROVIDER env).
- effective_llm_provider: USE_LOCAL_LLM=true overrides to local for backward compat.
- completion(), completion_stream(), completion_with_tools() branch only on
settings.effective_llm_provider; local completion_stream yields one content chunk.
- agent_loop and local_llm route use effective_llm_provider instead of use_local_llm.
- Tests: config effective_llm_provider + LLM_PROVIDER parsing; llm provider selection.
- Doc: llm-provider-abstraction step 4 marked done.
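The override rule for `effective_llm_provider` can be sketched as a property. `LLMProviderChoice`, `llm_provider`, and `use_local_llm` are named in the commit message; the `Settings` dataclass and the "anthropic" default are assumptions.

```python
from dataclasses import dataclass
from typing import Literal

LLMProviderChoice = Literal["anthropic", "local"]


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the project's config object."""

    llm_provider: LLMProviderChoice = "anthropic"  # LLM_PROVIDER (default assumed)
    use_local_llm: bool = False  # legacy USE_LOCAL_LLM flag

    @property
    def effective_llm_provider(self) -> LLMProviderChoice:
        # The legacy flag wins so existing .env files keep working.
        if self.use_local_llm:
            return "local"
        return self.llm_provider
```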
* feat(llm): phase four — local adapter behind contract
- Content normalization: _normalize_openai_message_content() handles
message.content as string or list of parts; strips reasoning, concatenates
text for final answer. Used in completion and tool-use response parsing.
- Local adapter helpers: _local_base_url(), _local_chat_url(),
_local_completion_payload(); call_local_completion() accepts temperature.
- True streaming: _local_completion_stream() POSTs with stream=true, parses
SSE, maps delta.content / delta.reasoning_content to LLMChunk; on failure
or unsupported server falls back to one-shot and yields one content chunk.
- Public completion_stream() uses _local_completion_stream for local provider.
- /api/local-llm/hello uses public completion() instead of call_local_completion.
- Tests: normalize (string, list, reasoning stripped, empty); stream fallback.
- Doc: step 3 (Local adapter) marked done; status Phases 1–4 implemented.
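Content normalization as described above might look like the following sketch. It assumes list-form content uses parts shaped like `{"type": "text", "text": ...}`, and that reasoning parts carry a different `type`; the function name drops the leading underscore to mark it as illustrative.

```python
from __future__ import annotations


def normalize_openai_message_content(
    content: str | list[dict[str, object]] | None,
) -> str:
    """Return the final answer text from string or list-of-parts content."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    texts: list[str] = []
    for part in content:
        # Keep only text parts; reasoning parts (assumed type) are dropped.
        if part.get("type") == "text":
            text = part.get("text")
            if isinstance(text, str):
                texts.append(text)
    return "".join(texts)
```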
* docs(llm): phase five — LLM contract, deployment guide, cross-doc updates
- Add docs/reference/llm-contract.md: contract (completion, completion_stream,
completion_with_tools), types, provider selection (LLM_PROVIDER,
effective_llm_provider), adapters, step-by-step how to add a provider.
- Rewrite docs/guides/local-llm-mlx.md: Config and environment section with
full table (LLM_PROVIDER, USE_LOCAL_LLM, all LOCAL_LLM_*); How the local
adapter works (normalization, streaming, fallback); update all integration
steps to mention LLM_PROVIDER and effective provider; add LLM contract ref.
- docs/reference/type-contracts.md: document public API and provider-agnostic
contract; link to llm-contract.md; update diagrams/tree to completion_*.
- docs/guides/setup.md: add LLM_PROVIDER, USE_LOCAL_LLM to .env table; add
Local LLM subsection with pointer to local-llm-mlx and llm-contract.
- docs/guides/security.md: LLM section covers Anthropic and local provider;
link to llm-contract and local-llm-mlx.
- docs/architecture.md: llm.py described as provider-agnostic; completion_*
and config; context window note uses completion_with_tools().
- docs/architecture/llm-provider-abstraction.md: mark step 6 (Document and
test) done; status Phases 1–5 implemented; add pointers to new/updated docs.
- docs/README.md: add Local LLM with MLX guide; add LLM Contract reference;
system overview and directory structure use completion_* and provider config.
* docs: clarify OpenAI-compatible = wire format, not OpenAI cloud
- local-llm-mlx: Naming section (Chat Completions API, mlx-openai-server
package name vs local inference); Option C retitled; mlx_lm.server /
mlx-openai-server called out as local; 48 GB runbook link text.
- llm-contract: Local adapter row + sentence on wire format vs vendor.
- setup: LLM_PROVIDER row notes local server, not OpenAI cloud.
* fix(local-llm): cap max_tokens for mlx-openai-server (422)
mlx-openai-server rejects max_tokens > 4096 with HTTP 422. Plan 1A and
streaming used 8192/16k; clamp every local chat payload to
local_llm_completion_token_ceiling (default 4096).
- config: LOCAL_LLM_COMPLETION_TOKEN_CEILING
- llm: _local_cap_max_tokens, call_local_with_tools + _local_completion_payload
- tests: cap + payload + config default
- docs: 422 cause, llm-contract generation budget
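The clamp is small enough to sketch in full. The 4096 default is the ceiling stated in this commit; the function names and the payload shape are assumptions.

```python
def local_cap_max_tokens(requested: int, ceiling: int = 4096) -> int:
    """Clamp a requested generation budget to the local server's ceiling."""
    if ceiling < 1:
        raise ValueError("ceiling must be positive")
    return min(requested, ceiling)


def local_chat_payload(
    model: str, messages: list[dict[str, str]], max_tokens: int, ceiling: int = 4096
) -> dict[str, object]:
    """Build a chat payload whose max_tokens never exceeds the ceiling."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": local_cap_max_tokens(max_tokens, ceiling),
    }
```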
* feat(llm): normalize think-tags, fix yaml-parse fallback, decisiveness prompt
- Add _normalize_think_tags() generator in services/llm.py to reclassify
<think>...</think>-wrapped content as type="thinking" chunks; ensures
plan_ui always receives clean thinking/content separation regardless of
whether the backend uses reasoning_content fields or inline tags (Qwen3)
- Add repetition_penalty=1.1 and frequency_penalty=0.3 to local completion
payload to discourage degenerate generation loops
- Remove client-side repetition detector from plan_ui (_REPEAT_WINDOW /
_REPEAT_MIN_LEN) — model-level penalties are the correct layer; the
detector caused false positives on structured YAML with repeated section
headers and prematurely broke valid streams
- Wrap YAML safe_load + PlanSpec.model_validate in plan_ui in its own
try/except so malformed LLM output always falls back to the clarify plan
rather than emitting a stream-level error event
- Rewrite _IDENTITY and _YAML_SYSTEM_PROMPT preamble to be more decisive
and direct, reducing agent "overthinking" and re-planning loops
- Reduce streaming read timeout from 300s to 90s and add specific
httpx.ReadTimeout logging to detect mlx-openai-server stalls faster
- Add 5 unit tests for _normalize_think_tags covering basic split,
no-tags passthrough, already-classified chunks, cross-chunk tags,
and multiline Qwen-style output
- Update test suite: rename/update integration tests to reflect removal
of repetition detection and addition of think-tag normalization
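A sketch of think-tag reclassification that also handles tags split across chunks. `Chunk` is a hypothetical stand-in for the project's chunk type, and the buffering strategy is one possible approach, not necessarily the one in services/llm.py.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

OPEN, CLOSE = "<think>", "</think>"


@dataclass(frozen=True)
class Chunk:
    type: str  # "thinking" or "content"
    text: str


def _held_back(buf: str, tag: str) -> int:
    """Length of the longest suffix of buf that is a proper prefix of tag."""
    for n in range(min(len(buf), len(tag) - 1), 0, -1):
        if tag.startswith(buf[-n:]):
            return n
    return 0


def normalize_think_tags(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    inside = False  # True while between <think> and </think>
    buf = ""
    for chunk in chunks:
        if chunk.type != "content":
            if buf:  # flush any held-back text before passing through
                yield Chunk("thinking" if inside else "content", buf)
                buf = ""
            yield chunk  # already classified by the backend
            continue
        buf += chunk.text
        while True:
            tag = CLOSE if inside else OPEN
            idx = buf.find(tag)
            if idx == -1:
                break
            if idx:
                yield Chunk("thinking" if inside else "content", buf[:idx])
            buf = buf[idx + len(tag):]
            inside = not inside
        # Hold back a trailing partial tag until the next chunk arrives.
        keep = _held_back(buf, CLOSE if inside else OPEN)
        emit = buf[: len(buf) - keep]
        if emit:
            yield Chunk("thinking" if inside else "content", emit)
        buf = buf[len(buf) - keep:]
    if buf:
        yield Chunk("thinking" if inside else "content", buf)
```

Holding back only the longest possible tag prefix keeps latency low: ordinary text is forwarded as soon as it arrives.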
* feat(local-llm): Ollama as primary backend, per-usecase routing, LiteLLM proxy guide
- Remove repetition_penalty from _local_completion_payload — not an
OpenAI-standard parameter; causes silent incompatibility with Ollama's
OpenAI-compat endpoint. frequency_penalty is standard and stays.
- Raise LOCAL_LLM_COMPLETION_TOKEN_CEILING default 4096 → 8192; the old
value was an mlx-openai-server workaround (422 above 4096); Ollama
supports full context lengths. Update test assertion accordingly.
- Add per-usecase model/URL overrides in config.py:
LOCAL_LLM_BASE_URL_PLAN / LOCAL_LLM_MODEL_PLAN for completion_stream()
LOCAL_LLM_BASE_URL_AGENT / LOCAL_LLM_MODEL_AGENT for completion_with_tools()
Properties effective_local_base_url_plan/agent and
effective_local_model_plan/agent fall back to the global values when unset.
Enables Phase 3 two-model routing through LiteLLM Proxy.
- Wire _local_completion_stream to use effective_local_base_url_plan and
effective_local_model_plan; wire call_local_with_tools to use
effective_local_base_url_agent and effective_local_model_agent.
- Update .env: raise LOCAL_LLM_COMPLETION_TOKEN_CEILING to 8192.
- Rewrite docs/guides/local-llm-mlx.md: make Ollama the primary
recommendation with full install/pull/serve runbook; demote
mlx-openai-server to a developer footnote. Add per-usecase config
table and link to the new scaling guide.
- Add docs/guides/local-llm-scaling.md: four-phase local LLM scaling
architecture (single Ollama → LiteLLM Proxy → two models → multi-machine),
litellm-config.yaml, docker-compose snippet, monitoring and
troubleshooting sections.
- Add two new tests for per-usecase config: fallback and override paths.
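The per-usecase fallback can be sketched with properties that prefer the override and fall back to the global value. Field and property names mirror the env vars and properties listed above; the dataclass itself and the example values are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LocalLLMSettings:
    """Hypothetical stand-in for the relevant slice of config.py."""

    local_llm_base_url: str
    local_llm_model: str
    local_llm_base_url_plan: str | None = None
    local_llm_model_plan: str | None = None
    local_llm_base_url_agent: str | None = None
    local_llm_model_agent: str | None = None

    @property
    def effective_local_base_url_plan(self) -> str:
        return self.local_llm_base_url_plan or self.local_llm_base_url

    @property
    def effective_local_model_plan(self) -> str:
        return self.local_llm_model_plan or self.local_llm_model

    @property
    def effective_local_base_url_agent(self) -> str:
        return self.local_llm_base_url_agent or self.local_llm_base_url

    @property
    def effective_local_model_agent(self) -> str:
        return self.local_llm_model_agent or self.local_llm_model
```

With only the global values set, both use cases share one endpoint and model; setting a single override splits routing without touching the other path.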
* fix(ollama): disable think mode for non-streaming calls; forward all LOCAL_LLM_* env vars
Qwen 3.5 with Ollama's thinking mode enabled spends all available tokens
on chain-of-thought reasoning before writing content. With a small token
budget (e.g. max_tokens=128 in /hello), the model exhausts its budget
during thinking and returns empty content.
Fix 1 — add think parameter to _local_completion_payload:
- think=False (default): sends "think": false to Ollama so the model
outputs the answer directly into content, skipping CoT. Used by
call_local_completion (hello, one-shot) and call_local_with_tools
(agent turns) where latency matters more than reasoning depth.
- think=True: used only by _local_completion_stream (Phase 1A planning)
where CoT quality is the priority and tokens are plentiful.
Ignored by backends that do not recognise the field (vLLM, mlx, etc).
Fix 2 — forward all LOCAL_LLM_* env vars in docker-compose.yml:
Previously only USE_LOCAL_LLM and LOCAL_LLM_BASE_URL were forwarded,
so LOCAL_LLM_MODEL, LOCAL_LLM_COMPLETION_TOKEN_CEILING, the per-usecase
overrides, and all other local LLM config were invisible inside the
container. docker compose restart therefore never picked up .env changes
for those vars. Now all LOCAL_LLM_* fields are explicitly forwarded with
sensible defaults matching config.py.
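A sketch of the `think` parameter described under Fix 1. The `"think"` key and its default come from the commit message; the function name (without the leading underscore) and the other payload fields are assumptions.

```python
def local_completion_payload(
    model: str,
    messages: list[dict[str, str]],
    max_tokens: int,
    *,
    think: bool = False,
) -> dict[str, object]:
    """Build a chat payload; think=False asks the backend to skip CoT."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        # Backends that do not recognise this field are expected to ignore it.
        "think": think,
    }
```

One-shot and agent calls would use the default, while the planning stream would pass `think=True`.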
* fix(ollama): raise token budget for non-streaming calls; warn on empty content
Qwen 3.5 with Ollama sends thinking in the reasoning field and the actual
answer in content. When max_tokens is too small (e.g. 128 in /hello),
the model exhausts its budget during chain-of-thought and content is
empty. think: false is sent in the payload for future Ollama support but
is currently ignored.
- Remove max_tokens=128 override in /hello endpoint; completion() already
defaults to max_tokens=4096 which gives the model room to think and
then answer
- Add warning log in _normalize_openai_message_content when content is
empty but reasoning is non-empty, pointing at the ceiling config var
- Update think param docstring to reflect current Ollama behaviour
* feat: hide A/B metrics panel via HTML comment (#984)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: comprehensive audit sweep — remove transient files, fix stale content (#1009)
Deletes:
- HACKATHON.md (session artifact, broken links, stale Python 3.11 ref)
- .agentception/verification-761.md (test run report, not a repo artifact)
- scripts/dispatch_issue_941.py (one-off script, docstring says so)
- agentception/EXTRACT.md (monorepo extraction is complete; procedure irreversible)
- docs/migration.md (migration to standalone DB is done; all checklist items past tense)
- agentception/README.md (described Cursor-only zero-LLM-calls arch that no longer exists)
Updates:
- .env.example: replace USE_LOCAL_LLM/mlx_lm.server/:8080 with LLM_PROVIDER/Ollama/:11434
- docs/guides/setup.md: Python 3.11→3.12, container agentception-app→agentception,
MLX→Ollama in local LLM blurb, drop USE_LOCAL_LLM legacy row from env var table
- docs/guides/security.md: port 8080→11434 in local provider example
- docs/reference/llm-contract.md: token ceiling default 4096→8192 (Ollama); note
that 4096 limit is specific to mlx-openai-server
- CHANGELOG.md: correct "AC_ prefix applied to all config keys" — only 4 vars use it
- docs/README.md: "all AC_* env vars" → "environment-variable-driven config"
- docs/cursor-agent-spawning.md: add accuracy note (primary dispatch path is now
agent_loop.py, not Cursor Task tool); fix broken links to missing stress-test file
- docs/guides/contributing.md: add callout distinguishing external-contributor
draft-PR workflow from internal merge-immediately policy (AGENTS.md)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* release: dev → main (#1033)
* docs: overhaul MCP reference with Mermaid diagrams and typed tool schemas (#1030)
Rewrite docs/reference/mcp.md as the single authoritative MCP reference:
- Add 4 Mermaid diagrams (architecture, tool dispatch, resource resolution, agent role surfaces)
- Add fully typed input/output schemas with example JSON-RPC for all 12 tools
- Generalize language from Cursor-specific to client-agnostic
Delete the mcp-tools.md stub (17 lines pointing to mcp.md).
Clean up docs/guides/mcp.md:
- Remove duplicate stdio config block (lines 77-97 were copy of 55-75)
- Generalize remaining Cursor-specific language (keep Cursor as example client)
- Fix stale tool name reference (build_spawn_child -> build_spawn_adhoc_child)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: harden agent runtime — path sandbox, secret redaction, denylist, prompt injection guardrails, wall-clock timeout
Five concrete hardening measures from the MCP/agent runtime security audit:
1. File-tool path sandbox (_is_safe_read_path / _is_safe_write_path in
agent_loop.py): reads restricted to worktree + repo root; writes
restricted to worktree only. Symlinks resolved before check. All 10
file/directory tools enforced.
2. Shell command secret redaction (_redact_secrets in shell_tools.py):
all stdout/stderr from run_command is stripped of ANTHROPIC_API_KEY,
GITHUB_TOKEN, DATABASE_URL, AC_API_KEY, ghp_ PATs, sk-ant- keys, and
Bearer tokens before the output reaches the agent or the DB.
3. Expanded shell denylist: adds rm -rf /app, rm -rf /worktrees, nc -e,
/dev/tcp/, and /dev/udp/ to the existing _BLOCKED_PATTERNS.
4. Prompt injection security contract (_RUNTIME_ENV_NOTE in agent_loop.py):
every agent system prompt now includes an explicit instruction to treat
all repository content as untrusted external data, never to exfiltrate
credentials, and never to make unauthorized outbound HTTP requests.
5. Wall-clock timeout (asyncio.timeout in agent_loop.py): wraps the entire
agent loop with AGENT_MAX_WALL_SECONDS (default 7200 s / 2 h); timeout
cancels the run gracefully and transitions it to 'cancelled'.
Docker: adds no-new-privileges, cap_drop ALL, and minimal cap_add to
docker-compose.yml.
Docs: security.md rewritten to document all five hardening layers plus
the threat model update.
Tests: 131 pass (mypy clean, zero Any).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
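Measures 1 and 2 above can be sketched as follows. The read and write roots follow the commit message; the function names drop the leading underscore to mark them as illustrative, and the regex patterns and `[REDACTED]` marker are assumptions rather than the project's actual rules.

```python
from __future__ import annotations

import re
from pathlib import Path


def _is_within(path: str | Path, root: str | Path) -> bool:
    """Resolve symlinks and '..' first, then check containment."""
    return Path(path).resolve().is_relative_to(Path(root).resolve())


def is_safe_read_path(path: str | Path, worktree: Path, repo_root: Path) -> bool:
    # Reads: worktree or repo root.
    return _is_within(path, worktree) or _is_within(path, repo_root)


def is_safe_write_path(path: str | Path, worktree: Path) -> bool:
    # Writes: worktree only.
    return _is_within(path, worktree)


# Hypothetical patterns for the token families named in the commit message.
_SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),
    re.compile(r"sk-ant-[A-Za-z0-9_\-]{10,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
]


def redact_secrets(text: str) -> str:
    """Replace anything that looks like a credential before it is logged."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Resolving before comparing is what defeats `..` traversal and symlink escapes; a plain string-prefix check would not.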
* security: non-root container user, narrow bind mounts, egress allowlist proxy (phase 2)
Three additional hardening measures completing the phase-2 security audit:
1. Non-root process user (Dockerfile + scripts/entrypoint.sh)
- Creates agentception user UID/GID 1001 in Dockerfile
- Installs gosu (purpose-built privilege-drop helper) alongside git/curl/ripgrep
- scripts/entrypoint.sh performs a two-phase startup:
* Root phase: write /etc/resolv.conf, compile SCSS/JS, run Alembic
migrations, chown /worktrees and model cache to agentception:agentception
* Unprivileged phase: exec gosu agentception "$@" → uvicorn runs as PID 1
with UID 1001, zero capabilities to acquire additional privs
- git system config adds user.email/user.name and http.proxy for worktree ops
- Verified: /proc/1/status shows Uid=1001 in the running container
2. Narrow bind mounts (docker-compose.yml)
- Replaces ./:/app with explicit per-directory mounts: agentception/,
.agentception/, pyproject.toml, org-presets.yaml, scripts/, tests/, tools/
- .env, docker-compose*.yml, .git/, Dockerfile, and all other sensitive or
unnecessary files are intentionally excluded
- Verified: /app/.env is absent inside the running container
- docker-compose.ci.yml updated with matching narrow mounts + no-op proxy
service for CI compatibility
3. Egress allowlist proxy (scripts/tinyproxy/)
- scripts/tinyproxy/Dockerfile.proxy: builds a tinyproxy image from
debian:bookworm-…
* docs: overhaul MCP reference with Mermaid diagrams and typed tool schemas (#1030)
Rewrite docs/reference/mcp.md as the single authoritative MCP reference:
- Add 4 Mermaid diagrams (architecture, tool dispatch, resource resolution, agent role surfaces)
- Add fully typed input/output schemas with example JSON-RPC for all 12 tools
- Generalize language from Cursor-specific to client-agnostic
Delete the mcp-tools.md stub (17 lines pointing to mcp.md).
Clean up docs/guides/mcp.md:
- Remove duplicate stdio config block (lines 77-97 were copy of 55-75)
- Generalize remaining Cursor-specific language (keep Cursor as example client)
- Fix stale tool name reference (build_spawn_child -> build_spawn_adhoc_child)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: harden agent runtime — path sandbox, secret redaction, denylist, prompt injection guardrails, wall-clock timeout
Five concrete hardening measures from the MCP/agent runtime security audit:
1. File-tool path sandbox (_is_safe_read_path / _is_safe_write_path in
agent_loop.py): reads restricted to worktree + repo root; writes
restricted to worktree only. Symlinks resolved before check. All 10
file/directory tools enforced.
2. Shell command secret redaction (_redact_secrets in shell_tools.py):
all stdout/stderr from run_command is stripped of ANTHROPIC_API_KEY,
GITHUB_TOKEN, DATABASE_URL, AC_API_KEY, ghp_ PATs, sk-ant- keys, and
Bearer tokens before the output reaches the agent or the DB.
3. Expanded shell denylist: adds rm -rf /app, rm -rf /worktrees, nc -e,
/dev/tcp/, and /dev/udp/ to the existing _BLOCKED_PATTERNS.
4. Prompt injection security contract (_RUNTIME_ENV_NOTE in agent_loop.py):
every agent system prompt now includes an explicit instruction to treat
all repository content as untrusted external data, never to exfiltrate
credentials, and never to make unauthorized outbound HTTP requests.
5. Wall-clock timeout (asyncio.timeout in agent_loop.py): wraps the entire
agent loop with AGENT_MAX_WALL_SECONDS (default 7200 s / 2 h); timeout
cancels the run gracefully and transitions it to 'cancelled'.
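The path sandbox in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in agent_loop.py: the function names drop the leading underscore, and the `worktree` and `repo_root` parameters are assumptions about how the roots are passed in.

```python
from pathlib import Path


def _is_within(path: Path, root: Path) -> bool:
    """Return True if path, after resolving symlinks and '..', lies inside root."""
    resolved = path.resolve()
    root_resolved = root.resolve()
    return resolved == root_resolved or root_resolved in resolved.parents


def is_safe_read_path(path: Path, worktree: Path, repo_root: Path) -> bool:
    # Reads are allowed inside the worktree or the repo root (assumed parameters).
    return _is_within(path, worktree) or _is_within(path, repo_root)


def is_safe_write_path(path: Path, worktree: Path) -> bool:
    # Writes are allowed inside the worktree only.
    return _is_within(path, worktree)
```

Resolving before comparing is what defeats both `..` traversal and a symlink that points outside the worktree.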
Docker: adds no-new-privileges, cap_drop ALL, and minimal cap_add to
docker-compose.yml.
Docs: security.md rewritten to document all five hardening layers plus
the threat model update.
Tests: 131 pass (mypy clean, zero Any).
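As an illustration of the redaction step in item 2, the sketch below masks the secret shapes listed there. The regular expressions and the `[REDACTED]` marker are assumptions made for this example; the patterns used by the real _redact_secrets are not shown in this log.

```python
import re

# Environment variable names taken from the commit message.
_ENV_NAMES = ("ANTHROPIC_API_KEY", "GITHUB_TOKEN", "DATABASE_URL", "AC_API_KEY")

# Hypothetical patterns; the real implementation may match differently.
_ENV_ASSIGNMENT = re.compile(r"\b(" + "|".join(_ENV_NAMES) + r")=\S+")
_TOKEN_PATTERNS = [
    re.compile(r"\bghp_[A-Za-z0-9]+"),              # GitHub personal access tokens
    re.compile(r"\bsk-ant-[A-Za-z0-9_-]+"),         # Anthropic-style keys
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]+"),  # Authorization header values
]


def redact_secrets(text: str) -> str:
    """Mask known secret shapes in command output."""
    # Keep the variable name so the output stays readable, mask only the value.
    text = _ENV_ASSIGNMENT.sub(r"\1=[REDACTED]", text)
    for pattern in _TOKEN_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Applying the function to both stdout and stderr before they are stored or returned keeps the agent transcript and the database free of the raw values.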
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: non-root container user, narrow bind mounts, egress allowlist proxy (phase 2)
Three additional hardening measures completing the phase-2 security audit:
1. Non-root process user (Dockerfile + scripts/entrypoint.sh)
- Creates agentception user UID/GID 1001 in Dockerfile
- Installs gosu (purpose-built privilege-drop helper) alongside git/curl/ripgrep
- scripts/entrypoint.sh performs a two-phase startup:
* Root phase: write /etc/resolv.conf, compile SCSS/JS, run Alembic
migrations, chown /worktrees and model cache to agentception:agentception
* Unprivileged phase: exec gosu agentception "$@" → uvicorn runs as PID 1
with UID 1001 and no capabilities for acquiring additional privileges
- git system config adds user.email/user.name and http.proxy for worktree ops
- Verified: /proc/1/status shows Uid=1001 in the running container
2. Narrow bind mounts (docker-compose.yml)
- Replaces ./:/app with explicit per-directory mounts: agentception/,
.agentception/, pyproject.toml, org-presets.yaml, scripts/, tests/, tools/
- .env, docker-compose*.yml, .git/, Dockerfile, and all other sensitive or
unnecessary files are intentionally excluded
- Verified: /app/.env is absent inside the running container
- docker-compose.ci.yml updated with matching narrow mounts + no-op proxy
service for CI compatibility
3. Egress allowlist proxy (scripts/tinyproxy/)
- scripts/tinyproxy/Dockerfile.proxy: builds a tinyproxy image from
debian:bookworm-slim (no third-party base images)
- scripts/tinyproxy/tinyproxy.conf: FilterDefaultDeny Yes + FilterURLs On;
any domain NOT in the allowlist is TCP-rejected at the CONNECT stage
- scripts/tinyproxy/filter: allowlist covers api.anthropic.com, github.com
family, npm registry, HuggingFace, PyPI, Cloudflare DNS
- agentception service routes all HTTP/HTTPS through http://proxy:8888 via
HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy environment variables +
git http.proxy system config
- NO_PROXY exempts internal services: postgres, qdrant, host.docker.internal
- Verified: httpx ProxyError for malicious-exfil-server.io; 200 for
api.github.com; api.github.com Zen quote returned through proxy
docs/guides/security.md fully updated with implementation details, threat model
updates, and documented residual risks.
Tests: 131/131 pass, mypy clean, zero Any, generate.py no drift.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: commit package-lock.json — every CI Docker build was failing
package-lock.json was listed in .gitignore, so it never existed in the
GitHub Actions checkout. The Dockerfile has:
COPY package.json package-lock.json tsconfig.json /app/
Without the lockfile in the build context, every CI run failed at this
step with:
ERROR: "/package-lock.json": not found
npm ci (correct for Docker builds) requires the lockfile to guarantee
reproducible, deterministic package installs. The fix is to remove
package-lock.json from .gitignore and track it. This is standard
practice for applications (as opposed to libraries, which should not
commit lockfiles).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: uncomment A/B metrics panel div in build.html
The div was accidentally wrapped in an HTML comment, causing
test_ab_panel_polling_div_in_build_html to fail: HTMLParser skips
comment content so parser.found stayed {}, failing the id assertion.
The route (/api/metrics/ab), router registration, and partial template
(_ab_metrics.html) were all correctly wired — only the HTML comment
needed removal.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* ci: fold smoke test into pytest job — eliminate redundant image rebuild
The smoke job ran on a fresh GitHub Actions runner and rebuilt the
Docker image (~2-3 min) just to run two curl calls. The image was
already built in the same pipeline run by the mypy and test jobs.
Fix: remove the separate smoke job and append its health probe steps
to the test job. The image is already built and postgres is already
running on that same runner, so the smoke check adds only ~30s
(container start + entrypoint + healthcheck poll) instead of 4+ min.
Pipeline shape is unchanged: generated-files → typecheck → {test, typing-ceiling}
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1A screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1B screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Ship screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Agent Org screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: MCP docs and tests — api ref, README, auth docstring, integrate.md, edge-case tests (#1041)
- API reference: add POST /api/mcp to URL taxonomy and new MCP section
with auth note and link to Security guide and MCP reference
- docs/README: fix directory structure — MCP HTTP route is routes/api/mcp.py,
not mcp/http_server.py; list mcp.py under routes/api/
- routes/api/mcp.py: docstring now states endpoint is protected by
ApiKeyMiddleware when AC_API_KEY is set; point to Security guide
- integrate.md: clarify what MCP provides (invoke tools, read resources,
fetch prompts) in opening paragraph
- test_mcp_http: add test_batch_one_invalid_item_one_valid_returns_mixed_results,
test_resources_read_invalid_uri_via_http, test_tools_call_missing_required_arguments_returns_error
- test_mcp_resources: add test_read_resource_batch_tree_malformed_empty_batch_id_returns_not_found
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: full Cursor decoupling — taxonomy and implementation (#1043)
- A: Remove TaskRunnerChoice.cursor, keep anthropic only; drop CI .cursor volume
- B: Rename cursor_project_id → ide_project_id; roles summary and RoleMeta
- C: _scan_cursor_docs → _scan_agentception_docs; reword config/readers/db/docker
- D: Docs generalized to MCP client (e.g. Cursor); legacy note on cursor-agent-spawning
- E: .cursor/ in .gitignore; remove stale Cursor projects dir from settings UI
See docs/reference/cursor-decoupling-taxonomy.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* Merge main into dev — resolve docker-compose.ci.yml conflict (#1045)
* chore: clean up all Maestro, Stori DAW, and Storpheus monorepo bleed (#57)
* Remove all machine-specific paths; fix .env for AgentCeption
- Replace every /Users/gabriel and $HOME/dev/tellurstori/* reference with
portable <repo-root> placeholder or $HOME-relative paths throughout all
role prompts, templates, docs, and scripts
- Strip Maestro/Stori legacy vars from .env; keep only AC_* vars needed by
AgentCeption (DB_PASSWORD, AC_GITHUB_TOKEN, AC_GH_REPO, AC_OPENROUTER_API_KEY)
- Set AC_GH_REPO=cgcardona/agentception (was incorrectly tellurstori/agentception)
- Update agent-tree-protocol.md examples to use <repo-root> placeholder with
explanatory note that AgentCeption fills in the real AC_REPO_DIR path
- Fix docstring example in build.py (tellurstori/maestro → cgcardona/agentception)
- Fix test fixture cursor_project_id to use generic example
* Pass GITHUB_TOKEN into container from AC_GITHUB_TOKEN
macOS stores gh CLI tokens in Keychain, not in the config file, so the
read-only ~/.config/gh volume mount doesn't carry auth into Docker.
Passing AC_GITHUB_TOKEN as GITHUB_TOKEN lets gh CLI authenticate without
a service restart or interactive login inside the container.
* Fix pipeline-config.json repo_dir: /repo → /app
The project entry had repo_dir=/repo but the container mounts the repo at
/app (WORKDIR). This caused settings.repo_dir to be overridden to /repo at
startup, making _CONFIG_PATH resolve to /repo/.agentception/pipeline-config.json
which doesn't exist — so the API fell back to empty defaults and the project
switcher showed no projects.
* Drop AC_ prefix from all environment variables
Now that AgentCeption is its own standalone repo there is no namespace
collision risk with a parent monorepo. Unprefixed names are cleaner for
operators and consistent with standard tooling conventions.
Renamed throughout config, compose files, .env, docs, role prompts, and
templates:
AC_DATABASE_URL → DATABASE_URL
AC_GH_REPO → GH_REPO
AC_REPO_DIR → REPO_DIR
AC_WORKTREES_DIR → WORKTREES_DIR
AC_HOST_WORKTREES_DIR → HOST_WORKTREES_DIR
AC_OPENROUTER_API_KEY → OPENROUTER_API_KEY
AC_GITHUB_TOKEN → GITHUB_TOKEN
AC_PORT → PORT
AC_HOST → HOST
AC_LOG_LEVEL → LOG_LEVEL
AC_POLL_INTERVAL_SECONDS → POLL_INTERVAL_SECONDS
AC_GITHUB_CACHE_SECONDS → GITHUB_CACHE_SECONDS
AC_CURSOR_PROJECTS_DIR → CURSOR_PROJECTS_DIR
Also fixes app.py default port 7777 → 10003 (was stale from monorepo era).
AC_URL in .agent-task files is retained — it is a task-file field, not a
system env var.
* Remove all Maestro, Stori DAW, and Storpheus references from codebase
AgentCeption is now fully standalone with zero bleed from the old monorepo domain.
What changed:
- All `maestro` repo/path/container references → `agentception` equivalents
- All `Stori DAW` product references → removed or replaced with generic `Muse client`
- All `Storpheus` service references → removed or replaced with `Muse` (the protocol)
- Boundary constraint comments (`zero imports from maestro, muse, kly, storpheus`) →
`zero imports from external packages`
- Monorepo dual-container dispatch scripts → single agentception container
- Role personas rewritten: ios-developer, mobile-developer, vp-mobile, vp-ml,
data-scientist, site-reliability-engineer, devops-engineer updated to be generic
(Stori DAW product details stripped; Muse protocol kept intact)
- `muse-specialist` role kept — Muse is our music VCS protocol (like Git for music)
- `scripts/gen_prompts/config.yaml`: removed dead `maestro` codebase entry
- `test_agentception_extraction.py`: import guard updated (still checks for legacy
`import maestro` statements — intentional protection)
- `tools/typing_audit.py`: default dirs now `agentception/ tests/`
- MIDI/Muse vocabulary preserved throughout (beats, Variation, Phrase, NoteChange)
* fix: resolve 21 mypy errors blocking CI (#58)
All errors were pre-existing type issues exposed by mypy's unused-ignore
detection and a stricter unreachable-code check:
- mcp/server.py: fix unreachable branch — type request_id as object first,
narrow to int|str|None with isinstance instead of annotating and guarding
- routes/ui/org_chart.py: remove unused type: ignore[import-untyped] (yaml
now has stubs) and unused type: ignore[assignment]
- routes/ui/plan_ui.py: remove unused type: ignore[attr-defined] on
_strip_fences and _YAML_SYSTEM_PROMPT imports (both now fully typed)
- routes/ui/agents.py: remove 5 unused type: ignore[arg-type] on db_run.get()
- tests/test_issue_creator.py: remove 4 unused type: ignore[assignment]
- tests/test_agentception_mcp_plan.py: remove unused ignores; fix real type
errors by adding isinstance narrowing before subscripting list[object] and
dict[object,object] values returned from dict[str,object] payloads
- tests/test_pipeline_panel.py: remove unused type: ignore[index]
- tests/test_agentception_analyze_partial.py: remove 3 unused type: ignore[arg-type]
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* chore: set typing ceiling to zero Any, rename from ratchet to ceiling (#59)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct dispatcher prompt path from .cursor/ to .agentception/ (#65)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct role-taxonomy.yaml path in models package and remove dead models.py (#66)
The package __init__.py computed the path as parent×2 from its own location
(agentception/models/__init__.py), resolving to agentception/ rather than the
repo root. Corrected to parent×3 so the path reaches scripts/gen_prompts/.
Also deleted the shadowed agentception/models.py — Python always resolves
agentception.models to the package directory, making the flat file permanently
dead code and causing a mypy "duplicate module" error.
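The off-by-one in the parent count can be checked with pathlib. The `/repo` prefix is a placeholder for wherever the repository is checked out, and the file name is taken from the commit title.

```python
from pathlib import PurePosixPath

# Placeholder checkout location; only the relative layout matters.
init_file = PurePosixPath("/repo/agentception/models/__init__.py")

two_up = init_file.parent.parent            # the old computation
three_up = init_file.parent.parent.parent   # the corrected computation

taxonomy = three_up / "scripts" / "gen_prompts" / "role-taxonomy.yaml"

print(two_up)    # the package directory, one level short of the repo root
print(taxonomy)  # the path under scripts/gen_prompts/
```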
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: implement MCP initialize/initialized handshake so Cursor can connect (#67)
Cursor sends an `initialize` request the moment it opens the stdio transport.
Our server returned Method-not-found, which caused Cursor to close the
connection immediately.
Changes:
- server.py: handle `initialize` — respond with protocolVersion, capabilities,
and serverInfo per MCP spec 2024-11-05.
- server.py: handle `initialized` — it is a JSON-RPC notification (no id),
so return None to signal the caller must not write anything to the wire.
- server.py: update `handle_request` return type to `dict[str, object] | None`.
- stdio_server.py: skip writing when `handle_request` returns None.
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add developer-workflow guide (closes #60) (#69)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add contributing guide (closes #61) (#70)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: add scripts/dev.sh convenience wrapper (closes #62) (#68)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add CHANGELOG.md with v0.1.0, v0.2.0, v0.3.0 seed entries (closes #63) (#71)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: update README Quick Reference with dev tools, guides, changelog (closes #64) (#72)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: dispatcher pending-launch guard and async MCP stdio (#73)
Prevents the poller from sweeping `pending_launch` runs to `unknown`
before the Dispatcher ever reads them, and wires up async tool execution
in the stdio MCP transport so build tools (and plan tools) are awaited
correctly instead of returning an error.
Changes:
- db/persist.py: exclude `pending_launch` from the orphan-sweep active
statuses; add detailed warning-level logging to persist_agent_run_dispatch;
protect pending_launch rows from being clobbered by the poller
- db/queries.py: surface host_worktree_path in get_pending_launches results
- db/models.py: minor model alignment
- mcp/server.py: add handle_request_async that awaits call_tool_async,
fixing all async tools in the stdio transport
- mcp/stdio_server.py: own the event loop with asyncio.run; call
init_db on startup; use handle_request_async; add structured logging
- mcp/build_tools.py: add warning-level debug logging to
build_get_pending_launches so Dispatcher runs are traceable
- routes/api/build.py: use settings.ac_url (removes getattr fallback);
add warning-level tracing around dispatch_label_agent DB write
- config.py: add ac_url setting (default http://localhost:10003) with
AC_URL env-var override
- .agentception/dispatcher.md: update dispatcher prompt
- .agentception/roles/devops-engineer.md: minor role update
- .gitignore, Dockerfile, docker-compose.yml, README.md: housekeeping
- tests/test_persist_pending_launch_guard.py: regression tests for the
pending_launch guard (queue not drained by poller sweep)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: narrow handle_request return type in MCP plan tests (#74)
handle_request now returns dict[str, object] | None (None for
notifications). The existing tests assumed a non-None dict, causing 24
mypy errors in CI. Add a local _unwrap() helper that asserts the
response is not None and narrows the type, then thread it through all
handle_request call sites in the test file.
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: eliminate all 47 Any patterns — zero typing audit violations (#75)
* fix: eliminate all 47 Any patterns — zero typing audit violations
Replaces every dict[str, Any], list[Any], and # type: ignore with
proper TypedDicts and isinstance narrowing across all 7 offending files.
Category 1 — structured return shapes (39 occurrences):
- db/queries.py: 20 TypedDicts (BoardIssueRow, AgentRunRow, AgentRunDetail,
PipelineTrendRow, IssueDetailRow, PRDetailRow, WaveRow, PendingLaunchRow,
AgentEventRow, AgentThoughtRow, and 10 more). AgentEventRow.payload is
now a raw JSON string; build_ui.py updated to json.loads it on use.
- routes/ui/org_chart.py: 7 TypedDicts (RoleEntry, AnnotatedRoleEntry,
PipelineConfig, OrgPreset, TierEntry, BuilderContext). _read_pipeline_config
parses keys explicitly instead of PipelineConfig(**raw).
Category 2 — # type: ignore on dict narrowing (4 occurrences):
- api_reference.py: 2×assignment — isinstance checks before subscript.
- agents.py: 1×assignment — removed; AgentRunDetail now typed correctly.
- docs.py: 1×return-value — widened return to Response instead of HTMLResponse.
Category 3 — import-untyped (1 occurrence):
- _shared.py: added markdown to pyproject.toml mypy.overrides instead of
# type: ignore[import-untyped].
Category 4 — list[Any]/AsyncIterator[Any] (2 occurrences):
- tests/test_issue_creator.py: TypeGuard narrowers (_is_start, _is_label,
_is_issue, _is_done, _is_error) replace bare Any in _collect helper and
enable typed event discrimination throughout the test suite.
Cascading caller fixes:
- agents.py: 4 new TypedDicts (AgentEnrichedRow, EnrichedAgentRunRow,
BatchRow, RoleGroupRow); all dict[str, object] annotations updated.
- build_ui.py: EnrichedIssueRow + EnrichedPhaseGroupRow; mutating
PhasedIssueRow replaced with explicit TypedDict construction.
- telemetry.py: list[dict[str, object]] → list[PipelineTrendRow].
Result: mypy strict — 0 errors (143 files); typing audit — 0 Any patterns.
* docs: update CI threshold and add DB query TypedDict reference
Update typing-ratchet --max-any from 10 to 0 in ci.md to reflect the
enforced zero-Any ceiling. Add a DB Query TypedDicts section to
type-contracts.md documenting all 20+ named row types introduced during
the Any-elimination refactor.
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* Local model integration (#986)
* feat: wire enrich_plan_with_codebase_context into plan_ui.py and issue_creator.py (#912)
- Add import of `enrich_plan_with_codebase_context` to both call sites
- plan_ui.py: call enricher after `PlanSpec.model_validate(parsed)`, before `spec.to_yaml()`
- issue_creator.py: call enricher at top of `file_issues()`, before `repo = _cfg.gh_repo`
- Both call sites use try/except with logger.warning on failure — enrichment is best-effort
- Add integration tests in `agentception/tests/test_plan_enricher_integration.py`:
- test_filed_issue_body_contains_codebase_locations: verifies enriched bodies flow through
- test_enrichment_failure_does_not_block_filing: verifies RuntimeError is swallowed
Closes #870
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: mechanically enforce STOP after pytest exits 0 (#913)
After any run_command call containing "pytest" returns exit_code=0, the
agent loop arms a hard-stop interrupt. On every subsequent iteration,
_PYTEST_STOP_BLOCKED_TOOLS (read_file, search_text, list_directory, etc.)
are intercepted and returned as synthetic errors — the same mechanism as
the loop guard — so the agent cannot enter a post-test audit loop.
The stop is armed per-iteration in extra_system_blocks with _PYTEST_STOP_OVERRIDE
and enforced mechanically in a second interception pass (Pass 2) after the
existing loop guard. It is disarmed automatically when:
- The agent writes new code (file-mutating tools) in a later iteration, because
the new code is untested and must be re-verified.
- A subsequent pytest invocation fails, so the agent can fix the regression.
Two regression tests added to TestPytestHardStop:
- test_pytest_stop_blocks_reads_after_clean_exit: verifies HARD STOP appears
in extra_system_blocks and read_file is never dispatched after pytest passes.
- test_pytest_stop_disarmed_by_file_write: verifies read_file reaches the
real dispatcher once a file write disarms the stop.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: tool surface audit — 13 bugs, dead code, schema mismatches (#914)
Critical
- build_complete_run inputSchema now declares grade and reviewer_feedback
(both optional strings); additionalProperties: False was silently stripping
them, making the reviewer grade pathway unreachable through any MCP client
- build_complete_run now returns isError=not bool(result.get("ok", True))
instead of hardcoded False; every other build_* tool already did this
High
- search_text and find_call_sites: remove --max-count from rg args; add
_truncate_rg_output() which counts only numbered match lines and stops at
n_results total across all files — the previous per-file limit could return
N×num_files lines despite the schema saying "max N total"
- definitions.py: update both n_results descriptions to "total … across all files"
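A total-count truncation of this kind might look like the sketch below. It assumes ripgrep output in heading mode with line numbers, where each file path sits on its own line and match lines start with `N:`; the real _truncate_rg_output may handle other layouts.

```python
import re

# A match line in the assumed format starts with a line number and a colon.
_MATCH_LINE = re.compile(r"^\d+:")


def truncate_rg_output(output: str, n_results: int) -> str:
    """Keep output up to and including the n_results-th match across all files."""
    if n_results <= 0:
        return ""
    kept: list[str] = []
    matches = 0
    for line in output.splitlines():
        kept.append(line)
        if _MATCH_LINE.match(line):
            matches += 1
            if matches >= n_results:
                break  # stop before the next file heading is copied
    return "\n".join(kept)
```

Counting only numbered lines means file headings and blank separators do not consume the budget, which is the difference from a per-file `--max-count`.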
Medium
- _PYTEST_STOP_OVERRIDE and the synthetic error message now name run_command
(git add/commit/push) as the commit path, with git_commit_and_push listed
as a secondary option, because developer agents don't have git_commit_and_push
- replace_in_file dispatch: allow_multiple coercion widened from
isinstance(allow_raw, bool) to isinstance(allow_raw, (bool, int)) so an
integer 1 from the model is not silently treated as False
Dead code (no-legacy rule)
- _READ_ONLY_TOOL_NAMES = frozenset() legacy alias deleted from agent_loop.py
- build_spawn_child_run (106-line orphaned function) deleted from
build_commands.py; its spawn_child/SpawnChildError/Tier/ScopeType imports
removed; module docstring updated; runs.py and docs/guides/mcp.md updated
to reference the live build_spawn_adhoc_child MCP tool
- plan_get_labels, plan_get_cognitive_figures dead imports removed from server.py
- "create_directory" ghost entry removed from _KEY_ARG log-hint dict
Style / non-idiomatic
- _CLASS_DEF_RE moved from inside insert_after_in_file body to module-level
constant (was re-compiled on every call)
- f-string without format args removed in shell_tools.py
- search_codebase collection description: "Omit (or leave null)" → "Omit"
(type is "string" with no null union; "leave null" could cause API rejection)
- read_file docstring: path resolution note updated to reflect dispatcher reality
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: address three deferred tool-surface audit items
_GUARD_PERMITTED_TOOL_NAMES: add run_command and git_commit_and_push to the
loop-guard allowlist. Without them a guarded agent writes code it can't verify
(no mypy/pytest) and can't deliver (no git push), trapping it in an infinite
guard loop. Regression guard: test_guard_allowlist_includes_shell_tools asserts
both tools are present so the allowlist can never silently regress.
find_call_sites regex: expand the pattern to cover from-import lines
(from x import symbol) and type-annotation contexts (symbol: / symbol[).
Previously only call sites (`symbol(`) and bare import lines were matched.
read_symbol heuristic: document that the indentation-based fallback path does
NOT work for brace-delimited languages (TypeScript, JavaScript). If a TS agent
role is added the heuristic must be replaced with a tree-sitter extractor.
* feat: suppress implementing/reviewing status badge in active-lane cards (#916)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Plan enrichment section to docs/reference/api.md (#872)
* fix: two reviewer merge bugs — self-approval failure and unmerged-PR teardown
Bug 1: reviewer prompt instructed pull_request_review_write(event="APPROVE")
before merge_pull_request. GitHub rejects self-reviews (author == reviewer
same account) with 422, wasting an iteration and confusing the model into
giving up on the merge. Fix: remove the APPROVE step entirely from the
grade-A flow. The merge stands alone; GitHub doesn't require an approved
review to merge unless branch protection rules enforce it.
Bug 2: when merge_pull_request failed (branch behind dev), the reviewer
called build_complete_run(grade="A") without merging. build_complete_run
responded by scheduling teardown_agent_worktree, which deleted the remote
branch, which caused GitHub to auto-close the unmerged PR.
Fix: build_complete_run now calls _is_pr_merged (GitHub REST API 204 check)
before scheduling teardown for reviewer grade-A/B completions. If the PR
is not yet merged the call returns an error, forcing the reviewer to call
merge_pull_request first.
Also strengthens the "branch behind dev" section in reviewer.md.j2 to
make the mandatory rebase+retry path explicit and unambiguous.
Regression tests:
- test_build_complete_run_blocks_grade_a_when_pr_not_merged
- test_build_complete_run_allows_grade_a_when_pr_merged
* refactor(scss): split _build.scss into per-feature partials (#873)
_build.scss was 2069 lines. Every frontend ticket appended to it, causing
rebase conflicts between parallel agents. Extracted five component-scoped
partials and one layout partial; replaced _build.scss with a @use barrel.
New files:
_inspector-layout.scss — all inspector/build layout rules (lines 1-1892)
_thought-block.scss — .thought-block (58 lines)
_file-edit-card.scss — .file-edit-card + .diff-add/remove/context (32 lines)
_assistant-bubble.scss — .assistant-bubble (12 lines)
_tool-call-card.scss — .tool-call-card (42 lines)
_event-card.scss — .event-card (27 lines)
Compiled app.css is byte-identical to pre-change artifact:
sha256: 17e18481c378787f472aee951c3a4ce218f2e9d5b06721b0570779156e50d875
.gitattributes: merge=union added for all six new partials and the barrel.
* feat: convert agentception/db/queries.py into a package (#874) (#921)
Move queries.py → queries/_monolith.py unchanged (no query logic modified).
Create queries/types.py with all 46 TypedDict definitions extracted from the
monolith. Wire queries/__init__.py with explicit re-exports (import X as X)
so that every existing caller continues to work without change.
mypy passes on all 278 files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: split query monolith into 5 domain submodules (#876) (#922)
Create agentception/db/queries/{board,runs,messages,events,metrics}.py.
Each file imports TypedDicts from agentception.db.queries.types (not
from the monolith), carries only functions (no TypedDict definitions),
and is importable in isolation.
_monolith.py and __init__.py are unchanged — callers are not affected.
The next issue will wire __init__.py re-exports to these files and
delete _monolith.py.
mypy passes on all 283 source files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: wire queries/__init__.py from domain submodules, delete _monolith.py (#875) (#923)
Replace the monolith re-export in __init__.py with explicit per-domain
imports (board, runs, messages, events, metrics, types). All 102 public
and test-required private symbols are re-exported via the `X as X` pattern
so every existing `from agentception.db.queries import X` call-site
continues to work without modification.
Delete _monolith.py — the 3250-line file is fully replaced by the six
focused submodules.
Add merge=union .gitattributes entries for all six new domain files.
mypy passes on 282 source files; zero new test failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: audit imports, fix broken tests, document new module structure (#877) (#924)
Audit: zero direct-path queries.py imports in the test suite (all three
grep checks pass clean).
Fix 4 previously broken tests by updating stale mock.patch targets to
point at the domain submodule where each dependency is locally bound:
- test_get_daily_metrics_returns_zeros_on_db_error:
agentception.db.queries.get_session
→ agentception.db.queries.metrics.get_session
- test_get_issues_grouped_by_phase_* (3 tests):
agentception.db.queries.get_session
→ agentception.db.queries.board.get_session
agentception.db.queries.get_initiative_phase_meta
→ agentception.db.queries.board.get_initiative_phase_meta
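Why the patch target has to move is a general property of `unittest.mock`: a patch replaces a name in one namespace, and a module that did `from x import get_session` holds its own binding. The two throwaway modules below are hypothetical stand-ins for the package's re-export layer and a domain submodule.

```python
import types
from unittest import mock

# Stand-in for the module where get_session is originally defined.
source = types.ModuleType("source")
source.get_session = lambda: "real-session"

# Stand-in for a domain submodule that did `from source import get_session`.
board = types.ModuleType("board")
board.get_session = source.get_session
# Define a function whose globals are the board namespace, as in a real module.
exec("def current_session():\n    return get_session()\n", board.__dict__)


def session_seen_by_board(patched_module: types.ModuleType) -> str:
    """Patch get_session on the given module and report what board actually calls."""
    with mock.patch.object(patched_module, "get_session", return_value="fake-session"):
        return board.current_session()


print(session_seen_by_board(source))  # patch has no effect on board's own binding
print(session_seen_by_board(board))   # patch lands where the name is looked up
```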
Docs: add "## Query module structure" and "## SCSS partial structure"
sections to docs/architecture.md describing the six query submodules
and six SCSS partials, their domain ownership, and the merge=union
.gitattributes strategy.
mypy clean on 282 files; 104 tests pass with zero failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: log per-turn token counts and track last_input_tokens in agent loop (#925)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: token-aware _prune_history with last_input_tokens budget guard (#926)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: add missing recorded_at to ACAgentEvent inserts in persist.py (#927)
Two ACAgentEvent creation sites were missing the required recorded_at
field — complete_agent_run (build_complete_run event) and the orphan
sweep (orphan_failed event). The NOT NULL constraint caused a DB
exception that rolled back the whole transaction, so the agent run
status was never updated from implementing to completed/done.
The result was that every successfully-completed agent run left its DB
row stuck on status=implementing despite the PR being merged. The
orphan sweep then had the same bug but was harder to notice because
it fires infrequently.
Fix: pass recorded_at=_now() (complete_agent_run) and recorded_at=now
(orphan sweep, which already has `now` in scope).
Also manually corrected issue-884's stuck row to status=done.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: replace dict[str, Any] with DailyMetrics TypedDict in test_metrics_api.py (#928)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: inject _CONTEXT_PRESSURE_WARNING into extra_blocks when input tokens exceed threshold (#929)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: fetch origin/dev before creating developer worktree (#931)
Developer dispatches set worktree_base = "origin/dev" without first
fetching, so the container's ref tracker could lag behind GitHub by one
or more merged PRs. The worktree was created from a stale tip, and the
agent's first push immediately diverged from origin/dev — causing an
unnecessary rebase conflict on every back-to-back dispatch.
Reviewers and continuations already fetched their branch before creating
the worktree (lines 718-724). Mirror that pattern for developers in the
else-branch: run git fetch origin dev before setting worktree_base.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add _summarise_history and context checkpoint injection to _prune_history (#930)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: update stale patch paths and add _is_pr_merged mock; add context window management docs (#932)
- Update patch targets in test_phase_grouping.py, test_get_initiatives.py,
and test_persist_regression.py from agentception.db.queries.get_session
to agentception.db.queries.board.get_session (post-#876 module split)
- Add _is_pr_merged mock to test_build_complete_run_reviewer_does_not_redispatch_reviewer
so the test does not make a live GitHub API call
- Add ## Context window management section to docs/architecture.md
Closes #888
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add prompt_variant to ACAgentRun, AgentTaskSpec, and Alembic migration 0011 (#933)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: orphan sweep grace period to avoid race with dispatch (#934)
Do not mark a run failed when last_activity_at is within 60s. The poller
builds live_ids from list_active_runs() at tick start; dispatch can commit
acknowledge_agent_run after that, so the run is in the DB as implementing
but not yet in live_ids. Without the grace period the orphan sweep would
immediately mark it failed.
- Add _ORPHAN_GRACE_SECONDS and skip orphan when last_activity_at recent
- Add test_orphan_grace_period_skips_recently_acknowledged_run
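The grace-period check described above could be sketched roughly as follows. This is only an illustration: the function name `is_orphan`, its signature, and the strict `<` comparison are assumptions; only `_ORPHAN_GRACE_SECONDS`, `last_activity_at`, `live_ids`, and the 60s window come from the commit message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 60s window taken from the commit message; the constant's real location is not shown.
_ORPHAN_GRACE_SECONDS = 60


def is_orphan(
    run_id: str,
    last_activity_at: Optional[datetime],
    live_ids: set[str],
    now: datetime,
) -> bool:
    """Hypothetical helper: decide whether the sweep should mark a run failed."""
    if run_id in live_ids:
        # The poller saw this run at tick start, so it is not orphaned.
        return False
    if last_activity_at is not None:
        age = now - last_activity_at
        if age < timedelta(seconds=_ORPHAN_GRACE_SECONDS):
            # Recently acknowledged: dispatch may have committed after
            # live_ids was built, so skip the run on this pass.
            return False
    return True
```

A run acknowledged ten seconds ago but missing from `live_ids` is skipped, while one idle for several minutes is still swept.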
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (#935)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: add Any type parameter to captured_kwargs in test_dispatch_variant
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: remove Any from test_dispatch_variant (typing ratchet) (#937)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of list[dict[str, Any]]. Satisfies zero-Any rule.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add activity_events module with persist helper and TypedDict payload shapes (#945)
* feat: add activity_events module with persist helper and TypedDict payload shapes
* fix: correct db_session fixture return type to Generator[Session, None, None]
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add planning prompts and tight-ticket standards guide (#946)
- What human prompts 1A receives (brain dump) and what 1A/1B produce
- PlanSpec issue body format (seven sections, order)
- Informal rules for brain dumps that yield agent-ready tickets:
one deliverable, cap read-heavy work, specific locations, clear done
criteria, minimal phases, format when needed, doc-only vs implement
- References to llm_phase_planner, issue_creator, plan-spec
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* tools: steer agent to search_text over run_command(grep) for code search (#947)
- search_text: add 'PREFER this over run_command(grep/ripgrep) for
searching the codebase' so the model uses the dedicated ripgrep tool
instead of shelling out to grep.
- run_command: add 'For searching the codebase for a string or regex,
use search_text instead of grep.' to avoid defaulting to grep.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: clear worktree_path in DB after reaper releases worktree (#948)
Reaper was re-finding the same terminal runs every pass because we never
cleared worktree_path after release_worktree(). Now we call
clear_run_worktree_path(run_id) only when release_worktree returns True,
so the next reaper pass (or startup) does not re-process the same runs.
release_worktree now returns bool for success/failure.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: agent_loop path/lines + search_text logging, watch_run display, block grep in run_command (#949)
- agent_loop: log path + line range for read_file_lines, pattern + directory for search_text
- watch_run: parse dispatch_tool args and show read path lines X–Y and search_text pattern/dir
- shell_tools: block run_command(grep), direct agent to search_text (ripgrep, .gitignore-aware)
- test_shell_tools: test_grep_blocked_use_search_text
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add watch-run-log-map.md mapping every log pattern to emission site (#950)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: preemptive nudge before loop guard; only dispatch reviewer after worktree release (#951)
- agent_loop: inject ONE TURN BEFORE READ-ONLY LOCK nudge when
iterations_since_write == threshold-1 so model calls write in that response
- build_commands: only call auto_dispatch_reviewer when release_worktree
returns True; return error to agent when release fails
- tests: test_preemptive_nudge_injected_one_turn_before_loop_guard,
test_implementer_completion_fails_when_release_worktree_returns_false
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: clear run events/messages on dispatch; dedupe read_file_lines in watch_run (#952)
- persist: delete ACAgentEvent and ACAgentMessage for run_id at start of
persist_agent_run_dispatch so re-dispatches show a clean timeline
- watch_run: do not render dispatch_tool line for read_file_lines; show
only the file_tools result line (path + lines + total) to avoid duplicates
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(activity-events): emit tool_invoked, delay, llm_* from agent_loop (#940) (#953)
- agent_loop: persist tool_invoked after dispatch_tool log; persist delay in
_enforce_turn_delay; thread session/run_id through loop, _dispatch_tool_calls,
_dispatch_single_tool (session optional for debug_loop).
- llm: add session, run_id, iteration to call_anthropic_with_tools; persist
llm_iter, llm_usage, llm_reply, llm_done after existing logger calls.
- activity_events: persist_activity_event accepts AsyncSession (caller flushes).
- tests: test_agent_loop_activity_events.py (tool_invoked, delay); update
TestEnforceTurnDelay and TestDispatchToolCalls for new session/run_id args.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(agent_loop): continue when stop_reason=tool_calls with empty list (#954)
Anthropic can return stop_reason='tool_calls' with tool_calls=[] (e.g.
truncation or API quirk). We previously fell through to 'unexpected' and
cancelled the run. Now we inject a nudge and continue the loop instead of
cancelling.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* chore: ship resync UX, poll docs, and dispatch script (#955)
- Mission Control: Reload button spins in place (no separate loading spinner),
triggers immediate board refresh via HX-Trigger after resync. Button is its
own HTMX indicator; aria-label/title 'Refresh from GitHub'.
- resync.py: Return HX-Trigger: refreshBoard on HTMX success so board refetches
immediately after resync.
- .env.example + docs/reference/poller.md: Document 5s poll default (safe for
GitHub rate limits); align example and poller doc with config default.
- tests: test_build_page_structure resync assertion; test_build_ui aria-label.
- scripts/dispatch_issue_941.py: One-off script to dispatch issue 941 via
POST /api/dispatch/issue (fetch issue from GitHub, then POST).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: emit activity events from file_tools (read, replace, insert, write) (#956)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(build_complete_run): release derived worktree path when DB has none (#957)
When get_agent_run_teardown returns None or worktree_path is null we skipped
release_worktree but still dispatched the reviewer, causing 'branch already
used by worktree' on reviewer dispatch. Now we derive the path as
worktrees_dir/run_id and attempt release before dispatching; only run rebase
when the path exists. Tests updated to expect release_worktree with derived path.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(942): emit activity events from shell_tools, git helper, and GitHub MCP (#958)
- run_command: emit shell_start (cmd_preview, cwd) and shell_done (exit_code, stdout_bytes, stderr_bytes)
- git_commit_and_push: emit git_push (branch) after successful push
- agent_loop: emit github_tool (tool_name, arg_preview) when dispatching GitHub MCP tools
- Optional run_id/session on run_command and git_commit_and_push; persist only when both set
- Wrap all persist calls in try/except so DB failures never propagate
- Add tests/tools/test_shell_github_activity_events.py (shell_start/done, github_tool, persist failure, git_push)
- Fix mypy: test_build_commands_rebase await_args can be None
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(943): extend inspector SSE to stream all activity events in chronological order (#959)
- Add get_all_events_tail(run_id, after_id) in db/queries/events.py; get_agent_events_tail delegates to it
- _inspector_sse: use get_all_events_tail, merge events and thoughts by recorded_at, emit in order
- Emit activity events as {"t": "activity", "subtype", "payload", "recorded_at", "id"}; other events keep {"t": "event", ...} with id
- Add tests/routes/test_inspector_sse.py: activity events in stream, cursor advances, ordered by id
- Update existing tests to patch get_all_events_tail
- Document GET /ship/runs/{run_id}/stream and activity subtypes in docs/reference/api.md
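One way to merge two already-sorted row sets by recorded_at, as the _inspector_sse bullet describes, is `heapq.merge`. The sketch below is an assumption about the approach, not the code in the commit; the `Row` shape and the function name `merge_by_recorded_at` are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, TypedDict


class Row(TypedDict):
    # Hypothetical minimal row shape; the real events carry more fields.
    id: int
    recorded_at: datetime


def merge_by_recorded_at(events: Iterable[Row], thoughts: Iterable[Row]) -> Iterator[Row]:
    """Lazily interleave two streams that are each sorted by recorded_at."""
    # heapq.merge is stable: on equal timestamps, rows from `events` come first.
    return heapq.merge(events, thoughts, key=lambda row: row["recorded_at"])
```

Both inputs must already be ordered by recorded_at; `heapq.merge` does not sort them.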
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(944): consume SSE activity messages and append live DOM rows to inspector feed (#960)
- Add activity_feed.ts: ActivityMessage type, formatActivitySummary (per-subtype text), appendActivityRow (div.activity-feed__row with data-subtype, icon placeholder, summary, time), attachActivityFeedHandler
- Wire attachActivityFeedHandler in build.ts _openStream
- Add _activity-feed.scss for .activity-feed__row, __icon, __summary, __ts
- Unit tests: formatActivitySummary, appendActivityRow, attachActivityFeedHandler
- All payload text via textContent/setAttribute; no innerHTML
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (fixes mypy) (#936)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: type annotations in test_dispatch_variant (mypy)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of bare list[dict]. No Any.
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(891): add developer-streamlined prompt variant for A/B testing (#961)
- Add scripts/gen_prompts/templates/roles/developer-streamlined.md.j2 (slimmed variant:
'item' not 'AC item', Step 3 = Ship and open PR combined; mypy + tests + Hard rules kept)
- Run generate.py → .agentception/roles/developer-streamlined.md
- Add tests/test_developer_streamlined_prompt.py (exists, contains mypy, excludes AC item/code smell)
- Default developer.md unchanged; variant used only when prompt_variant=streamlined
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(892): add GET /api/metrics/ab for per-variant A/B metrics (#962)
- New read-only endpoint: GET /api/metrics/ab?days=N (default 7, clamp 1–90)
- ABVariantMetrics: variant (COALESCE prompt_variant, 'control'), role, runs,
avg_iterations, avg_input_tokens, total_tokens, pass_rate, passed, failed
- Query joins agent_runs to agent_events (step_start count, done event grade)
- Returns 200 with empty variants: [] when no data or on DB error
- Register ab_metrics router in api __init__
- Tests: response shape (control + streamlined), empty DB, days param, 422 for days=0
- Docs: GET /api/metrics/ab in api.md with query params and response schema
- No change to dispatch, prompts, or existing routes
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(893): add HTMX A/B metrics panel to build page (#963)
- build.html: polling div #ab-metrics-panel with hx-get=/api/metrics/ab, every 30s
- partials/_ab_metrics.html: table.ab-metrics (Variant, Runs, Avg Iters, Pass Rate, Avg Tokens)
- ab_metrics.py: when HX-Request: true return HTML partial, else JSON (response_model=None)
- _ab_metrics.scss: table styles + .loading-placeholder; import in pages/_build.scss
- test_ab_metrics_panel.py: polling div in build.html, HTMX returns HTML, JSON unaffected
- api.md: A/B Metrics Panel note (HX-Request returns HTML)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs(964): add local LLM MLX guide for Qwen 3.5 35B on Apple Silicon (#968)
- docs/guides/local-llm-mlx.md: install (mlx-lm / mlx-vlm), model choice
(mlx-community/Qwen3.5-35B-A3B-4bit), run options (generate, chat, server),
powermetrics/Activity Monitor for CPU/GPU/ANE, checklist for #964
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(local-llm): full pipeline for local Qwen 3.5 35B, doc and flow
- Config: use_local_llm, local_llm_base_url, chat path, model, context caps
- LLM: call_local_with_tools (same contract as Anthropic), call_local_completion
- Agent loop: developer uses local LLM with full system prompt + cognitive arch;
only endpoint and context caps differ; comment and doc clarify same pipeline
- API: GET /api/local-llm/hello, POST /api/local-llm/hello-agent
- Docker: extra_hosts for host.docker.internal
- Watch script: local LLM indicator in ITER line, heartbeat shows local vs LLM
- Docs: local-llm-mlx.md — 48 GB runbook, mlx-openai-server, mlx-vlm >= 0.3.12,
two-step install, torch/torchvision for processor, same pipeline as Anthropic
* feat: plan-scoped integration branch (#974)
- Create plan branch on first dispatch (from origin/dev); reuse for later issues.
- Persist plan_id and plan_branch on runs; persist plan_issues when file_issues completes.
- Use plan branch as worktree base and PR base when plan_id is set; inject PR base into briefing.
- Rebase implementer branch onto plan branch (not dev) before dispatching reviewer when plan-scoped.
- When last issue in plan is merged into plan branch: rebase plan onto dev, open plan→dev PR, dispatch reviewer.
New tables: plan_issues, plan_branches. Migration 0012.
Docs: architecture/plan-scoped-integration-branch.md updated; link from architecture.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add LLM provider abstraction architecture doc (#975)
- Add docs/architecture/llm-provider-abstraction.md (planning status).
- Link from docs/architecture.md Further Reading.
- Remove unneeded scripts/curl_local_plan.sh (was untracked).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(llm): add provider-agnostic public API (step 1)
Add completion(), completion_stream(), completion_with_tools() as the
single contract. They delegate to call_anthropic* / call_local_with_tools
(use_local_llm for tools). No caller changes yet; next step will switch
plan_ui, llm_phase_planner, agent_loop to use these entry points.
* fix(agent_loop): init tc_to_dispatch/tool_results each iteration; test: mock get_session and use_local_llm
- Initialize tc_to_dispatch and tool_results at start of each loop iteration
so bookkeeping (pytest arm, etc.) never hits UnboundLocalError when
stop_reason is 'length' or 'tool_calls' with empty tool_calls.
- Tests: add _mock_get_session() and patch get_session in all run_agent_loop
tests so they run without init_db(); set use_local_llm=False on mock
settings; add session.flush = AsyncMock() so dispatch path is awaitable.
* feat(llm): wire callers to provider-agnostic API (phase two)
- llm_phase_planner: use completion() instead of call_anthropic
- plan_ui: use completion_stream() instead of call_anthropic_stream
- agent_loop: use completion() for recon and _summarise_history;
use completion_with_tools() for main loop (replaces call_local_with_tools
and call_anthropic_with_tools). Provider selection remains in llm layer.
- Tests: patch completion/completion_with_tools/completion_stream at
call sites; plan_ui tests patch completion_stream; agent_loop tests
patch completion and completion_with_tools. LLM retry tests still
exercise call_anthropic_with_tools in llm module.
- Docstring: llm_phase_planner describes completion() as entry point.
* fix(typing): resolve typing audit — TypedDict payloads, no type: ignore
- activity_events: convert FileReadPayload, FileReplacedPayload,
FileInsertedPayload, FileWrittenPayload from dict subclass to TypedDict
so dict literals type-check. Accept Mapping[str, object] in
persist_activity_event so TypedDict payloads are accepted.
- file_tools: restore named payload types and remove all four
type: ignore[assignment]. _emit_activity accepts Mapping[str, object].
Typing audit: 0 Any, 0 type_ignore; mypy clean.
* feat(llm): phase three — LLM_PROVIDER config and single adapter selection
- Add LLMProviderChoice (anthropic | local) and llm_provider config (LLM_PROVIDER env).
- effective_llm_provider: USE_LOCAL_LLM=true overrides to local for backward compat.
- completion(), completion_stream(), completion_with_tools() branch only on
settings.effective_llm_provider; local completion_stream yields one content chunk.
- agent_loop and local_llm route use effective_llm_provider instead of use_local_llm.
- Tests: config effective_llm_provider + LLM_PROVIDER parsing; llm provider selection.
- Doc: llm-provider-abstraction step 4 marked done.
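The override rule for effective_llm_provider might look something like this minimal sketch. The `Settings` dataclass and the "anthropic" default are assumptions made for illustration; the real config class is not shown in this log.

```python
from dataclasses import dataclass
from typing import Literal

LLMProviderChoice = Literal["anthropic", "local"]


@dataclass(frozen=True)
class Settings:
    # Hypothetical stand-in for the project's config object.
    llm_provider: LLMProviderChoice = "anthropic"  # assumed default
    use_local_llm: bool = False  # legacy USE_LOCAL_LLM flag

    @property
    def effective_llm_provider(self) -> LLMProviderChoice:
        # The legacy flag wins, for backward compatibility.
        if self.use_local_llm:
            return "local"
        return self.llm_provider
```

Callers then branch on a single property and never consult the legacy flag directly.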
* feat(llm): phase four — local adapter behind contract
- Content normalization: _normalize_openai_message_content() handles
message.content as string or list of parts; strips reasoning, concatenates
text for final answer. Used in completion and tool-use response parsing.
- Local adapter helpers: _local_base_url(), _local_chat_url(),
_local_completion_payload(); call_local_completion() accepts temperature.
- True streaming: _local_completion_stream() POSTs with stream=true, parses
SSE, maps delta.content / delta.reasoning_content to LLMChunk; on failure
or unsupported server falls back to one-shot and yields one content chunk.
- Public completion_stream() uses _local_completion_stream for local provider.
- /api/local-llm/hello uses public completion() instead of call_local_completion.
- Tests: normalize (string, list, reasoning stripped, empty); stream fallback.
- Doc: step 3 (Local adapter) marked done; status Phases 1–4 implemented.
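As an illustration of the content normalization described above, a simplified version might accept either a plain string or a list of parts and keep only the text parts. The part shape (`{"type": "text", "text": ...}`) and the function name are assumptions; the actual helper in the commit may differ.

```python
from typing import Mapping, Sequence, Union

Content = Union[str, Sequence[Mapping[str, object]], None]


def normalize_message_content(content: Content) -> str:
    """Hypothetical sketch: flatten message.content to a final-answer string."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    texts: list[str] = []
    for part in content:
        # Assumed convention: reasoning parts carry a type other than "text"
        # and are dropped; text parts are concatenated in order.
        if part.get("type") == "text":
            texts.append(str(part.get("text", "")))
    return "".join(texts)
```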
* docs(llm): phase five — LLM contract, deployment guide, cross-doc updates
- Add docs/reference/llm-contract.md: contract (completion, completion_stream,
completion_with_tools), types, provider selection (LLM_PROVIDER,
effective_llm_provider), adapters, step-by-step how to add a provider.
- Rewrite docs/guides/local-llm-mlx.md: Config and environment section with
full table (LLM_PROVIDER, USE_LOCAL_LLM, all LOCAL_LLM_*); How the local
adapter works (normalization, streaming, fallback); update all integration
steps to mention LLM_PROVIDER and effective provider; add LLM contract ref.
- docs/reference/type-contracts.md: document public API and provider-agnostic
contract; link to llm-contract.md; update diagrams/tree to completion_*.
- docs/guides/setup.md: add LLM_PROVIDER, USE_LOCAL_LLM to .env table; add
Local LLM subsection with pointer to local-llm-mlx and llm-contract.
- docs/guides/security.md: LLM section covers Anthropic and local provider;
link to llm-contract and local-llm-mlx.
- docs/architecture.md: llm.py described as provider-agnostic; completion_*
and config; context window note uses completion_with_tools().
- docs/architecture/llm-provider-abstraction.md: mark step 6 (Document and
test) done; status Phases 1–5 implemented; add pointers to new/updated docs.
- docs/README.md: add Local LLM with MLX guide; add LLM Contract reference;
system overview and directory structure use completion_* and provider config.
* docs: clarify OpenAI-compatible = wire format, not OpenAI cloud
- local-llm-mlx: Naming section (Chat Completions API, mlx-openai-server
package name vs local inference); Option C retitled; mlx_lm.server /
mlx-openai-server called out as local; 48 GB runbook link text.
- llm-contract: Local adapter row + sentence on wire format vs vendor.
- setup: LLM_PROVIDER row notes local server, not OpenAI cloud.
* fix(local-llm): cap max_tokens for mlx-openai-server (422)
mlx-openai-server rejects max_tokens > 4096 with HTTP 422. Plan 1A and
streaming used 8192/16k; clamp every local chat payload to
local_llm_completion_token_ceiling (default 4096).
- config: LOCAL_LLM_COMPLETION_TOKEN_CEILING
- llm: _local_cap_max_tokens, call_local_with_tools + _local_completion_payload
- tests: cap + payload + config default
- docs: 422 cause, llm-contract generation budget
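The clamp itself is small. A sketch under the assumption that it is a plain `min` against the configured ceiling (the name `cap_max_tokens` and the keyword default are illustrative, not the signature of _local_cap_max_tokens):

```python
def cap_max_tokens(requested: int, ceiling: int = 4096) -> int:
    """Clamp a requested generation budget to the local server's ceiling."""
    # 4096 mirrors the default described in this commit; callers would pass
    # the configured local_llm_completion_token_ceiling instead.
    return min(requested, ceiling)
```

With the default ceiling, a request for 8192 tokens is sent as 4096 and a request for 1024 is left unchanged.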
* feat(llm): normalize think-tags, fix yaml-parse fallback, decisiveness prompt
- Add _normalize_think_tags() generator in services/llm.py to reclassify
<think>...</think>-wrapped content as type="thinking" chunks; ensures
plan_ui always receives clean thinking/content separation regardless of
whether the backend uses reasoning_content fields or inline tags (Qwen3)
- Add repetition_penalty=1.1 and frequency_penalty=0.3 to local completion
payload to discourage degenerate generation loops
- Remove client-side repetition detector from plan_ui (_REPEAT_WINDOW /
_REPEAT_MIN_LEN) — model-level penalties are the correct layer; the
detector caused false positives on structured YAML with repeated section
headers and prematurely broke valid streams
- Wrap YAML safe_load + PlanSpec.model_validate in plan_ui in its own
try/except so malformed LLM output always falls back to the clarify plan
rather than emitting a stream-level error event
- Rewrite _IDENTITY and _YAML_SYSTEM_PROMPT preamble to be more decisive
and direct, reducing agent "overthinking" and re-planning loops
- Reduce streaming read timeout from 300s to 90s and add specific
httpx.ReadTimeout logging to detect mlx-openai-server stalls faster
- Add 5 unit tests for _normalize_think_tags covering basic split,
no-tags passthrough, already-classified chunks, cross-chunk tags,
and multiline Qwen-style output
- Update test suite: rename/update integration tests to reflect removal
of repetition detection and addition of think-tag normalization
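To illustrate the think-tag reclassification, here is a deliberately simplified sketch that works on one complete string rather than on a chunk stream, so it does not cover the cross-chunk case the commit handles. The function name and the (type, text) tuple output are assumptions.

```python
import re

# Non-greedy match across newlines for inline <think>...</think> blocks.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ("thinking", ...) and ("content", ...) segments."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in _THINK_RE.finditer(text):
        before = text[pos:match.start()]
        if before:
            segments.append(("content", before))
        if match.group(1):
            segments.append(("thinking", match.group(1)))
        pos = match.end()
    tail = text[pos:]
    if tail:
        segments.append(("content", tail))
    return segments
```

Text without tags passes through as a single content segment.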
* feat(local-llm): Ollama as primary backend, per-usecase routing, LiteLLM proxy guide
- Remove repetition_penalty from _local_completion_payload — not an
OpenAI-standard parameter; causes silent incompatibility with Ollama's
OpenAI-compat endpoint. frequency_penalty is standard and stays.
- Raise LOCAL_LLM_COMPLETION_TOKEN_CEILING default 4096 → 8192; the old
value was an mlx-openai-server workaround (422 above 4096); Ollama
supports full context lengths. Update test assertion accordingly.
- Add per-usecase model/URL overrides in config.py:
LOCAL_LLM_BASE_URL_PLAN / LOCAL_LLM_MODEL_PLAN for completion_stream()
LOCAL_LLM_BASE_URL_AGENT / LOCAL_LLM_MODEL_AGENT for completion_with_tools()
Properties effective_local_base_url_plan/agent and
effective_local_model_plan/agent fall back to the global values when unset.
Enables Phase 3 two-model routing through LiteLLM Proxy.
- Wire _local_completion_stream to use effective_local_base_url_plan and
effective_local_model_plan; wire call_local_with_tools to use
effective_local_base_url_agent and effective_local_model_agent.
- Update .env: raise LOCAL_LLM_COMPLETION_TOKEN_CEILING to 8192.
- Rewrite docs/guides/local-llm-mlx.md: make Ollama the primary
recommendation with full install/pull/serve runbook; demote
mlx-openai-server to a developer footnote. Add per-usecase config
table and link to the new scaling guide.
- Add docs/guides/local-llm-scaling.md: four-phase local LLM scaling
architecture (single Ollama → LiteLLM Proxy → two models → multi-machine),
litellm-config.yaml, docker-compose snippet, monitoring and
troubleshooting sections.
- Add two new tests for per-usecase config: fallback and override paths.
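The per-usecase fallback could be expressed along these lines. The dataclass, its default values, and the choice to treat an empty string as unset are assumptions; only the property names and the fall-back-to-global behaviour come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalLLMSettings:
    # Hypothetical stand-in for the relevant slice of config.py.
    local_llm_base_url: str = "http://localhost:11434"  # assumed default
    local_llm_model: str = "example-model"  # placeholder, not a real default
    local_llm_base_url_plan: Optional[str] = None
    local_llm_model_plan: Optional[str] = None
    local_llm_base_url_agent: Optional[str] = None
    local_llm_model_agent: Optional[str] = None

    @property
    def effective_local_base_url_plan(self) -> str:
        # Unset (None or empty) falls back to the global value.
        return self.local_llm_base_url_plan or self.local_llm_base_url

    @property
    def effective_local_model_plan(self) -> str:
        return self.local_llm_model_plan or self.local_llm_model

    @property
    def effective_local_base_url_agent(self) -> str:
        return self.local_llm_base_url_agent or self.local_llm_base_url

    @property
    def effective_local_model_agent(self) -> str:
        return self.local_llm_model_agent or self.local_llm_model
```

Setting only the plan overrides routes planning to a second model while agent turns keep using the global endpoint.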
* fix(ollama): disable think mode for non-streaming calls; forward all LOCAL_LLM_* env vars
Qwen 3.5 with Ollama's thinking mode enabled spends all available tokens
on chain-of-thought reasoning before writing content. With a small token
budget (e.g. max_tokens=128 in /hello), the model exhausts its budget
during thinking and returns empty content.
Fix 1 — add think parameter to _local_completion_payload:
- think=False (default): sends "think": false to Ollama so the model
outputs the answer directly into content, skipping CoT. Used by
call_local_completion (hello, one-shot) and call_local_with_tools
(agent turns) where latency matters more than reasoning depth.
- think=True: used only by _local_completion_stream (Phase 1A planning)
where CoT quality is the priority and tokens are plentiful.
Ignored by backends that do not recognise the field (vLLM, mlx, etc).
Fix 2 — forward all LOCAL_LLM_* env vars in docker-compose.yml:
Previously only USE_LOCAL_LLM and LOCAL_LLM_BASE_URL were forwarded,
so LOCAL_LLM_MODEL, LOCAL_LLM_COMPLETION_TOKEN_CEILING, the per-usecase
overrides, and all other local LLM config were invisible inside the
container. docker compose restart therefore never picked up .env changes
for those vars. Now all LOCAL_LLM_* fields are explicitly forwarded with
sensible defaults matching config.py.
* fix(ollama): raise token budget for non-streaming calls; warn on empty content
Qwen 3.5 with Ollama sends thinking in the reasoning field and the actual
answer in content. When max_tokens is too small (e.g. 128 in /hello),
the model exhausts its budget during chain-of-thought and content is
empty. think: false is sent in the payload for future Ollama support but
is currently ignored.
- Remove max_tokens=128 override in /hello endpoint; completion() already
defaults to max_tokens=4096, which gives the model room to think and
then answer
- Add warning log in _normalize_openai_message_content when content is
empty but reasoning is non-empty, pointing at the ceiling config var
- Update think param docstring to reflect current Ollama behaviour
* feat: hide A/B metrics panel via HTML comment (#984)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: comprehensive audit sweep — remove transient files, fix stale content (#1009)
Deletes:
- HACKATHON.md (session artifact, broken links, stale Python 3.11 ref)
- .agentception/verification-761.md (test run report, not a repo artifact)
- scripts/dispatch_issue_941.py (one-off script, docstring says so)
- agentception/EXTRACT.md (monorepo extraction is complete; procedure irreversible)
- docs/migration.md (migration to standalone DB is done; all checklist items past tense)
- agentception/README.md (described Cursor-only zero-LLM-calls arch that no longer exists)
Updates:
- .env.example: replace USE_LOCAL_LLM/mlx_lm.server/:8080 with LLM_PROVIDER/Ollama/:11434
- docs/guides/setup.md: Python 3.11→3.12, container agentception-app→agentception,
MLX→Ollama in local LLM blurb, drop USE_LOCAL_LLM legacy row from env var table
- docs/guides/security.md: port 8080→11434 in local provider example
- docs/reference/llm-contract.md: token ceiling default 4096→8192 (Ollama); note
that 4096 limit is specific to mlx-openai-server
- CHANGELOG.md: correct "AC_ prefix applied to all config keys" — only 4 vars use it
- docs/README.md: "all AC_* env vars" → "environment-variable-driven config"
- docs/cursor-agent-spawning.md: add accuracy note (primary dispatch path is now
agent_loop.py, not Cursor Task tool); fix broken links to missing stress-test file
- docs/guides/contributing.md: add callout distinguishing external-contributor
draft-PR workflow from internal merge-immediately policy (AGENTS.md)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* release: dev → main (#1033)
* docs: overhaul MCP reference with Mermaid diagrams and typed tool schemas (#1030)
Rewrite docs/reference/mcp.md as the single authoritative MCP reference:
- Add 4 Mermaid diagrams (architecture, tool dispatch, resource resolution, agent role surfaces)
- Add fully typed input/output schemas with example JSON-RPC for all 12 tools
- Generalize language from Cursor-specific to client-agnostic
Delete the mcp-tools.md stub (17 lines pointing to mcp.md).
Clean up docs/guides/mcp.md:
- Remove duplicate stdio config block (lines 77-97 were copy of 55-75)
- Generalize remaining Cursor-specific language (keep Cursor as example client)
- Fix stale tool name reference (build_spawn_child -> build_spawn_adhoc_child)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: harden agent runtime — path sandbox, secret redaction, denylist, prompt injection guardrails, wall-clock timeout
Five concrete hardening measures from the MCP/agent runtime security audit:
1. File-tool path sandbox (_is_safe_read_path / _is_safe_write_path in
agent_loop.py): reads restricted to worktree + repo root; writes
restricted to worktree only. Symlinks resolved before check. All 10
file/directory tools enforced.
2. Shell command secret redaction (_redact_secrets in shell_tools.py):
all stdout/stderr from run_command is stripped of ANTHROPIC_API_KEY,
GITHUB_TOKEN, DATABASE_URL, AC_API_KEY, ghp_ PATs, sk-ant- keys, and
Bearer tokens before the output reaches the agent or the DB.
3. Expanded shell denylist: adds rm -rf /app, rm -rf /worktrees, nc -e,
/dev/tcp/, and /dev/udp/ to the existing _BLOCKED_PATTERNS.
4. Prompt injection security contract (_RUNTIME_ENV_NOTE in agent_loop.py):
every agent system prompt now includes an explicit instruction to treat
all repository content as untrusted external data, never to exfiltrate
credentials, and never to make unauthorized outbound HTTP requests.
5. Wall-clock timeout (asyncio.timeout in agent_loop.py): wraps the entire
agent loop with AGENT_MAX_WALL_SECONDS (default 7200 s / 2 h); timeout
cancels the run gracefully and transitions it to 'cancelled'.
Docker: adds no-new-privileges, cap_drop ALL, and minimal cap_add to
docker-compose.yml.
Docs: security.md rewritten to document all five hardening layers plus
the threat model update.
Tests: 131 pass (mypy clean, zero Any).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: non-root container user, narrow bind mounts, egress allowlist proxy (phase 2)
Three additional hardening measures completing the phase-2 security audit:
1. Non-root process user (Dockerfile + scripts/entrypoint.sh)
- Creates agentception user UID/GID 1001 in Dockerfile
- Installs gosu (purpose-built privilege-drop helper) alongside git/curl/ripgrep
- scripts/entrypoint.sh performs a two-phase startup:
* Root phase: write /etc/resolv.conf, compile SCSS/JS, run Alembic
migrations, chown /worktrees and model cache to agentception:agentception
* Unprivileged phase: exec gosu agentception "$@" → uvicorn runs as PID 1
with UID 1001 and no capability to acquire additional privileges
- git system config adds user.email/user.name and http.proxy for worktree ops
- Verified: /proc/1/status shows Uid=1001 in the running container
2. Narrow bind mounts (docker-compose.yml)
- Replaces ./:/app with explicit per-directory mounts: agentception/,
.agentception/, pyproject.toml, org-presets.yaml, scripts/, tests/, tools/
- .env, docker-compose*.yml, .git/, Dockerfile, and all other sensitive or
unnecessary files are intentionally excluded
- Verified: /app/.env is absent inside the running container
- docker-compose.ci.yml updated with matching narrow mounts + no-op proxy
service for CI compatibility
3. Egress allowlist proxy (scripts/tinyproxy/)
- scripts/tinyproxy/Dockerfile.proxy: builds a tinyproxy image from
debian:bookworm-slim (no third-party base images)
- scripts/tinyproxy/tinyproxy.conf: FilterDefaultDeny Yes + FilterURLs On;
any domain NOT in the allowlist is TCP-rejected at the CONNECT stage
- scripts/tinyproxy/filter: allowlist covers api.anthropic.com, github.com
family, npm registry, HuggingFace, PyPI, Cloudflare DNS
- agentception service routes all HTTP/HTTPS through http://proxy:8888 via
HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy environment variables +
git http.proxy system config
- NO_PROXY exempts internal services: postgres, qdrant, host.docker.internal
- Verified: httpx ProxyError for malicious-exfil-server.io; 200 for
api.github.com; api.github.com Zen quote returned through proxy
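To illustrate the default-deny idea behind the filter file, here is a rough Python sketch of allowlist matching. It is not tinyproxy's implementation, and the two patterns are illustrative assumptions rather than the actual contents of scripts/tinyproxy/filter.

```python
import re

# Illustrative patterns only; the real filter file may differ.
ALLOWLIST = [
    r"^api\.anthropic\.com$",   # exact host
    r"(^|\.)github\.com$",      # github.com and its subdomains
]


def is_allowed(host: str) -> bool:
    """Default deny: a host passes only if some allowlist pattern matches it."""
    host = host.lower()
    return any(re.search(pattern, host) for pattern in ALLOWLIST)
```

Anchoring each pattern at the end of the host name is what keeps look-alike hosts such as github.com.evil.io from slipping through.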
docs/guides/security.md fully updated with implementation details, threat model
updates, and documented residual risks.
Tests: 131/131 pass, mypy clean, zero Any, generate.py no drift.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: commit package-lock.json — every CI Docker build was failing
package-lock.json was listed in .gitignore, so it never existed in the
GitHub Actions checkout. The Dockerfile has:
COPY package.json package-lock.json tsconfig.json /app/
Without the lockfile in the build context, every CI run failed at this
step with:
ERROR: "/package-lock.json": not found
npm ci (correct for Docker builds) requires the lockfile to guarantee
reproducible, deterministic package installs. The fix is to remove
package-lock.json from .gitignore and track it. This is standard
practice for applications (as opposed to libraries, where a committed
lockfile is ignored by consumers, who resolve their own dependency trees).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: uncomment A/B metrics panel div in build.html
The div was accidentally wrapped in an HTML comment, causing
test_ab_panel_polling_div_in_build_html to fail: HTMLParser skips
comment content so parser.found stayed {}, failing the id assertion.
The route (/api/metrics/ab), router registration, and partial template
(_ab_metrics.html) were all correctly wired — only the HTML comment
needed removal.
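The failure mode can be reproduced with a small sketch. IdCollector and the element id ab-metrics-panel are hypothetical; the actual test and the id in build.html may differ. The point is only that html.parser.HTMLParser routes comment bodies to handle_comment and never calls handle_starttag for markup inside them.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id attribute seen on a start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.found[value] = tag


def collect_ids(html: str) -> dict[str, str]:
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.found


COMMENTED = '<!-- <div id="ab-metrics-panel"></div> -->'  # hypothetical id
LIVE = '<div id="ab-metrics-panel"></div>'
```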
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* ci: fold smoke test into pytest job — eliminate redundant image rebuild
The smoke job ran on a fresh GitHub Actions runner and rebuilt the
Docker image (~2-3 min) just to run two curl calls. The image was
already built in the same pipeline run by the mypy and test jobs.
Fix: remove the separate smoke job and append its health probe steps
to the test job. The image is already built and postgres is already
running on that same runner, so the smoke check adds only ~30s
(container start + entrypoint + healthcheck poll) instead of 4+ min.
Pipeline shape is unchanged: generated-files → typecheck → {test, typing-ceiling}
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1A screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Phase 1B screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Ship screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Agent Org screenshot to README How It Works section
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: MCP docs and tests — api ref, README, auth docstring, integrate.md, edge-case tests (#1041)
- API reference: add POST /api/mcp to URL taxonomy and new MCP section
with auth note and link to Security guide and MCP reference
- docs/README: fix directory structure — MCP HTTP route is routes/api/mcp.py,
not mcp/http_server.py; list mcp.py under routes/api/
- routes/api/mcp.py: docstring now states endpoint is protected by
ApiKeyMiddleware when AC_API_KEY is set; point to Security guide
- integrate.md: clarify what MCP provides (invoke tools, read resources,
fetch prompts) in opening paragraph
- test_mcp_http: add test_batch_one_invalid_item_one_valid_returns_mixed_results,
test_resources_read_invalid_uri_via_http, test_tools_call_missing_required_arguments_returns_error
- test_mcp_resources: add test_read_resource_batch_tree_malformed_empty_batch_id_returns_not_found
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: full Cursor decoupling — taxonomy and implementation (#1043)
- A: Remove TaskRunnerChoice.cursor, keep anthropic only; drop CI .cursor volume
- B: Rename cursor_project_id → ide_project_id; roles summary and RoleMeta
- C: _scan_cursor_docs → _scan_agentception_docs; reword config/readers/db/docker
- D: Docs generalized to MCP client (e.g. Cursor); legacy note on cursor-agent-spawning
- E: .cursor/ in .gitignore; remove stale Cursor projects dir from settings UI
See docs/reference/cursor-decoupling-taxonomy.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* Merge main into dev — resolve docker-compose.ci.yml conflict (#1045)
* chore: clean up all Maestro, Stori DAW, and Storpheus monorepo bleed (#57)
* Remove all machine-specific paths; fix .env for AgentCeption
- Replace every /Users/gabriel and $HOME/dev/tellurstori/* reference with
portable <repo-root> placeholder or $HOME-relative paths throughout all
role prompts, templates, docs, and scripts
- Strip Maestro/Stori legacy vars from .env; keep only AC_* vars needed by
AgentCeption (DB_PASSWORD, AC_GITHUB_TOKEN, AC_GH_REPO, AC_OPENROUTER_API_KEY)
- Set AC_GH_REPO=cgcardona/agentception (was incorrectly tellurstori/agentception)
- Update agent-tree-protocol.md examples to use <repo-root> placeholder with
explanatory note that AgentCeption fills in the real AC_REPO_DIR path
- Fix docstring example in build.py (tellurstori/maestro → cgcardona/agentception)
- Fix test fixture cursor_project_id to use generic example
* Pass GITHUB_TOKEN into container from AC_GITHUB_TOKEN
macOS stores gh CLI tokens in Keychain, not in the config file, so the
read-only ~/.config/gh volume mount doesn't carry auth into Docker.
Passing AC_GITHUB_TOKEN as GITHUB_TOKEN lets gh CLI authenticate without
a service restart or interactive login inside the container.
* Fix pipeline-config.json repo_dir: /repo → /app
The project entry had repo_dir=/repo but the container mounts the repo at
/app (WORKDIR). This caused settings.repo_dir to be overridden to /repo at
startup, making _CONFIG_PATH resolve to /repo/.agentception/pipeline-config.json
which doesn't exist — so the API fell back to empty defaults and the project
switcher showed no projects.
* Drop AC_ prefix from all environment variables
Now that AgentCeption is its own standalone repo there is no namespace
collision risk with a parent monorepo. Unprefixed names are cleaner for
operators and consistent with standard tooling conventions.
Renamed throughout config, compose files, .env, docs, role prompts, and
templates:
AC_DATABASE_URL → DATABASE_URL
AC_GH_REPO → GH_REPO
AC_REPO_DIR → REPO_DIR
AC_WORKTREES_DIR → WORKTREES_DIR
AC_HOST_WORKTREES_DIR → HOST_WORKTREES_DIR
AC_OPENROUTER_API_KEY → OPENROUTER_API_KEY
AC_GITHUB_TOKEN → GITHUB_TOKEN
AC_PORT → PORT
AC_HOST → HOST
AC_LOG_LEVEL → LOG_LEVEL
AC_POLL_INTERVAL_SECONDS → POLL_INTERVAL_SECONDS
AC_GITHUB_CACHE_SECONDS → GITHUB_CACHE_SECONDS
AC_CURSOR_PROJECTS_DIR → CURSOR_PROJECTS_DIR
Also fixes app.py default port 7777 → 10003 (was stale from monorepo era).
AC_URL in .agent-task files is retained — it is a task-file field, not a
system env var.
* Remove all Maestro, Stori DAW, and Storpheus references from codebase
AgentCeption is now fully standalone with zero bleed from the old monorepo domain.
What changed:
- All `maestro` repo/path/container references → `agentception` equivalents
- All `Stori DAW` product references → removed or replaced with generic `Muse client`
- All `Storpheus` service references → removed or replaced with `Muse` (the protocol)
- Boundary constraint comments (`zero imports from maestro, muse, kly, storpheus`) →
`zero imports from external packages`
- Monorepo dual-container dispatch scripts → single agentception container
- Role personas rewritten: ios-developer, mobile-developer, vp-mobile, vp-ml,
data-scientist, site-reliability-engineer, devops-engineer updated to be generic
(Stori DAW product details stripped; Muse protocol kept intact)
- `muse-specialist` role kept — Muse is our music VCS protocol (like Git for music)
- `scripts/gen_prompts/config.yaml`: removed dead `maestro` codebase entry
- `test_agentception_extraction.py`: import guard updated (still checks for legacy
`import maestro` statements — intentional protection)
- `tools/typing_audit.py`: default dirs now `agentception/ tests/`
- MIDI/Muse vocabulary preserved throughout (beats, Variation, Phrase, NoteChange)
* fix: resolve 21 mypy errors blocking CI (#58)
All errors were pre-existing type issues exposed by mypy's unused-ignore
detection and a stricter unreachable-code check:
- mcp/server.py: fix unreachable branch — type request_id as object first,
narrow to int|str|None with isinstance instead of annotating and guarding
- routes/ui/org_chart.py: remove unused type: ignore[import-untyped] (yaml
now has stubs) and unused type: ignore[assignment]
- routes/ui/plan_ui.py: remove unused type: ignore[attr-defined] on
_strip_fences and _YAML_SYSTEM_PROMPT imports (both now fully typed)
- routes/ui/agents.py: remove 5 unused type: ignore[arg-type] on db_run.get()
- tests/test_issue_creator.py: remove 4 unused type: ignore[assignment]
- tests/test_agentception_mcp_plan.py: remove unused ignores; fix real type
errors by adding isinstance narrowing before subscripting list[object] and
dict[object,object] values returned from dict[str,object] payloads
- tests/test_pipeline_panel.py: remove unused type: ignore[index]
- tests/test_agentception_analyze_partial.py: remove 3 unused type: ignore[arg-type]
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* chore: set typing ceiling to zero Any, rename from ratchet to ceiling (#59)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct dispatcher prompt path from .cursor/ to .agentception/ (#65)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: correct role-taxonomy.yaml path in models package and remove dead models.py (#66)
The package __init__.py computed the path as parent×2 from its own location
(agentception/models/__init__.py), resolving to agentception/ rather than the
repo root. Corrected to parent×3 so the path reaches scripts/gen_prompts/.
Also deleted the shadowed agentception/models.py — Python always resolves
agentception.models to the package directory, making the flat file permanently
dead code and causing a mypy "duplicate module" error.
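The off-by-one in the parent count is easy to check with pathlib. The /repo prefix below is a placeholder for the repository root; the relative paths are the ones named in this commit.

```python
from pathlib import PurePosixPath

# /repo is a placeholder for wherever the repository is checked out.
init_file = PurePosixPath("/repo/agentception/models/__init__.py")

two_up = init_file.parent.parent            # stops at the package directory
three_up = init_file.parent.parent.parent   # reaches the repository root

taxonomy = three_up / "scripts" / "gen_prompts" / "role-taxonomy.yaml"
```

The first .parent only strips the file name, so a module inside a package directory needs one more step than a flat module at the same depth.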
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: implement MCP initialize/initialized handshake so Cursor can connect (#67)
Cursor sends an `initialize` request the moment it opens the stdio transport.
Our server returned Method-not-found, which caused Cursor to close the
connection immediately.
Changes:
- server.py: handle `initialize` — respond with protocolVersion, capabilities,
and serverInfo per MCP spec 2024-11-05.
- server.py: handle `initialized` — it is a JSON-RPC notification (no id),
so return None to signal the caller must not write anything to the wire.
- server.py: update `handle_request` return type to `dict[str, object] | None`.
- stdio_server.py: skip writing when `handle_request` returns None.
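A simplified sketch of the handshake behaviour described above. The function names mirror the commit message, but the bodies, the empty capabilities object, and the serverInfo values are placeholders rather than the real server.py code.

```python
from __future__ import annotations

import json


def handle_request(request: dict[str, object]) -> dict[str, object] | None:
    """Return a JSON-RPC response, or None when nothing must be written."""
    method = request.get("method")
    request_id = request.get("id")
    if method == "initialize":
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},  # placeholder; a real server advertises its features
                "serverInfo": {"name": "agentception", "version": "0.0.0"},  # placeholder
            },
        }
    if method == "initialized":
        # Notification (no id): the caller must stay silent on the wire.
        return None
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": "Method not found"},
    }


def serve_line(line: str) -> str | None:
    """What a stdio transport might do per input line: skip the write on None."""
    response = handle_request(json.loads(line))
    return None if response is None else json.dumps(response)
```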
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add developer-workflow guide (closes #60) (#69)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add contributing guide (closes #61) (#70)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: add scripts/dev.sh convenience wrapper (closes #62) (#68)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: add CHANGELOG.md with v0.1.0, v0.2.0, v0.3.0 seed entries (closes #63) (#71)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* docs: update README Quick Reference with dev tools, guides, changelog (closes #64) (#72)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* feat: dispatcher pending-launch guard and async MCP stdio (#73)
Prevents the poller from sweeping `pending_launch` runs to `unknown`
before the Dispatcher ever reads them, and wires up async tool execution
in the stdio MCP transport so build tools (and plan tools) are awaited
correctly instead of returning an error.
Changes:
- db/persist.py: exclude `pending_launch` from the orphan-sweep active
statuses; add detailed warning-level logging to persist_agent_run_dispatch;
protect pending_launch rows from being clobbered by the poller
- db/queries.py: surface host_worktree_path in get_pending_launches results
- db/models.py: minor model alignment
- mcp/server.py: add handle_request_async that awaits call_tool_async,
fixing all async tools in the stdio transport
- mcp/stdio_server.py: own the event loop with asyncio.run; call
init_db on startup; use handle_request_async; add structured logging
- mcp/build_tools.py: add warning-level debug logging to
build_get_pending_launches so Dispatcher runs are traceable
- routes/api/build.py: use settings.ac_url (removes getattr fallback);
add warning-level tracing around dispatch_label_agent DB write
- config.py: add ac_url setting (default http://localhost:10003) with
AC_URL env-var override
- .agentception/dispatcher.md: update dispatcher prompt
- .agentception/roles/devops-engineer.md: minor role update
- .gitignore, Dockerfile, docker-compose.yml, README.md: housekeeping
- tests/test_persist_pending_launch_guard.py: regression tests for the
pending_launch guard (queue not drained by poller sweep)
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: narrow handle_request return type in MCP plan tests (#74)
handle_request now returns dict[str, object] | None (None for
notifications). The existing tests assumed a non-None dict, causing 24
mypy errors in CI. Add a local _unwrap() helper that asserts the
response is not None and narrows the type, then thread it through all
handle_request call sites in the test file.
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* fix: eliminate all 47 Any patterns — zero typing audit violations (#75)
* fix: eliminate all 47 Any patterns — zero typing audit violations
Replaces every dict[str, Any], list[Any], and # type: ignore with
proper TypedDicts and isinstance narrowing across all 7 offending files.
Category 1 — structured return shapes (39 occurrences):
- db/queries.py: 20 TypedDicts (BoardIssueRow, AgentRunRow, AgentRunDetail,
PipelineTrendRow, IssueDetailRow, PRDetailRow, WaveRow, PendingLaunchRow,
AgentEventRow, AgentThoughtRow, and 10 more). AgentEventRow.payload is
now a raw JSON string; build_ui.py updated to json.loads it on use.
- routes/ui/org_chart.py: 7 TypedDicts (RoleEntry, AnnotatedRoleEntry,
PipelineConfig, OrgPreset, TierEntry, BuilderContext). _read_pipeline_config
parses keys explicitly instead of PipelineConfig(**raw).
Category 2 — # type: ignore on dict narrowing (4 occurrences):
- api_reference.py: 2×assignment — isinstance checks before subscript.
- agents.py: 1×assignment — removed; AgentRunDetail now typed correctly.
- docs.py: 1×return-value — widened return to Response instead of HTMLResponse.
Category 3 — import-untyped (1 occurrence):
- _shared.py: added markdown to pyproject.toml mypy.overrides instead of
# type: ignore[import-untyped].
Category 4 — list[Any]/AsyncIterator[Any] (2 occurrences):
- tests/test_issue_creator.py: TypeGuard narrowers (_is_start, _is_label,
_is_issue, _is_done, _is_error) replace bare Any in _collect helper and
enable typed event discrimination throughout the test suite.
Cascading caller fixes:
- agents.py: 4 new TypedDicts (AgentEnrichedRow, EnrichedAgentRunRow,
BatchRow, RoleGroupRow); all dict[str, object] annotations updated.
- build_ui.py: EnrichedIssueRow + EnrichedPhaseGroupRow; mutating
PhasedIssueRow replaced with explicit TypedDict construction.
- telemetry.py: list[dict[str, object]] → list[PipelineTrendRow].
Result: mypy strict — 0 errors (143 files); typing audit — 0 Any patterns.
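As an illustration of the Category 1 pattern, the sketch below shows a row TypedDict whose payload stays a raw JSON string and is decoded with isinstance narrowing at the point of use. Only AgentEventRow and its payload field are named in the commit; event_type and payload_dict are hypothetical.

```python
import json
from typing import TypedDict


class AgentEventRow(TypedDict):
    event_type: str  # hypothetical field, for illustration
    payload: str     # raw JSON string, decoded by the consumer


def payload_dict(row: AgentEventRow) -> dict[str, object]:
    """Decode the payload without Any: narrow the parsed value before use."""
    parsed: object = json.loads(row["payload"])
    if not isinstance(parsed, dict):
        return {}
    return {str(key): value for key, value in parsed.items()}
```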
* docs: update CI threshold and add DB query TypedDict reference
Update typing-ratchet --max-any from 10 to 0 in ci.md to reflect the
enforced zero-Any ceiling. Add a DB Query TypedDicts section to
type-contracts.md documenting all 20+ named row types introduced during
the Any-elimination refactor.
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
---------
Co-authored-by: Gabriel Cardona <gabriel@tellurstori.com>
* Local model integration (#986)
* feat: wire enrich_plan_with_codebase_context into plan_ui.py and issue_creator.py (#912)
- Add import of `enrich_plan_with_codebase_context` to both call sites
- plan_ui.py: call enricher after `PlanSpec.model_validate(parsed)`, before `spec.to_yaml()`
- issue_creator.py: call enricher at top of `file_issues()`, before `repo = _cfg.gh_repo`
- Both call sites use try/except with logger.warning on failure — enrichment is best-effort
- Add integration tests in `agentception/tests/test_plan_enricher_integration.py`:
- test_filed_issue_body_contains_codebase_locations: verifies enriched bodies flow through
- test_enrichment_failure_does_not_block_filing: verifies RuntimeError is swallowed
Closes #870
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: mechanically enforce STOP after pytest exits 0 (#913)
After any run_command call containing "pytest" returns exit_code=0, the
agent loop arms a hard-stop interrupt. On every subsequent iteration,
_PYTEST_STOP_BLOCKED_TOOLS (read_file, search_text, list_directory, etc.)
are intercepted and returned as synthetic errors — the same mechanism as
the loop guard — so the agent cannot enter a post-test audit loop.
The stop is armed per-iteration in extra_system_blocks with _PYTEST_STOP_OVERRIDE
and enforced mechanically in a second interception pass (Pass 2) after the
existing loop guard. It is disarmed automatically when:
- The agent writes new code (file-mutating tools) in a later iteration, because
the new code is untested and must be re-verified.
- A subsequent pytest invocation fails, so the agent can fix the regression.
Two regression tests added to TestPytestHardStop:
- test_pytest_stop_blocks_reads_after_clean_exit: verifies HARD STOP appears
in extra_system_blocks and read_file is never dispatched after pytest passes.
- test_pytest_stop_disarmed_by_file_write: verifies read_file reaches the
real dispatcher once a file write disarms the stop.
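The arm/disarm rules can be summarised as a small state machine. This is a sketch under assumptions: PytestStop is a hypothetical class, and the two tool sets are illustrative subsets built from tool names that appear in this log, not the real _PYTEST_STOP_BLOCKED_TOOLS.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative subsets; the real sets in agent_loop.py may be larger.
BLOCKED_AFTER_PASS = frozenset({"read_file", "search_text", "list_directory"})
FILE_MUTATING = frozenset({"replace_in_file", "insert_after_in_file"})


@dataclass
class PytestStop:
    armed: bool = False

    def observe(self, tool: str, command: str = "", exit_code: int | None = None) -> None:
        """Update the stop after a tool call has finished."""
        if tool == "run_command" and "pytest" in command:
            # A passing run arms the stop; a failing run disarms it.
            self.armed = exit_code == 0
        elif tool in FILE_MUTATING:
            # New code is untested, so reads are allowed again.
            self.armed = False

    def is_blocked(self, tool: str) -> bool:
        return self.armed and tool in BLOCKED_AFTER_PASS
```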
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: tool surface audit — 13 bugs, dead code, schema mismatches (#914)
Critical
- build_complete_run inputSchema now declares grade and reviewer_feedback
(both optional strings); additionalProperties: False was silently stripping
them, making the reviewer grade pathway unreachable through any MCP client
- build_complete_run now returns isError=not bool(result.get("ok", True))
instead of hardcoded False; every other build_* tool already did this
High
- search_text and find_call_sites: remove --max-count from rg args; add
_truncate_rg_output() which counts only numbered match lines and stops at
n_results total across all files — the previous per-file limit could return
N×num_files lines despite the schema saying "max N total"
- definitions.py: update both n_results descriptions to "total … across all files"
Medium
- _PYTEST_STOP_OVERRIDE and the synthetic error message now name run_command
(git add/commit/push) as the commit path, with git_commit_and_push listed
as a secondary option — developer agents don't have git_commit_and_push
- replace_in_file dispatch: allow_multiple coercion widened from
isinstance(allow_raw, bool) to isinstance(allow_raw, (bool, int)) so an
integer 1 from the model is not silently treated as False
Dead code (no-legacy rule)
- _READ_ONLY_TOOL_NAMES = frozenset() legacy alias deleted from agent_loop.py
- build_spawn_child_run (106-line orphaned function) deleted from
build_commands.py; its spawn_child/SpawnChildError/Tier/ScopeType imports
removed; module docstring updated; runs.py and docs/guides/mcp.md updated
to reference the live build_spawn_adhoc_child MCP tool
- plan_get_labels, plan_get_cognitive_figures dead imports removed from server.py
- "create_directory" ghost entry removed from _KEY_ARG log-hint dict
Style / non-idiomatic
- _CLASS_DEF_RE moved from inside insert_after_in_file body to module-level
constant (was re-compiled on every call)
- f-string without format args removed in shell_tools.py
- search_codebase collection description: "Omit (or leave null)" → "Omit"
(type is "string" with no null union; "leave null" could cause API rejection)
- read_file docstring: path resolution note updated to reflect dispatcher reality
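The total-versus-per-file distinction in the search_text fix can be sketched as follows. This assumes rg output grouped under file headings with match lines of the form "12:text"; the real _truncate_rg_output may handle more cases (context lines, other separators).

```python
import re

# Assumed shape of a numbered match line in grouped rg output, e.g. "12:foo()".
_MATCH_LINE = re.compile(r"^\d+:")


def truncate_rg_output(output: str, n_results: int) -> str:
    """Keep lines until n_results match lines have been seen across all files."""
    if n_results <= 0:
        return ""
    kept: list[str] = []
    matches = 0
    for line in output.splitlines():
        kept.append(line)
        if _MATCH_LINE.match(line):
            matches += 1
            if matches >= n_results:
                break  # stop at the global limit, not a per-file one
    return "\n".join(kept)
```

Counting only numbered lines means file headings and blank separators do not eat into the budget.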
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: address three deferred tool-surface audit items
_GUARD_PERMITTED_TOOL_NAMES: add run_command and git_commit_and_push to the
loop-guard allowlist. Without them a guarded agent writes code it can't verify
(no mypy/pytest) and can't deliver (no git push), trapping it in an infinite
guard loop. Regression guard: test_guard_allowlist_includes_shell_tools asserts
both tools are present so the allowlist can never silently regress.
find_call_sites regex: expand the pattern to cover from-import lines
(from x import symbol) and type-annotation contexts (symbol: / symbol[).
Previously, only call sites (the symbol followed by an opening parenthesis) and bare import lines were matched.
read_symbol heuristic: document that the indentation-based fallback path does
NOT work for brace-delimited languages (TypeScript, JavaScript). If a TS agent
role is added the heuristic must be replaced with a tree-sitter extractor.
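One way to express the expanded find_call_sites pattern is shown below. It is a guess at the shape of the regex, not the actual one; call_site_pattern is a hypothetical helper.

```python
from __future__ import annotations

import re


def call_site_pattern(symbol: str) -> re.Pattern[str]:
    """Build a regex covering calls, imports, and annotation contexts."""
    s = re.escape(symbol)
    return re.compile(
        rf"\b{s}\s*\("                              # call: symbol(
        rf"|^\s*import\s+.*\b{s}\b"                 # bare import line
        rf"|^\s*from\s+\S+\s+import\s+.*\b{s}\b"    # from x import symbol
        rf"|\b{s}\s*[:\[]"                          # annotation: symbol: / symbol[
    )
```

The word boundary in front of the symbol keeps longer identifiers that merely contain it from matching.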
* feat: suppress implementing/reviewing status badge in active-lane cards (#916)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add Plan enrichment section to docs/reference/api.md (#872)
* fix: two reviewer merge bugs — self-approval failure and unmerged-PR teardown
Bug 1: reviewer prompt instructed pull_request_review_write(event="APPROVE")
before merge_pull_request. GitHub rejects self-reviews (author and reviewer
are the same account) with 422, wasting an iteration and confusing the model into
giving up on the merge. Fix: remove the APPROVE step entirely from the
grade-A flow. The merge stands alone; GitHub doesn't require an approved
review to merge unless branch protection rules enforce it.
Bug 2: when merge_pull_request failed (branch behind dev), the reviewer
called build_complete_run(grade="A") without merging. build_complete_run
responded by scheduling teardown_agent_worktree, which deleted the remote
branch, which caused GitHub to auto-close the unmerged PR.
Fix: build_complete_run now calls _is_pr_merged (GitHub REST API 204 check)
before scheduling teardown for reviewer grade-A/B completions. If the PR
is not yet merged the call returns an error, forcing the reviewer to call
merge_pull_request first.
Also strengthens the "branch behind dev" section in reviewer.md.j2 to
make the mandatory rebase+retry path explicit and unambiguous.
Regression tests:
- test_build_complete_run_blocks_grade_a_when_pr_not_merged
- test_build_complete_run_allows_grade_a_when_pr_merged
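For reference, the 204 check relies on GitHub's "check if a pull request has been merged" endpoint, which answers 204 when merged and 404 otherwise. The sketch below uses urllib and an injectable status fetcher so it can be exercised offline; the function names and signature are assumptions, not the real _is_pr_merged.

```python
import urllib.error
import urllib.request
from typing import Callable


def _http_status(url: str, token: str) -> int:
    """Return the HTTP status for a GET, treating error statuses as values."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code


def is_pr_merged(
    repo: str,
    pr_number: int,
    token: str,
    fetch_status: Callable[[str, str], int] = _http_status,
) -> bool:
    """True only when GitHub answers 204 for the merge-status endpoint."""
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/merge"
    return fetch_status(url, token) == 204
```

Gating teardown on this check means a failed merge leaves the branch, and therefore the PR, intact.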
* refactor(scss): split _build.scss into per-feature partials (#873)
_build.scss was 2069 lines. Every frontend ticket appended to it, causing
rebase conflicts between parallel agents. Extracted five component-scoped
partials and one layout partial; replaced _build.scss with a @use barrel.
New files:
_inspector-layout.scss — all inspector/build layout rules (lines 1-1892)
_thought-block.scss — .thought-block (58 lines)
_file-edit-card.scss — .file-edit-card + .diff-add/remove/context (32 lines)
_assistant-bubble.scss — .assistant-bubble (12 lines)
_tool-call-card.scss — .tool-call-card (42 lines)
_event-card.scss — .event-card (27 lines)
Compiled app.css is byte-identical to pre-change artifact:
sha256: 17e18481c378787f472aee951c3a4ce218f2e9d5b06721b0570779156e50d875
.gitattributes: merge=union added for all six new partials and the barrel.
* feat: convert agentception/db/queries.py into a package (#874) (#921)
Move queries.py → queries/_monolith.py unchanged (no query logic modified).
Create queries/types.py with all 46 TypedDict definitions extracted from the
monolith. Wire queries/__init__.py with explicit re-exports (import X as X)
so that every existing caller continues to work without change.
mypy passes on all 278 files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: split query monolith into 5 domain submodules (#876) (#922)
Create agentception/db/queries/{board,runs,messages,events,metrics}.py.
Each file imports TypedDicts from agentception.db.queries.types (not
from the monolith), carries only functions (no TypedDict definitions),
and is importable in isolation.
_monolith.py and __init__.py are unchanged — callers are not affected.
The next issue will wire __init__.py re-exports to these files and
delete _monolith.py.
mypy passes on all 283 source files; zero new test failures introduced.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: wire queries/__init__.py from domain submodules, delete _monolith.py (#875) (#923)
Replace the monolith re-export in __init__.py with explicit per-domain
imports (board, runs, messages, events, metrics, types). All 102 public
and test-required private symbols are re-exported via the `X as X` pattern
so every existing `from agentception.db.queries import X` call-site
continues to work without modification.
Delete _monolith.py — the 3250-line file is fully replaced by the six
focused submodules.
Add merge=union .gitattributes entries for all six new domain files.
mypy passes on 282 source files; zero new test failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: audit imports, fix broken tests, document new module structure (#877) (#924)
Audit: zero direct-path queries.py imports in the test suite (all three
grep checks pass clean).
Fix 4 previously broken tests by updating stale mock.patch targets to
point at the domain submodule where each dependency is locally bound:
- test_get_daily_metrics_returns_zeros_on_db_error:
agentception.db.queries.get_session
→ agentception.db.queries.metrics.get_session
- test_get_issues_grouped_by_phase_* (3 tests):
agentception.db.queries.get_session
→ agentception.db.queries.board.get_session
agentception.db.queries.get_initiative_phase_meta
→ agentception.db.queries.board.get_initiative_phase_meta
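Why the patch target has to move can be shown with two throwaway modules. demo_db and demo_metrics are hypothetical stand-ins for the session provider and a domain submodule that imports get_session by name.

```python
import sys
import types
from unittest import mock


def _make_module(name: str, source: str) -> types.ModuleType:
    """Create and register an in-memory module from source text."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


_make_module("demo_db", "def get_session():\n    return 'real'\n")
metrics = _make_module(
    "demo_metrics",
    "from demo_db import get_session\n\n"
    "def current_session():\n"
    "    return get_session()\n",
)


def patched_at(target: str) -> str:
    """Call the submodule function while the given target is patched."""
    with mock.patch(target, return_value="mock"):
        return metrics.current_session()
```

Patching demo_db.get_session leaves the name already bound inside demo_metrics untouched, which is why the targets above point at the submodule where the dependency is looked up.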
Docs: add "## Query module structure" and "## SCSS partial structure"
sections to docs/architecture.md describing the six query submodules
and six SCSS partials, their domain ownership, and the merge=union
.gitattributes strategy.
mypy clean on 282 files; 104 tests pass with zero failures.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: log per-turn token counts and track last_input_tokens in agent loop (#925)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: token-aware _prune_history with last_input_tokens budget guard (#926)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: add missing recorded_at to ACAgentEvent inserts in persist.py (#927)
Two ACAgentEvent creation sites were missing the required recorded_at
field — complete_agent_run (build_complete_run event) and the orphan
sweep (orphan_failed event). The NOT NULL constraint caused a DB
exception that rolled back the whole transaction, so the agent run
status was never updated from implementing to completed/done.
The result was that every successfully-completed agent run left its DB
row stuck on status=implementing despite the PR being merged. The
orphan sweep then had the same bug but was harder to notice because
it fires infrequently.
Fix: pass recorded_at=_now() (complete_agent_run) and recorded_at=now
(orphan sweep, which already has `now` in scope).
Also manually corrected issue-884's stuck row to status=done.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: replace dict[str, Any] with DailyMetrics TypedDict in test_metrics_api.py (#928)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: inject _CONTEXT_PRESSURE_WARNING into extra_blocks when input tokens exceed threshold (#929)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: fetch origin/dev before creating developer worktree (#931)
Developer dispatches set worktree_base = "origin/dev" without first
fetching, so the container's ref tracker could lag behind GitHub by one
or more merged PRs. The worktree was created from a stale tip, and the
agent's first push immediately diverged from origin/dev — causing an
unnecessary rebase conflict on every back-to-back dispatch.
Reviewers and continuations already fetched their branch before creating
the worktree (lines 718-724). Mirror that pattern for developers in the
else-branch: run git fetch origin dev before setting worktree_base.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add _summarise_history and context checkpoint injection to _prune_history (#930)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: update stale patch paths and add _is_pr_merged mock; add context window management docs (#932)
- Update patch targets in test_phase_grouping.py, test_get_initiatives.py,
and test_persist_regression.py from agentception.db.queries.get_session
to agentception.db.queries.board.get_session (post-#876 module split)
- Add _is_pr_merged mock to test_build_complete_run_reviewer_does_not_redispatch_reviewer
so the test does not make a live GitHub API call
- Add ## Context window management section to docs/architecture.md
Closes #888
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add prompt_variant to ACAgentRun, AgentTaskSpec, and Alembic migration 0011 (#933)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: orphan sweep grace period to avoid race with dispatch (#934)
Do not mark a run failed when last_activity_at is within 60s. The poller
builds live_ids from list_active_runs() at tick start; dispatch can commit
acknowledge_agent_run after that, so the run is in the DB as implementing
but not yet in live_ids. Without the grace period the orphan sweep would
immediately mark it failed.
- Add _ORPHAN_GRACE_SECONDS and skip orphan when last_activity_at recent
- Add test_orphan_grace_period_skips_recently_acknowledged_run
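The grace-period check described above can be sketched as a pure function. This is an illustration under assumptions: the function name and signature are hypothetical, and only `_ORPHAN_GRACE_SECONDS`, `live_ids`, and `last_activity_at` come from the commit message.

```python
from datetime import datetime, timedelta, timezone

_ORPHAN_GRACE_SECONDS = 60  # grace window stated in the commit message


def should_mark_orphan(run_id, live_ids, last_activity_at, now):
    """Return True only for runs missing from live_ids and quiet past the grace window.

    Hypothetical helper: the real sweep may be structured differently.
    """
    if run_id in live_ids:
        return False
    if last_activity_at is not None:
        if now - last_activity_at < timedelta(seconds=_ORPHAN_GRACE_SECONDS):
            # Recently acknowledged: dispatch may have committed after
            # live_ids was built, so skip this tick.
            return False
    return True


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = should_mark_orphan("run-a", set(), now - timedelta(seconds=10), now)
stale = should_mark_orphan("run-b", set(), now - timedelta(seconds=120), now)
live = should_mark_orphan("run-c", {"run-c"}, now - timedelta(seconds=120), now)
```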
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (#935)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: add Any type parameter to captured_kwargs in test_dispatch_variant
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: remove Any from test_dispatch_variant (typing ratchet) (#937)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of list[dict[str, Any]]. Satisfies zero-Any rule.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add activity_events module with persist helper and TypedDict payload shapes (#945)
* feat: add activity_events module with persist helper and TypedDict payload shapes
* fix: correct db_session fixture return type to Generator[Session, None, None]
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add planning prompts and tight-ticket standards guide (#946)
- What human prompts 1A receives (brain dump) and what 1A/1B produce
- PlanSpec issue body format (seven sections, order)
- Informal rules for brain dumps that yield agent-ready tickets:
one deliverable, cap read-heavy work, specific locations, clear done
criteria, minimal phases, format when needed, doc-only vs implement
- References to llm_phase_planner, issue_creator, plan-spec
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* tools: steer agent to search_text over run_command(grep) for code search (#947)
- search_text: add 'PREFER this over run_command(grep/ripgrep) for
searching the codebase' so the model uses the dedicated ripgrep tool
instead of shelling out to grep.
- run_command: add 'For searching the codebase for a string or regex,
use search_text instead of grep.' to avoid defaulting to grep.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: clear worktree_path in DB after reaper releases worktree (#948)
Reaper was re-finding the same terminal runs every pass because we never
cleared worktree_path after release_worktree(). Now we call
clear_run_worktree_path(run_id) only when release_worktree returns True,
so the next reaper pass (or startup) does not re-process the same runs.
release_worktree now returns bool for success/failure.
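A minimal sketch of the reaper change, assuming both helpers take a `run_id` and are passed in as callables; the real signatures are not shown in the commit and may differ:

```python
def reap_run(run_id, release_worktree, clear_run_worktree_path):
    """Release a terminal run's worktree and clear its DB path only on success.

    Hypothetical wrapper: the two callables stand in for the real helpers.
    """
    released = release_worktree(run_id)
    if released:
        # Clearing the path stops the next reaper pass from re-finding this run.
        clear_run_worktree_path(run_id)
    return released


cleared = []
ok = reap_run("run-1", lambda run_id: True, cleared.append)
failed = reap_run("run-2", lambda run_id: False, cleared.append)
```

Keeping the path when the release fails means a later pass can retry the same run.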
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: agent_loop path/lines + search_text logging, watch_run display, block grep in run_command (#949)
- agent_loop: log path + line range for read_file_lines, pattern + directory for search_text
- watch_run: parse dispatch_tool args and show read path lines X–Y and search_text pattern/dir
- shell_tools: block run_command(grep), direct agent to search_text (ripgrep, .gitignore-aware)
- test_shell_tools: test_grep_blocked_use_search_text
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add watch-run-log-map.md mapping every log pattern to emission site (#950)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: preemptive nudge before loop guard; only dispatch reviewer after worktree release (#951)
- agent_loop: inject ONE TURN BEFORE READ-ONLY LOCK nudge when
iterations_since_write == threshold-1 so model calls write in that response
- build_commands: only call auto_dispatch_reviewer when release_worktree
returns True; return error to agent when release fails
- tests: test_preemptive_nudge_injected_one_turn_before_loop_guard,
test_implementer_completion_fails_when_release_worktree_returns_false
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: clear run events/messages on dispatch; dedupe read_file_lines in watch_run (#952)
- persist: delete ACAgentEvent and ACAgentMessage for run_id at start of
persist_agent_run_dispatch so re-dispatches show a clean timeline
- watch_run: do not render dispatch_tool line for read_file_lines; show
only the file_tools result line (path + lines + total) to avoid duplicate
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(activity-events): emit tool_invoked, delay, llm_* from agent_loop (#940) (#953)
- agent_loop: persist tool_invoked after dispatch_tool log; persist delay in
_enforce_turn_delay; thread session/run_id through loop, _dispatch_tool_calls,
_dispatch_single_tool (session optional for debug_loop).
- llm: add session, run_id, iteration to call_anthropic_with_tools; persist
llm_iter, llm_usage, llm_reply, llm_done after existing logger calls.
- activity_events: persist_activity_event accepts AsyncSession (caller flushes).
- tests: test_agent_loop_activity_events.py (tool_invoked, delay); update
TestEnforceTurnDelay and TestDispatchToolCalls for new session/run_id args.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(agent_loop): continue when stop_reason=tool_calls with empty list (#954)
Anthropic can return stop_reason='tool_calls' with tool_calls=[] (e.g.
truncation or API quirk). We previously fell through to 'unexpected' and
cancelled the run. Now we inject a nudge and continue the loop instead of
cancelling.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* chore: ship resync UX, poll docs, and dispatch script (#955)
- Mission Control: Reload button spins in place (no separate loading spinner),
triggers immediate board refresh via HX-Trigger after resync. Button is its
own HTMX indicator; aria-label/title 'Refresh from GitHub'.
- resync.py: Return HX-Trigger: refreshBoard on HTMX success so board refetches
immediately after resync.
- .env.example + docs/reference/poller.md: Document 5s poll default (safe for
GitHub rate limits); align example and poller doc with config default.
- tests: test_build_page_structure resync assertion; test_build_ui aria-label.
- scripts/dispatch_issue_941.py: One-off script to dispatch issue 941 via
POST /api/dispatch/issue (fetch issue from GitHub, then POST).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: emit activity events from file_tools (read, replace, insert, write) (#956)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix(build_complete_run): release derived worktree path when DB has none (#957)
When get_agent_run_teardown returns None or worktree_path is null we skipped
release_worktree but still dispatched the reviewer, causing 'branch already
used by worktree' on reviewer dispatch. Now we derive the path as
worktrees_dir/run_id and attempt release before dispatching; only run rebase
when the path exists. Tests updated to expect release_worktree with derived path.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(942): emit activity events from shell_tools, git helper, and GitHub MCP (#958)
- run_command: emit shell_start (cmd_preview, cwd) and shell_done (exit_code, stdout_bytes, stderr_bytes)
- git_commit_and_push: emit git_push (branch) after successful push
- agent_loop: emit github_tool (tool_name, arg_preview) when dispatching GitHub MCP tools
- Optional run_id/session on run_command and git_commit_and_push; persist only when both set
- Wrap all persist calls in try/except so DB failures never propagate
- Add tests/tools/test_shell_github_activity_events.py (shell_start/done, github_tool, persist failure, git_push)
- Fix mypy: test_build_commands_rebase await_args can be None
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(943): extend inspector SSE to stream all activity events in chronological order (#959)
- Add get_all_events_tail(run_id, after_id) in db/queries/events.py; get_agent_events_tail delegates to it
- _inspector_sse: use get_all_events_tail, merge events and thoughts by recorded_at, emit in order
- Emit activity events as {"t": "activity", "subtype", "payload", "recorded_at", "id"}; other events keep {"t": "event", ...} with id
- Add tests/routes/test_inspector_sse.py: activity events in stream, cursor advances, ordered by id
- Update existing tests to patch get_all_events_tail
- Document GET /ship/runs/{run_id}/stream and activity subtypes in docs/reference/api.md
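Merging events and thoughts by `recorded_at` can be sketched with `heapq.merge`, assuming both inputs are already sorted by `recorded_at` and use comparable timestamps (ISO-8601 strings here). The row shapes are illustrative and only loosely follow the message format above:

```python
import heapq


def merge_by_recorded_at(events, thoughts):
    """Merge two pre-sorted row lists into one chronological stream."""
    return list(heapq.merge(events, thoughts, key=lambda row: row["recorded_at"]))


# Illustrative rows: the subtype and payloads are made up for the example.
events = [
    {"t": "activity", "subtype": "shell_start", "recorded_at": "2024-01-01T12:00:01Z", "id": 1},
    {"t": "activity", "subtype": "shell_done", "recorded_at": "2024-01-01T12:00:05Z", "id": 2},
]
thoughts = [
    {"t": "thought", "recorded_at": "2024-01-01T12:00:03Z", "id": 7},
]

merged = merge_by_recorded_at(events, thoughts)
order = [row["id"] for row in merged]
```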
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(944): consume SSE activity messages and append live DOM rows to inspector feed (#960)
- Add activity_feed.ts: ActivityMessage type, formatActivitySummary (per-subtype text), appendActivityRow (div.activity-feed__row with data-subtype, icon placeholder, summary, time), attachActivityFeedHandler
- Wire attachActivityFeedHandler in build.ts _openStream
- Add _activity-feed.scss for .activity-feed__row, __icon, __summary, __ts
- Unit tests: formatActivitySummary, appendActivityRow, attachActivityFeedHandler
- All payload text via textContent/setAttribute; no innerHTML
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param (fixes mypy) (#936)
* feat: variant-aware _load_role_prompt and prompt_variant dispatch param
* fix: type annotations in test_dispatch_variant (mypy)
Use list[dict[str, str | int | None | bool]] and typed **kwargs
instead of bare list[dict]. No Any.
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(891): add developer-streamlined prompt variant for A/B testing (#961)
- Add scripts/gen_prompts/templates/roles/developer-streamlined.md.j2 (slimmed variant:
'item' not 'AC item', Step 3 = Ship and open PR combined; mypy + tests + Hard rules kept)
- Run generate.py → .agentception/roles/developer-streamlined.md
- Add tests/test_developer_streamlined_prompt.py (exists, contains mypy, excludes AC item/code smell)
- Default developer.md unchanged; variant used only when prompt_variant=streamlined
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(892): add GET /api/metrics/ab for per-variant A/B metrics (#962)
- New read-only endpoint: GET /api/metrics/ab?days=N (default 7, clamp 1–90)
- ABVariantMetrics: variant (COALESCE prompt_variant, 'control'), role, runs,
avg_iterations, avg_input_tokens, total_tokens, pass_rate, passed, failed
- Query joins agent_runs to agent_events (step_start count, done event grade)
- Returns 200 with empty variants: [] when no data or on DB error
- Register ab_metrics router in api __init__
- Tests: response shape (control + streamlined), empty DB, days param, 422 for days=0
- Docs: GET /api/metrics/ab in api.md with query params and response schema
- No change to dispatch, prompts, or existing routes
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(893): add HTMX A/B metrics panel to build page (#963)
- build.html: polling div #ab-metrics-panel with hx-get=/api/metrics/ab, every 30s
- partials/_ab_metrics.html: table.ab-metrics (Variant, Runs, Avg Iters, Pass Rate, Avg Tokens)
- ab_metrics.py: when HX-Request: true return HTML partial, else JSON (response_model=None)
- _ab_metrics.scss: table styles + .loading-placeholder; import in pages/_build.scss
- test_ab_metrics_panel.py: polling div in build.html, HTMX returns HTML, JSON unaffected
- api.md: A/B Metrics Panel note (HX-Request returns HTML)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs(964): add local LLM MLX guide for Qwen 3.5 35B on Apple Silicon (#968)
- docs/guides/local-llm-mlx.md: install (mlx-lm / mlx-vlm), model choice
(mlx-community/Qwen3.5-35B-A3B-4bit), run options (generate, chat, server),
powermetrics/Activity Monitor for CPU/GPU/ANE, checklist for #964
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(local-llm): full pipeline for local Qwen 3.5 35B, doc and flow
- Config: use_local_llm, local_llm_base_url, chat path, model, context caps
- LLM: call_local_with_tools (same contract as Anthropic), call_local_completion
- Agent loop: developer uses local LLM with full system prompt + cognitive arch;
only endpoint and context caps differ; comment and doc clarify same pipeline
- API: GET /api/local-llm/hello, POST /api/local-llm/hello-agent
- Docker: extra_hosts for host.docker.internal
- Watch script: local LLM indicator in ITER line, heartbeat shows local vs LLM
- Docs: local-llm-mlx.md — 48 GB runbook, mlx-openai-server, mlx-vlm >= 0.3.12,
two-step install, torch/torchvision for processor, same pipeline as Anthropic
* feat: plan-scoped integration branch (#974)
- Create plan branch on first dispatch (from origin/dev); reuse for later issues.
- Persist plan_id and plan_branch on runs; persist plan_issues when file_issues completes.
- Use plan branch as worktree base and PR base when plan_id is set; inject PR base into briefing.
- Rebase implementer branch onto plan branch (not dev) before dispatching reviewer when plan-scoped.
- When last issue in plan is merged into plan branch: rebase plan onto dev, open plan→dev PR, dispatch reviewer.
New tables: plan_issues, plan_branches. Migration 0012.
Docs: architecture/plan-scoped-integration-branch.md updated; link from architecture.md.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: add LLM provider abstraction architecture doc (#975)
- Add docs/architecture/llm-provider-abstraction.md (planning status).
- Link from docs/architecture.md Further Reading.
- Remove unneeded scripts/curl_local_plan.sh (was untracked).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat(llm): add provider-agnostic public API (step 1)
Add completion(), completion_stream(), completion_with_tools() as the
single contract. They delegate to call_anthropic* / call_local_with_tools
(use_local_llm for tools). No caller changes yet; next step will switch
plan_ui, llm_phase_planner, agent_loop to use these entry points.
* fix(agent_loop): init tc_to_dispatch/tool_results each iteration; test: mock get_session and use_local_llm
- Initialize tc_to_dispatch and tool_results at start of each loop iteration
so bookkeeping (pytest arm, etc.) never hits UnboundLocalError when
stop_reason is 'length' or 'tool_calls' with empty tool_calls.
- Tests: add _mock_get_session() and patch get_session in all run_agent_loop
tests so they run without init_db(); set use_local_llm=False on mock
settings; add session.flush = AsyncMock() so dispatch path is awaitable.
* feat(llm): wire callers to provider-agnostic API (phase two)
- llm_phase_planner: use completion() instead of call_anthropic
- plan_ui: use completion_stream() instead of call_anthropic_stream
- agent_loop: use completion() for recon and _summarise_history;
use completion_with_tools() for main loop (replaces call_local_with_tools
and call_anthropic_with_tools). Provider selection remains in llm layer.
- Tests: patch completion/completion_with_tools/completion_stream at
call sites; plan_ui tests patch completion_stream; agent_loop tests
patch completion and completion_with_tools. LLM retry tests still
exercise call_anthropic_with_tools in llm module.
- Docstring: llm_phase_planner describes completion() as entry point.
* fix(typing): resolve typing audit — TypedDict payloads, no type: ignore
- activity_events: convert FileReadPayload, FileReplacedPayload,
FileInsertedPayload, FileWrittenPayload from dict subclass to TypedDict
so dict literals type-check. Accept Mapping[str, object] in
persist_activity_event so TypedDict payloads are accepted.
- file_tools: restore named payload types and remove all four
type: ignore[assignment]. _emit_activity accepts Mapping[str, object].
Typing audit: 0 Any, 0 type_ignore; mypy clean.
* feat(llm): phase three — LLM_PROVIDER config and single adapter selection
- Add LLMProviderChoice (anthropic | local) and llm_provider config (LLM_PROVIDER env).
- effective_llm_provider: USE_LOCAL_LLM=true overrides to local for backward compat.
- completion(), completion_stream(), completion_with_tools() branch only on
settings.effective_llm_provider; local completion_stream yields one content chunk.
- agent_loop and local_llm route use effective_llm_provider instead of use_local_llm.
- Tests: config effective_llm_provider + LLM_PROVIDER parsing; llm provider selection.
- Doc: llm-provider-abstraction step 4 marked done.
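The backward-compatibility override can be sketched as follows. The function is a hypothetical standalone version of the `effective_llm_provider` setting, written only from the description above:

```python
def effective_llm_provider(llm_provider: str, use_local_llm: bool) -> str:
    """Resolve the provider: the legacy USE_LOCAL_LLM flag wins when set."""
    if use_local_llm:
        return "local"
    return llm_provider


legacy = effective_llm_provider("anthropic", use_local_llm=True)
default = effective_llm_provider("anthropic", use_local_llm=False)
explicit = effective_llm_provider("local", use_local_llm=False)
```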
* feat(llm): phase four — local adapter behind contract
- Content normalization: _normalize_openai_message_content() handles
message.content as string or list of parts; strips reasoning, concatenates
text for final answer. Used in completion and tool-use response parsing.
- Local adapter helpers: _local_base_url(), _local_chat_url(),
_local_completion_payload(); call_local_completion() accepts temperature.
- True streaming: _local_completion_stream() POSTs with stream=true, parses
SSE, maps delta.content / delta.reasoning_content to LLMChunk; on failure
or unsupported server falls back to one-shot and yields one content chunk.
- Public completion_stream() uses _local_completion_stream for local provider.
- /api/local-llm/hello uses public completion() instead of call_local_completion.
- Tests: normalize (string, list, reasoning stripped, empty); stream fallback.
- Doc: step 3 (Local adapter) marked done; status Phases 1–4 implemented.
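A rough sketch of the content normalization, assuming list-form content uses parts shaped like `{"type": "text", "text": ...}` with reasoning carried in parts of another type. That part shape is an assumption about the wire format, not something the commit specifies:

```python
def normalize_message_content(content):
    """Return the final answer text from string or list-of-parts content.

    Hypothetical simplification: only parts with type == "text" are kept.
    """
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    texts = []
    for part in content:
        if isinstance(part, dict) and part.get("type") == "text":
            texts.append(part.get("text", ""))
    return "".join(texts)


from_string = normalize_message_content("hello")
from_parts = normalize_message_content(
    [
        {"type": "reasoning", "text": "let me think"},
        {"type": "text", "text": "The answer "},
        {"type": "text", "text": "is 4."},
    ]
)
from_empty = normalize_message_content([])
```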
* docs(llm): phase five — LLM contract, deployment guide, cross-doc updates
- Add docs/reference/llm-contract.md: contract (completion, completion_stream,
completion_with_tools), types, provider selection (LLM_PROVIDER,
effective_llm_provider), adapters, step-by-step how to add a provider.
- Rewrite docs/guides/local-llm-mlx.md: Config and environment section with
full table (LLM_PROVIDER, USE_LOCAL_LLM, all LOCAL_LLM_*); How the local
adapter works (normalization, streaming, fallback); update all integration
steps to mention LLM_PROVIDER and effective provider; add LLM contract ref.
- docs/reference/type-contracts.md: document public API and provider-agnostic
contract; link to llm-contract.md; update diagrams/tree to completion_*.
- docs/guides/setup.md: add LLM_PROVIDER, USE_LOCAL_LLM to .env table; add
Local LLM subsection with pointer to local-llm-mlx and llm-contract.
- docs/guides/security.md: LLM section covers Anthropic and local provider;
link to llm-contract and local-llm-mlx.
- docs/architecture.md: llm.py described as provider-agnostic; completion_*
and config; context window note uses completion_with_tools().
- docs/architecture/llm-provider-abstraction.md: mark step 6 (Document and
test) done; status Phases 1–5 implemented; add pointers to new/updated docs.
- docs/README.md: add Local LLM with MLX guide; add LLM Contract reference;
system overview and directory structure use completion_* and provider config.
* docs: clarify OpenAI-compatible = wire format, not OpenAI cloud
- local-llm-mlx: Naming section (Chat Completions API, mlx-openai-server
package name vs local inference); Option C retitled; mlx_lm.server /
mlx-openai-server called out as local; 48 GB runbook link text.
- llm-contract: Local adapter row + sentence on wire format vs vendor.
- setup: LLM_PROVIDER row notes local server, not OpenAI cloud.
* fix(local-llm): cap max_tokens for mlx-openai-server (422)
mlx-openai-server rejects max_tokens > 4096 with HTTP 422. Plan 1A and
streaming used 8192/16k; clamp every local chat payload to
local_llm_completion_token_ceiling (default 4096).
- config: LOCAL_LLM_COMPLETION_TOKEN_CEILING
- llm: _local_cap_max_tokens, call_local_with_tools + _local_completion_payload
- tests: cap + payload + config default
- docs: 422 cause, llm-contract generation budget
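The clamp itself is small. A sketch, assuming the ceiling is passed in explicitly rather than read from settings as the real `_local_cap_max_tokens` presumably does:

```python
def cap_max_tokens(requested: int, ceiling: int = 4096) -> int:
    """Clamp a requested generation budget to the local server's ceiling."""
    return min(requested, ceiling)


capped = cap_max_tokens(8192)
unchanged = cap_max_tokens(1024)
```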
* feat(llm): normalize think-tags, fix yaml-parse fallback, decisiveness prompt
- Add _normalize_think_tags() generator in services/llm.py to reclassify
<think>...</think>-wrapped content as type="thinking" chunks; ensures
plan_ui always receives clean thinking/content separation regardless of
whether the backend uses reasoning_content fields or inline tags (Qwen3)
- Add repetition_penalty=1.1 and frequency_penalty=0.3 to local completion
payload to discourage degenerate generation loops
- Remove client-side repetition detector from plan_ui (_REPEAT_WINDOW /
_REPEAT_MIN_LEN) — model-level penalties are the correct layer; the
detector caused false positives on structured YAML with repeated section
headers and prematurely broke valid streams
- Wrap YAML safe_load + PlanSpec.model_validate in plan_ui in its own
try/except so malformed LLM output always falls back to the clarify plan
rather than emitting a stream-level error event
- Rewrite _IDENTITY and _YAML_SYSTEM_PROMPT preamble to be more decisive
and direct, reducing agent "overthinking" and re-planning loops
- Reduce streaming read timeout from 300s to 90s and add specific
httpx.ReadTimeout logging to detect mlx-openai-server stalls faster
- Add 5 unit tests for _normalize_think_tags covering basic split,
no-tags passthrough, already-classified chunks, cross-chunk tags,
and multiline Qwen-style output
- Update test suite: rename/update integration tests to reflect removal
of repetition detection and addition of think-tag normalization
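One way such a think-tag normalizer could work is sketched below. It assumes chunks are plain dicts with `type` and `text` keys (the real `LLMChunk` type may differ) and holds back a trailing partial tag so that tags split across chunks are still recognised:

```python
OPEN, CLOSE = "<think>", "</think>"


def normalize_think_tags(chunks):
    """Reclassify <think>...</think> text in content chunks as thinking chunks."""
    inside = False  # True while between <think> and </think>
    pending = ""    # trailing text that might be the start of a tag
    for chunk in chunks:
        if chunk["type"] != "content":
            if pending:
                yield {"type": "thinking" if inside else "content", "text": pending}
                pending = ""
            yield chunk  # already classified: pass through untouched
            continue
        text = pending + chunk["text"]
        pending = ""
        while text:
            tag = CLOSE if inside else OPEN
            idx = text.find(tag)
            if idx != -1:
                if idx:
                    yield {"type": "thinking" if inside else "content", "text": text[:idx]}
                text = text[idx + len(tag):]
                inside = not inside
                continue
            # No complete tag: hold back the longest suffix that is a tag prefix.
            keep = 0
            for n in range(min(len(tag) - 1, len(text)), 0, -1):
                if text.endswith(tag[:n]):
                    keep = n
                    break
            emit, pending = text[: len(text) - keep], text[len(text) - keep:]
            if emit:
                yield {"type": "thinking" if inside else "content", "text": emit}
            text = ""
    if pending:
        yield {"type": "thinking" if inside else "content", "text": pending}


split_stream = [
    {"type": "content", "text": "<thi"},
    {"type": "content", "text": "nk>plan</th"},
    {"type": "content", "text": "ink>answer"},
]
normalized = list(normalize_think_tags(split_stream))
passthrough = list(normalize_think_tags([{"type": "content", "text": "hello"}]))
```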
* feat(local-llm): Ollama as primary backend, per-usecase routing, LiteLLM proxy guide
- Remove repetition_penalty from _local_completion_payload — not an
OpenAI-standard parameter; causes silent incompatibility with Ollama's
OpenAI-compat endpoint. frequency_penalty is standard and stays.
- Raise LOCAL_LLM_COMPLETION_TOKEN_CEILING default 4096 → 8192; the old
value was a mlx-openai-server workaround (422 above 4096); Ollama
supports full context lengths. Update test assertion accordingly.
- Add per-usecase model/URL overrides in config.py:
LOCAL_LLM_BASE_URL_PLAN / LOCAL_LLM_MODEL_PLAN for completion_stream()
LOCAL_LLM_BASE_URL_AGENT / LOCAL_LLM_MODEL_AGENT for completion_with_tools()
Properties effective_local_base_url_plan/agent and
effective_local_model_plan/agent fall back to the global values when unset.
Enables Phase 3 two-model routing through LiteLLM Proxy.
- Wire _local_completion_stream to use effective_local_base_url_plan and
effective_local_model_plan; wire call_local_with_tools to use
effective_local_base_url_agent and effective_local_model_agent.
- Update .env: raise LOCAL_LLM_COMPLETION_TOKEN_CEILING to 8192.
- Rewrite docs/guides/local-llm-mlx.md: make Ollama the primary
recommendation with full install/pull/serve runbook; demote
mlx-openai-server to a developer footnote. Add per-usecase config
table and link to the new scaling guide.
- Add docs/guides/local-llm-scaling.md: four-phase local LLM scaling
architecture (single Ollama → LiteLLM Proxy → two models → multi-machine),
litellm-config.yaml, docker-compose snippet, monitoring and
troubleshooting sections.
- Add two new tests for per-usecase config: fallback and override paths.
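The per-usecase fallback can be sketched with a small dataclass. Field names mirror the environment variables above, but the class itself, the model names, and the URLs are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalLLMSettings:
    """Hypothetical slice of the config holding only the routing fields."""

    local_llm_base_url: str
    local_llm_model: str
    local_llm_base_url_plan: Optional[str] = None
    local_llm_model_plan: Optional[str] = None
    local_llm_base_url_agent: Optional[str] = None
    local_llm_model_agent: Optional[str] = None

    @property
    def effective_local_base_url_plan(self) -> str:
        return self.local_llm_base_url_plan or self.local_llm_base_url

    @property
    def effective_local_model_plan(self) -> str:
        return self.local_llm_model_plan or self.local_llm_model

    @property
    def effective_local_base_url_agent(self) -> str:
        return self.local_llm_base_url_agent or self.local_llm_base_url

    @property
    def effective_local_model_agent(self) -> str:
        return self.local_llm_model_agent or self.local_llm_model


settings = LocalLLMSettings(
    local_llm_base_url="http://localhost:11434",
    local_llm_model="example-model",              # placeholder model name
    local_llm_model_agent="example-agent-model",  # override for agent turns only
)
plan_model = settings.effective_local_model_plan
agent_model = settings.effective_local_model_agent
agent_url = settings.effective_local_base_url_agent
```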
* fix(ollama): disable think mode for non-streaming calls; forward all LOCAL_LLM_* env vars
Qwen 3.5 with Ollama's thinking mode enabled spends all available tokens
on chain-of-thought reasoning before writing content. With a small token
budget (e.g. max_tokens=128 in /hello), the model exhausts its budget
during thinking and returns empty content.
Fix 1 — add think parameter to _local_completion_payload:
- think=False (default): sends "think": false to Ollama so the model
outputs the answer directly into content, skipping CoT. Used by
call_local_completion (hello, one-shot) and call_local_with_tools
(agent turns) where latency matters more than reasoning depth.
- think=True: used only by _local_completion_stream (Phase 1A planning)
where CoT quality is the priority and tokens are plentiful.
Ignored by backends that do not recognise the field (vLLM, mlx, etc).
Fix 2 — forward all LOCAL_LLM_* env vars in docker-compose.yml:
Previously only USE_LOCAL_LLM and LOCAL_LLM_BASE_URL were forwarded,
so LOCAL_LLM_MODEL, LOCAL_LLM_COMPLETION_TOKEN_CEILING, the per-usecase
overrides, and all other local LLM config were invisible inside the
container. docker compose restart therefore never picked up .env changes
for those vars. Now all LOCAL_LLM_* fields are explicitly forwarded with
sensible defaults matching config.py.
* fix(ollama): raise token budget for non-streaming calls; warn on empty content
Qwen 3.5 with Ollama sends thinking in the reasoning field and the actual
answer in content. When max_tokens is too small (e.g. 128 in /hello),
the model exhausts its budget during chain-of-thought and content is
empty. think: false is sent in the payload for future Ollama support but
is currently ignored.
- Remove max_tokens=128 override in /hello endpoint; completion() already
defaults to max_tokens=4096 which gives the model room to think and
then answer
- Add warning log in _normalize_openai_message_content when content is
empty but reasoning is non-empty, pointing at the ceiling config var
- Update think param docstring to reflect current Ollama behaviour
* feat: hide A/B metrics panel via HTML comment (#984)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* docs: comprehensive audit sweep — remove transient files, fix stale content (#1009)
Deletes:
- HACKATHON.md (session artifact, broken links, stale Python 3.11 ref)
- .agentception/verification-761.md (test run report, not a repo artifact)
- scripts/dispatch_issue_941.py (one-off script, docstring says so)
- agentception/EXTRACT.md (monorepo extraction is complete; procedure irreversible)
- docs/migration.md (migration to standalone DB is done; all checklist items past tense)
- agentception/README.md (described Cursor-only zero-LLM-calls arch that no longer exists)
Updates:
- .env.example: replace USE_LOCAL_LLM/mlx_lm.server/:8080 with LLM_PROVIDER/Ollama/:11434
- docs/guides/setup.md: Python 3.11→3.12, container agentception-app→agentception,
MLX→Ollama in local LLM blurb, drop USE_LOCAL_LLM legacy row from env var table
- docs/guides/security.md: port 8080→11434 in local provider example
- docs/reference/llm-contract.md: token ceiling default 4096→8192 (Ollama); note
that 4096 limit is specific to mlx-openai-server
- CHANGELOG.md: correct "AC_ prefix applied to all config keys" — only 4 vars use it
- docs/README.md: "all AC_* env vars" → "environment-variable-driven config"
- docs/cursor-agent-spawning.md: add accuracy note (primary dispatch path is now
agent_loop.py, not Cursor Task tool); fix broken links to missing stress-test file
- docs/guides/contributing.md: add callout distinguishing external-contributor
draft-PR workflow from internal merge-immediately policy (AGENTS.md)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* release: dev → main (#1033)
* docs: overhaul MCP reference with Mermaid diagrams and typed tool schemas (#1030)
Rewrite docs/reference/mcp.md as the single authoritative MCP reference:
- Add 4 Mermaid diagrams (architecture, tool dispatch, resource resolution, agent role surfaces)
- Add fully typed input/output schemas with example JSON-RPC for all 12 tools
- Generalize language from Cursor-specific to client-agnostic
Delete the mcp-tools.md stub (17 lines pointing to mcp.md).
Clean up docs/guides/mcp.md:
- Remove duplicate stdio config block (lines 77-97 were copy of 55-75)
- Generalize remaining Cursor-specific language (keep Cursor as example client)
- Fix stale tool name reference (build_spawn_child -> build_spawn_adhoc_child)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: harden agent runtime — path sandbox, secret redaction, denylist, prompt injection guardrails, wall-clock timeout
Five concrete hardening measures from the MCP/agent runtime security audit:
1. File-tool path sandbox (_is_safe_read_path / _is_safe_write_path in
agent_loop.py): reads restricted to worktree + repo root; writes
restricted to worktree only. Symlinks resolved before check. All 10
file/directory tools enforced.
2. Shell command secret redaction (_redact_secrets in shell_tools.py):
all stdout/stderr from run_command is stripped of ANTHROPIC_API_KEY,
GITHUB_TOKEN, DATABASE_URL, AC_API_KEY, ghp_ PATs, sk-ant- keys, and
Bearer tokens before the output reaches the agent or the DB.
3. Expanded shell denylist: adds rm -rf /app, rm -rf /worktrees, nc -e,
/dev/tcp/, and /dev/udp/ to the existing _BLOCKED_PATTERNS.
4. Prompt injection security contract (_RUNTIME_ENV_NOTE in agent_loop.py):
every agent system prompt now includes an explicit instruction to treat
all repository content as untrusted external data, never to exfiltrate
credentials, and never to make unauthorized outbound HTTP requests.
5. Wall-clock timeout (asyncio.timeout in agent_loop.py): wraps the entire
agent loop with AGENT_MAX_WALL_SECONDS (default 7200 s / 2 h); timeout
cancels the run gracefully and transitions it to 'cancelled'.
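The path sandbox in item 1 can be sketched as below. This is a simplified illustration, not the project's actual `_is_safe_read_path` / `_is_safe_write_path`: it resolves symlinks and `..` segments first, then checks containment.

```python
import os
import tempfile
from pathlib import Path


def _is_within(candidate, root):
    """True if candidate, after resolving symlinks and '..', lies under root."""
    resolved_root = Path(root).resolve()
    resolved = Path(candidate).resolve()
    return resolved == resolved_root or resolved_root in resolved.parents


def is_safe_write_path(candidate, worktree):
    # Writes: worktree only.
    return _is_within(candidate, worktree)


def is_safe_read_path(candidate, worktree, repo_root):
    # Reads: worktree or repo root.
    return _is_within(candidate, worktree) or _is_within(candidate, repo_root)


with tempfile.TemporaryDirectory() as tmp:
    worktree = Path(tmp) / "worktrees" / "run-1"
    repo_root = Path(tmp) / "app"
    outside = Path(tmp) / "outside"
    for directory in (worktree, repo_root, outside):
        directory.mkdir(parents=True)
    # A symlink inside the worktree that points outside of it.
    os.symlink(outside, worktree / "escape")

    inside_ok = is_safe_write_path(worktree / "src" / "main.py", worktree)
    traversal_ok = is_safe_write_path(worktree / ".." / "other" / "x.py", worktree)
    symlink_ok = is_safe_write_path(worktree / "escape" / "secrets.txt", worktree)
    repo_read_ok = is_safe_read_path(repo_root / "README.md", worktree, repo_root)
    repo_write_ok = is_safe_write_path(repo_root / "README.md", worktree)
```

Resolving before comparing is what defeats the symlink escape: a plain string-prefix check on the unresolved path would accept `escape/secrets.txt`.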
Docker: adds no-new-privileges, cap_drop ALL, and minimal cap_add to
docker-compose.yml.
Docs: security.md rewritten to document all five hardening layers plus
the threat model update.
Tests: 131 pass (mypy clean, zero Any).
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* security: non-root container user, narrow bind mounts, egress allowlist proxy (phase 2)
Three additional hardening measures completing the phase-2 security audit:
1. Non-root process user (Dockerfile + scripts/entrypoint.sh)
- Creates agentception user UID/GID 1001 in Dockerfile
- Installs gosu (purpose-built privilege-drop helper) alongside git/curl/ripgrep
- scripts/entrypoint.sh performs a two-phase startup:
* Root phase: write /etc/resolv.conf, compile SCSS/JS, run Alembic
migrations, chown /worktrees and model cache to agentception:agentception
* Unprivileged phase: exec gosu agentception "$@" → uvicorn runs as PID 1
with UID 1001, zero capabilities to acquire additional privs
- git system config adds user.email/user.name and http.proxy for worktree ops
- Verified: /proc/1/status shows Uid=1001 in the running container
2. Narrow bind mounts (docker-compose.yml)
- Replaces ./:/app with explicit per-directory mounts: agentception/,
.agentception/, pyproject.toml, org-presets.yaml, scripts/, tests/, tools/
- .env, docker-compose*.yml, .git/, Dockerfile, and all other sensitive or
unnecessary files are intentionally excluded
- Verified: /app/.env is absent inside the running container
- docker-compose.ci.yml updated with matching narrow mounts + no-op proxy
service for CI compatibility
3. Egress allowlist proxy (scripts/tinyproxy/)
- scripts/tinyproxy/Dockerfile.proxy: builds a tinyproxy …
Merge dev into main — resolve local-llm.md conflict (keep MLX removal)
* fix: ab-metrics SQL CAST and poller DNS backoff (#1056)
AB metrics:
- Replace ':days::integer' with 'CAST(:days AS integer)' in the textual SQL query; asyncpg misparses ':param::type' as a malformed named parameter and raises PostgresSyntaxError on every page load.
- Regression test: test_ab_metrics_sql_uses_cast_not_double_colon.
Poller:
- Add dedicated OSError handler in polling_loop(); covers socket.gaierror (DNS, [Errno -2] Name or service not known), ConnectionRefusedError, and other network-level failures.
- Backs off exponentially (30 s → 60 s → 120 s, cap 300 s) instead of retrying at the normal poll interval and flooding logs.
- Regression tests: test_polling_loop_oserror_triggers_backoff, test_polling_loop_oserror_backoff_doubles_on_repeat.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: move lone-worker warning out of header row into full-width strip (#1058)
The warning text was crammed inside the .od-header flex row alongside all the other buttons, stretching the row and breaking the layout.
- Pull loneWorkerWarning out of od-header__launch-group entirely.
- Add a dedicated .od-warn-strip element rendered just below the header (still inside the OD panel flex column) with amber colouring.
- Launch button stays in the header row unobstructed.
- Strip strips the leading emoji from the JS string to avoid duplication with the icon span.
- New SCSS: .od-warn-strip with icon/text sub-elements.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* chore: rename service port from 10003 to 1337
Updates all 40 files that referenced port 10003 (Docker images, compose files, CI workflow, app entrypoint, scripts, all docs, tests, and derived .agentception role prompts) to use port 1337. Verified zero remaining 10003 references, clean mypy, no template drift, and health check passes on the new port.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
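The poller backoff schedule in #1056 can be sketched as a small generator. This is an illustrative sketch only: the 30 s start and 300 s cap come from the commit message, while the names backoff_delays, BACKOFF_START_SECONDS, and BACKOFF_CAP_SECONDS are hypothetical, and the step after 120 s is assumed to keep doubling until the cap applies.

```python
import itertools
from collections.abc import Iterator

# Values from the commit message; the constant names are hypothetical.
BACKOFF_START_SECONDS = 30.0
BACKOFF_CAP_SECONDS = 300.0


def backoff_delays(
    start: float = BACKOFF_START_SECONDS,
    cap: float = BACKOFF_CAP_SECONDS,
) -> Iterator[float]:
    """Yield sleep intervals that double after each failure, up to cap."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


if __name__ == "__main__":
    # First six delays a poller would sleep on consecutive OSError failures.
    print(list(itertools.islice(backoff_delays(), 6)))
```

A polling loop following this scheme would presumably create a fresh generator after each successful poll, so that one transient DNS failure does not leave later retries at the capped interval.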
* fix: ab-metrics SQL CAST and poller DNS backoff (#1056)
AB metrics:
- Replace ':days::integer' with 'CAST(:days AS integer)' in the textual SQL query; asyncpg misparses ':param::type' as a malformed named parameter and raises PostgresSyntaxError on every page load.
- Regression test: test_ab_metrics_sql_uses_cast_not_double_colon.
Poller:
- Add dedicated OSError handler in polling_loop(); covers socket.gaierror (DNS, [Errno -2] Name or service not known), ConnectionRefusedError, and other network-level failures.
- Backs off exponentially (30 s → 60 s → 120 s, cap 300 s) instead of retrying at the normal poll interval and flooding logs.
- Regression tests: test_polling_loop_oserror_triggers_backoff, test_polling_loop_oserror_backoff_doubles_on_repeat.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: move lone-worker warning out of header row into full-width strip (#1058)
The warning text was crammed inside the .od-header flex row alongside all the other buttons, stretching the row and breaking the layout.
- Pull loneWorkerWarning out of od-header__launch-group entirely.
- Add a dedicated .od-warn-strip element rendered just below the header (still inside the OD panel flex column) with amber colouring.
- Launch button stays in the header row unobstructed.
- Strip strips the leading emoji from the JS string to avoid duplication with the icon span.
- New SCSS: .od-warn-strip with icon/text sub-elements.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* chore: rename service port from 10003 to 1337
Updates all 40 files that referenced port 10003 (Docker images, compose files, CI workflow, app entrypoint, scripts, all docs, tests, and derived .agentception role prompts) to use port 1337. Verified zero remaining 10003 references, clean mypy, no template drift, and health check passes on the new port.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add solo-ticket scope to Org Designer
Adds a "Specific Issue" scope option to the root node editor panel so a lone worker can be pinned to a single GitHub issue number without needing a coordinator above it.
- OrgNode.scope extended to 'full_initiative' | 'phase' | 'issue'
- New scopeIssueNumber field on OrgNode / OrgNodePayload
- isRootSelected computed getter: scope picker only shown for root
- loneWorkerWarning suppressed when scope === 'issue'
- launch() passes scope_issue_number to already-supported backend field
- Edit panel gains a Scope radio (Full Initiative / Specific Issue) with a number input that appears when Specific Issue is selected
- SCSS: od-scope-radio / od-scope-opt / od-editor__input styles
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: show issue # badge on node card when scope === 'issue'
When a node is scoped to a specific issue, the node card now displays a clickable issue number badge (e.g. "# 1048") that links directly to the GitHub issue in a new tab. The badge uses the same emerald tint as the worker node border, is visible both before and after launch, and clicking it does not trigger node selection (stopPropagation).
- nodeCardHtml gains a scopeBadge() helper covering both phase and issue scopes
- renderD3 / nodeCardHtml accept repo string to build the GitHub URL
- _render() passes this.repo through
- New od-node__issue-link SCSS pill style (emerald, hover state)
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: repurpose A/B metrics widget as Developer Run Health
The widget was showing a single 'control' bucket with no treatment group, making the A/B framing meaningless.
Repurposed to a clear health dashboard:
- Drop the 'Variant' column (always 'control', adds no signal)
- Add 'Developer Run Health' heading with lookback window subtitle
- Add Passed / Failed columns alongside pass rate percentage
- Color-code pass rate: green >= 50%, amber > 0%, red = 0%
- Format avg_input_tokens as e.g. '1.52M' instead of raw integer
- Update loading placeholder and HTML comment in build.html
- Update route module docstring
- SCSS: dev-health-panel heading, dev-health__good/warn/bad color classes
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: Developer Run Health mission-control panel
Redesigns the bottom panel from a bare table into a full health dashboard matching the Mission Control visual language.
SQL / data:
- Adds avg_output_tokens, avg_cache_read/write_tokens to aggregates
- Adds avg_duration_secs (completed_at - spawned_at) per run
- Adds retry_count (attempt_number > 0)
- Adds per-grade counts (A/B/C/D/F) via a reviewer-run join on issue_number. The old query read developer done-events which never carry grades, so pass_rate was always 0%; it now correctly surfaces the reviewer verdict
- Computes estimated_cost_per_run in Python using Anthropic token pricing
- Computes retry_rate = retry_count / runs
Visual (drh BEM block, _foundation.scss tokens throughout):
- 7-tile KPI grid: Runs / Pass Rate / Avg Duration / Est. Cost / Avg Iters / Retry Rate / Avg Tokens; each tile uses bg-elevated + border-subtle
- Semantic modifiers (--good/--warn/--bad) apply gradient tints + color
- Grade distribution: A–F progress bars with letter, count, and % columns (A/B green, C amber, D orange, F danger-red)
- Panel header: ⚡ icon, title badge, run count pushed to right
- Replaced legacy --surface-2/--muted with --bg-elevated/--text-muted
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: persist FastEmbed ONNX model cache across container rebuilds
FASTEMBED_CACHE_DIR was documented in a config comment but never implemented as a Settings field, so the env var was a no-op and FastEmbed always defaulted to /tmp/fastembed_cache, which is wiped on every container rebuild.
- Add fastembed_cache_dir setting (default /home/agentception/.cache/fastembed)
- Pass cache_dir to TextEmbedding, SparseTextEmbedding, TextCrossEncoder so all three ONNX models (~900 MB total) land in the named volume
- Add agentception-fastembed-cache named volume in docker-compose.yml mounted at /home/agentception/.cache/fastembed
- entrypoint.sh: mkdir + chown the fastembed cache before dropping privs
* feat: scope cascade filter Label → Phase → Ticket in org designer
Extends the scope picker from 2 options to a 3-level cascade. Each level is a radio option; selecting Phase or Ticket shows a dropdown populated from the initiative's context data. Blocked phases and tickets (those where all issues carry the blocked/deps label) are shown grayed out and disabled so users can't dispatch against work that's stuck.
Backend:
- Add blocked: bool to PhaseSummary and IssueSummary TypedDicts
- get_label_context() now tracks per-phase blocked counts and per-issue blocked flag from the blocked/deps label
- PhaseSummaryItem and IssueSummaryItem Pydantic models expose blocked so the field is serialised to the frontend
Frontend:
- Add IssueItem interface with blocked field; extend PhaseItem
- Store issues list in component state (was discarded by _loadPhases)
- _loadPhases() now stores both phases and issues from context API
- loneWorkerWarning suppressed for scope === 'phase' in addition to 'issue'
- 3-option scope radio: Label / Phase / Ticket
- Phase scope: select dropdown, blocked options disabled + (blocked) suffix
- Ticket scope: select dropdown, blocked options disabled + (blocked) suffix
- New .od-scope-select SCSS with dark-theme styling and blocked option muting
* fix: OrgNodeSpec missing 'issue' scope + readable 422 error messages
OrgNodeSpec.scope was Literal["full_initiative", "phase"]. It did not include "issue", so dispatching with scope=issue sent by the new Ticket scope picker caused Pydantic to reject the request with a 422. The frontend then displayed "[object Object]" because FastAPI's 422 detail is an array of validation error objects, not a string.
- Add "issue" to OrgNodeSpec.scope literal
- Add scope_issue_number: int | None = None to OrgNodeSpec
- Add FastApiValidationError interface to org_designer.ts
- DispatchError.detail typed as string | FastApiValidationError[]
- launch() error handler formats validation arrays as "msg1; msg2" instead of stringifying the raw object
* fix: mount .git into container so dispatch can run git commands
The dispatch pipeline runs git commands with cwd=/app (settings.repo_dir) inside the container. The narrow bind-mount strategy excluded .git, so every git call (_resolve_dev_sha(), ensure_worktree(), list_git_worktrees()) failed with "fatal: not a git repository".
Add .git as a writable bind mount so worktree dispatch works end-to-end.
Also fix 6 tests in test_ensure_helpers.py that patched asyncio.create_task with return_value=asyncio.Future(). The AsyncMock coroutines passed to the mock were never closed, emitting RuntimeWarning "coroutine never awaited" at teardown. Replace with a side_effect helper that closes the incoming coroutine before returning the future.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: resolve test failure and AsyncMock coroutine warnings across test suite
KeyError fix: test_label_context_returns_phases_and_issues was patching get_label_context with mock data missing the 'blocked' field, which the route now reads. Added blocked: False to both phases and issues in the mock and asserted on the field in the response.
Coroutine warning fix: add make_create_task_side_effect() to conftest.py and use it wherever asyncio.create_task is mocked with a plain MagicMock. When AsyncMock-patched functions (run_agent_loop, auto_dispatch_reviewer, teardown_agent_worktree, auto_redispatch_after_rejection) are called and their resulting coroutine is passed to a mocked create_task, that coroutine is never closed, producing RuntimeWarning at GC time. The helper closes the coroutine immediately and returns a resolved Future.
Files updated: conftest.py, test_label_context_and_dispatch.py, test_dispatch_variant.py, test_ensure_helpers.py, test_build_commands.py, test_build_commands_rebase.py.
Verified: 82 passed, 0 warnings with -W error::RuntimeWarning.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* feat: add build_block_run, build_resume_run, build_stop_run MCP tools (#1079)
Implements the three state-transition tools that were already exercised by test_mcp_build_commands_pr3.py but had no corresponding production code:
- build_block_run: implementing → blocked (reversible via build_resume_run)
- build_resume_run: blocked/stopped → implementing (idempotent restart-safe)
- build_stop_run: any active → stopped (non-terminal, resumable)
Also restores persist.py to committed state; the local dirty version had removed the stall guard, cross-repo collision fix, and JsonValue types.
Fixes all 13 failures and 2 RuntimeWarning leaks in test_mcp_build_commands_pr3.py.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: eliminate last two RuntimeWarning and DeprecationWarning sources (#1080)
- test_build_commands_rebase.py: add side_effect=make_create_task_side_effect() to the bare patch("...asyncio.create_task") in test_rebase_succeeds_with_empty_worktree_path_dict; the plain MagicMock left auto_dispatch_reviewer coroutines unclosed, triggering RuntimeWarning during GC in later tests.
- agent_loop.py:2297: replace datetime.datetime.utcnow() with datetime.datetime.now(datetime.UTC) to silence the DeprecationWarning that surfaced in 7 test_agent_loop.py tests.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: close last two bare create_task coroutine leaks in rebase tests (#1081)
test_rebase_conflict_returns_error_and_aborts and test_no_worktree_path_skips_rebase_and_dispatches_reviewer both patched asyncio.create_task as a plain MagicMock; the auto_dispatch_reviewer AsyncMock coroutine passed to create_task was never closed, triggering a RuntimeWarning during GC in later tests (test_build_initiative_tabs). Added side_effect=make_create_task_side_effect() to both patches.
Full suite: 2074 passed, 0 warnings.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: add git safe.directory for /app and /worktrees in entrypoint (#1082)
The .git bind-mount is owned by the macOS host user; the container process runs as agentception (UID 1001) which has a different UID. Git 2.35.2+ treats this as dubious ownership and refuses to run, causing the "git rev-parse origin/dev failed: fatal: detected dubious ownership" error on every agent dispatch.
Add /app and /worktrees to the system git config (written to /etc/gitconfig) during the privileged entrypoint phase so both root (build steps) and the unprivileged agentception user (after gosu drop) can run git against the mounted repository.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: correct LiveOrgNode data access in renderLiveD3; redesign inspector as Cursor-like terminal log (#1084)
JS bug: D3 hierarchy<LiveOrgNode> makes d.data a LiveOrgNode, not a RunTreeNodeRow; accessing d.data.id/.role threw TypeError everywhere in the live org tree callbacks. Fixed by accessing (d.data as LiveOrgNode).data at all five call sites (link key, card key, classed×2, html).
Inspector redesign (Cursor-like terminal log):
- height: calc(100vh-80px) replaces max-height so __content flex children actually scroll instead of compressing
- min-height: 0 through __content → __activity so overflow-y triggers
- #activity-feed: dark terminal bg (#09090f), monospace, styled scrollbar
- Activity rows: 3-column grid (icon | summary | ts), 22px compact lines, color-coded 2px left borders (purple=LLM, green=file, orange=tool, etc.)
- Event cards restyled for dark context; step_start → subtle section divider (border-top + uppercase label) instead of a heavy ► card
- Stop button and stream-status bar get matching dark bg + red-tinted stop
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: add missing columns to A/B metrics test mocks (#1086)
The _AB_QUERY SQL returns retry_count, avg_output_tokens, avg_cache_read_tokens, avg_cache_write_tokens, avg_duration_secs, and grade_a through grade_f, but the test mocks only supplied a subset of columns, causing KeyError: 'retry_count' in CI. Also updates the HTMX assertion from the old <table class="ab-metrics"> to the current <div class="drh"> template structure.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
* fix: replace banned `object` param type in conftest create_task helper (#1087)
The typing audit flags `**_: object` as a param_object violation. Replace with explicit `name: str | None` keyword arg matching asyncio.create_task's actual signature.
Co-authored-by: AgentCeption Bot <agent@agentception.io>
---------
Co-authored-by: AgentCeption Bot <agent@agentception.io>
Summary
Merge origin/main into dev to resolve the merge conflict in _inspector-layout.scss that was blocking PR "release: merge dev into main" #1093 (dev → main)
Test plan