Conversation
chore: sync dev with main merge commit (fast-forward)
.build-issue__branch already had white-space: nowrap + overflow: hidden + text-overflow: ellipsis + max-width: 100%, but the truncation never fired. Root cause: .build-issue__pr-row had align-items: flex-start inherited from its flex-column parent (.build-issue__footer), which shrinks the row to its content width. max-width: 100% then resolves against that already-oversized parent and does nothing. Fix: add width: 100% to .build-issue__pr-row so the percentage resolves against the card width, not the branch name's intrinsic length.
fix: branch name truncates to card width (not hardcoded length)
… local model name
Three cosmetic/correctness fixes to the llm_iter row in the activity feed:
1. Remove color-coded left borders from all non-error activity rows.
The border-left-color rules by subtype added visual noise without meaning.
Only error rows (border-left: #ef4444) and shell exit-nonzero rows keep a
coloured border because those carry real signal.
2. Drop the 'Iteration N' second line from llm_iter rows.
The step count already establishes position; the iteration number inside a
step is redundant. buildIterSummary now renders a single .af__iter-model
span. Matching SCSS removes the flex-direction: column layout and the
now-unused .af__iter-num selector.
3. Fix 'Local: local' — show the actual Ollama model name.
Two root causes:
a) Backend (llm.py): local_llm_tool_call emitted llm_iter with 'model: model'
(the default parameter 'local') instead of 'model: agent_model or model'
(the resolved settings value, e.g. 'qwen2.5:7b').
b) Frontend (format_utils.ts): parseModelInfo had no handling for Qwen,
Llama, Mistral, Phi, DeepSeek, or Gemma model strings from Ollama.
Now: 'qwen2.5:7b' → { network: 'Local', modelShort: 'Qwen 2.5' } →
display 'Local: Qwen 2.5'. 'local' (fallback) → modelShort '' → display
just 'Local' (no redundant ': local' suffix).
fix: activity feed — remove color borders, drop iteration count, fix local model name
…in Docker Tool rows (tool_invoked, github_tool) now expand on click to show the full parsed arguments inline below the row: - 4-column grid adds a chevron (af__chevron) that rotates 90° when open - Click (or Enter/Space) toggles aria-expanded and the hidden detail panel - af__tool-detail shows each arg as key · value using af__detail-key / af__detail-val; falls back to raw preview if parsing fails - Keyboard-accessible: role=button, tabindex=0 - Light and dark mode styled - 5 new unit tests covering expand, collapse, key-value rendering, and non-expandable row types Also fixes a long-standing Docker gap: vitest.config.ts was never copied into the image (only package.json, tsconfig.json etc. were). Without the config Vitest used default include patterns (**/*.spec.ts) which picked up the Playwright E2E spec and poisoned the jsdom environment for all other tests. Fix: add vitest.config.ts to the Dockerfile COPY step. The config also gains an explicit exclude for agentception/tests/** to prevent future E2E spec pickup regardless of Vitest version changes.
feat: expandable tool call rows in activity feed
- llm_iter rows are suppressed entirely; instead a single .af__model-header element is inserted once at the top of #activity-feed showing the run model (e.g. "Local: Qwen 3.5") so it is not repeated on every step - llm_usage rows are suppressed; their token count is injected directly into the .event-card__tokens placeholder on the current step header, appearing right-aligned on the same line as "STEP N" - Removed per-subtype indentation from tool/file/shell/git_push rows — they are now flush with the rest of the feed since llm_iter is no longer a visual parent - step_context: track _currentStepHeader; export getCurrentStepHeader() - event_card: add empty .event-card__tokens span to step_start cards - SCSS: .af__model-header styles, .event-card__tokens styles, pruned dead llm_iter/llm_usage overrides, updated .af__tool-detail padding - 250 tests passing, zero type errors
feat: model info in feed header; token count on step header; remove tool indent
fix: colon separator in tool summary rows
fix: sans-serif colon separator; 0:00 timestamp instead of 'now'
…e entry - Add humanizeDetailKey() to format_utils with a 40-key label map (n_results→results, cmd_preview→command, old_string→find, etc.) - buildToolDetail uses humanizeDetailKey for all rendered keys - start_line + end_line are collapsed into a single "lines: N–M" entry instead of two separate rows - 260 tests passing, zero type errors
feat: humanize detail panel key names; collapse line range
Backend: - FileReadPayload gains content_preview: NotRequired[str] - read_file_lines emits first 10 lines / 400 chars as content_preview Frontend: - file_read rows are now expandable (click chevron to reveal preview) - buildFileReadDetail renders content_preview in a scrollable <pre> block - Falls back to a "no preview" note for historical events without the field - .af__content-preview SCSS: compact code block, max-height 10rem, scrollable - 263 tests passing, zero mypy errors, zero tsc errors
feat: file content preview in file_read activity events
Backend: - DirListedPayload TypedDict: path, entry_count, entries (newline-delimited) - dir_listed added to ACTIVITY_SUBTYPES - agent_loop: emit dir_listed after successful list_directory with proper isinstance narrowing for JsonValue typing Frontend: - folder SVG icon added to icons.ts - dir_listed: expandable row (folder icon, "N entries" summary) - buildDirListedDetail renders entries as a pre block via af__content-preview (same SCSS reused from file_read content preview) - 267 tests passing, zero mypy errors, zero tsc errors
feat: dir_listed activity event — show list_directory results in inspector
- Increase arg_preview 120→500 chars and text_preview 200→1500 chars so full LLM reply text and complete tool args fit without truncation. - Make llm_reply rows expandable: clicking reveals the full text_preview in a scrollable prose block (af__content-preview--reply). - Suppress the internal `collection` key (Qdrant collection name) from all tool detail panels via HIDDEN_DETAIL_KEYS. - Render old_string/new_string pairs as a colour-coded find/replace diff block (af__diff-block--old / --new) instead of raw key-value lines. - SCSS: add diff block styles with red/green tints for find/replace, and a wrapping prose variant for LLM reply previews. - Tests: 8 new assertions covering llm_reply expand, diff rendering, and collection suppression — 271 tests passing.
feat(feed): llm reply expandable, search humanisation, code diff display
Adds scripts/retrofit_activity_events.py — idempotent back-fill for two event types added after existing runs were recorded: - dir_listed: reconstructs 29 events across 3 runs from tool result rows in agent_messages (entries list stored in raw JSON). - llm_reply text_preview: extends 2 events whose previews were truncated at 200 chars to the new 1 500-char limit using the matching assistant message in agent_messages. Safe to re-run; no-ops when there is nothing left to do. Usage: docker compose exec agentception python3 /app/scripts/retrofit_activity_events.py [--dry-run]
feat(scripts): retrofit historical activity events
dir_listed is a result, not a peer event. Instead of rendering a separate folder-icon row, inject directory entries directly into the detail panel of the preceding list_directory tool_invoked row: - Remove dir_listed from EXPANDABLE_SUBTYPES. - Add injectDirListedIntoPanel(): finds the nearest [data-list-dir-target] panel in the current step container and appends an 'entries' key row + pre block. - Tag list_directory tool_invoked detail panels with data-list-dir-target so the injector can find them. - Route dir_listed events to the injector then return (no standalone row). - Tests: replaced 4 dir_listed-as-row tests with 3 nesting-behaviour tests. 270 tests passing.
feat(feed): nest dir_listed entries inside list_directory row
Same pattern as dir_listed / list_directory: Backend: - Add SearchResultsPayload TypedDict and 'search_results' to ACTIVITY_SUBTYPES. - After search_codebase: emit search_results with deduplicated relative file paths from match objects. - After search_text: emit search_results by parsing rg --heading output to extract file path lines. Frontend: - Tag search_codebase / search_text tool_invoked detail panels with data-search-target. - Route search_results events to injectSearchResultsIntoPanel(), which finds the nearest data-search-target panel and appends a 'files' label + pre block. - No standalone row created — results live inside the search row chevron. - Tests: 4 new assertions; 274 tests passing.
feat(feed): show search result files nested inside search rows
Adds _backfill_search_results() to retrofit_activity_events.py: - Finds runs with search_codebase/search_text tool_invoked events but no search_results activity events. - Reconstructs file lists from agent_messages tool results: structured matches array for search_codebase, rg --heading output parsing for search_text. - Matches invocations to results sequentially (same strategy as dir_listed). Ran live: 61 search_results events inserted across 3 historical runs. Script remains fully idempotent.
feat(scripts): retrofit search_results for historical runs
Three fixes: 1. Injection scope: search the full feed element (not just the current step container) when injecting dir_listed and search_results. This makes injection resilient to step-boundary timing edge cases where the step context changes between the tool_invoked and result events. 2. Label: n_results → 'limit' in ARG_KEY_LABELS. The param is the request cap, not the count of matches found. Showing 'results: 10' implied 10 results were found rather than requested. 3. Retrofit timestamps: search_results events now anchor to invocation.recorded_at + 1 s (not the tool result timestamp) to guarantee they sort strictly after their tool_invoked event. The script purges and re-inserts to fix the previously stored events. 274 tests passing.
Named volumes shadow image layer content. The agentception-fastembed-cache volume was created when models were being downloaded to the wrong path (before the HOME fix). Docker never re-initialises a non-empty volume from image content, so the stale empty volume permanently blocked the correctly-baked models from being visible at runtime. Fix: remove the named volume entirely. The fastembed ONNX models are baked into the image layer at build time (Dockerfile RUN step before COPY agentception/). Using the image layer directly is simpler and more reliable — no stale-volume problem class possible. Also remove the fastembed chown block from entrypoint.sh (no longer needed; the image layer has correct ownership from the build step). Verified: both jinaai/jina-embeddings-v2-base-code and BAAI/bge-reranker-base load from the image layer in ~8s and ~20s respectively with no download and no ONNX errors.
fix: remove fastembed_cache named volume — use image layer directly
The recon warning was opaque — 'could not parse plan from LLM response' with no indication of what the model actually returned. Now logs the first 500 chars of the raw response so the root cause (empty string, plain text, malformed JSON, etc.) is immediately visible.
…ey for Ollama 0.18
Two fixes:
1. agent_loop.py: when recon JSON parse fails, log the first 500 chars
of the raw LLM response so the root cause is immediately visible in
logs rather than requiring a separate debug session.
2. llm.py: _normalize_openai_message_content checked message['reasoning']
(Ollama ≤0.17) but Ollama 0.18+ uses message['reasoning_content'].
When the key was wrong, the token-budget-exhausted warning was silently
swallowed, completion() returned '', _parse_recon_json('') returned None,
and the only visible symptom was the opaque 'could not parse plan'
warning. Now checks both keys.
fix: log raw recon response on parse failure; fix reasoning_content key for Ollama 0.18
) Reviewer agents intentionally return {"files":[],"searches":[],...} when they don't need to pre-load source files before starting their loop. The previous guard `if not files and not searches: return None` treated this as a parse failure, emitting a spurious warning and aborting the recon phase entirely. Remove the guard. An empty-lists plan is a valid signal: the recon phase becomes a no-op and the agent proceeds immediately with existing context (e.g. the PR diff already in the prompt). Regression test added: test_parse_recon_json_empty_lists_valid confirms both empty-with-plan and empty-without-plan are accepted. Co-authored-by: AgentCeption Bot <agent@agentception.io>
#1164) Silent JSONDecodeError swallowing made it impossible to diagnose why _parse_recon_json was returning None. Now logs: - the decode error position and message - up to 300 chars of the extracted JSON substring that failed - the type if the parsed value is unexpectedly not a dict Also expands the outer warning's raw dump from 500 → 2000 chars so the full LLM response is visible in logs when the parse fails, rather than being truncated mid-plan-string. Co-authored-by: AgentCeption Bot <agent@agentception.io>
#1165) The model was treating the recon turn as a regular agent tool loop, emitting read_file(...) bash-fence syntax instead of the required JSON plan. The original "Output ONLY the JSON object, nothing else" rule was not strong enough to override the tool-use patterns in the main system prompt. New addendum explicitly states: - "THIS IS NOT A TOOL-USE TURN" in all-caps - Lists all forbidden output forms (tool calls, prose, non-json fences) - Lists the single required form (the json fence with the plan schema) - Explains why tool calls cannot work (no dispatcher in this turn) Co-authored-by: AgentCeption Bot <agent@agentception.io>
The FORBIDDEN list and all-caps warnings were noise — the format example is the constraint. Replace 28 lines with 8: - one "not in a tool loop" line (replaces the FORBIDDEN list) - the json example (unchanged — this is what models actually follow) - the two constraint lines (≤8 files, ≤5 searches, omit pre-loaded) Co-authored-by: AgentCeption Bot <agent@agentception.io>
#1167) With think=True each tool-call iteration was burning 30k+ tokens on internal reasoning before writing a single tool call, exhausting the entire 32k ceiling and producing: - empty content (stop_reason=tool_calls, 0 actual calls) - "token budget exhausted during chain-of-thought" warning The agent loop is already an iterative reasoning mechanism — it builds understanding turn-by-turn through tool calls. Per-iteration CoT adds latency and budget pressure without improving tool-selection quality. think=False lets the model act immediately on each turn. If a specific high-stakes call needs CoT in future, it can opt in at the call site. Co-authored-by: AgentCeption Bot <agent@agentception.io>
…1168) Root cause: fastembed defaults its cache to tempfile.gettempdir() = /tmp/fastembed_cache, which Docker wipes on every container restart. The Dockerfile pre-download used HOME=/home/agentception (irrelevant — fastembed never reads HOME) so models landed in /tmp during the build and were either lost or never reachable by the app which explicitly passes cache_dir=/home/agentception/.cache/fastembed. Fix: - Named Docker volume agentception-fastembed-cache mounted at /home/agentception/.cache/fastembed — the exact path code_indexer already passes as cache_dir. Volume survives every restart and rebuild. - FASTEMBED_CACHE_PATH env var set to match, so the entrypoint's startup check and any future manual downloads use the same path. - entrypoint.sh verifies both ONNX files at startup; re-downloads only if missing (self-healing for fresh installs or volume corruption). - Dockerfile pre-download step removed: it was always writing to /tmp, never to the path the app reads from, and was the source of repeated 600 MB re-downloads. Builds are now fast and deterministic. - Fix pre-existing test breakage: _role_base_fallback and _load_role_prompt tests updated to import from role_loader (where they moved in PR #1148). Co-authored-by: AgentCeption Bot <agent@agentception.io>
A 404 response from Qdrant means the collection for this worktree does not exist yet — it gets created when the agent indexes the codebase, but the recon phase searches before any indexing has happened. This is expected for every new agent run and should not appear as a WARNING. Demote to INFO with a clear message so operators know it is normal. Real failures (non-404 exceptions) still log at WARNING. Co-authored-by: AgentCeption Bot <agent@agentception.io>
think=true/false is silently ignored by Ollama's OpenAI-compatible /v1/chat/completions endpoint — all three of think=None, think=True, think=False produce identical output (145 completion tokens, 3.7s). reasoning_effort="none" is the correct parameter and actually works: 2 tokens, 0.2s for a simple response; 31 tokens, 1.8s for a tool call. At agent scale (10 rounds, 29k context) default thinking burned 4850 completion tokens and took 175 s per turn; with reasoning_effort="none" the same turn costs 173 tokens in 10.6s — 28x fewer tokens, 16x faster. Changes: - call_local_llm_with_tools: "think": False → "reasoning_effort": "none" - _local_completion_payload: think: bool param → reasoning_effort: str param (default "none" for recon/summaries; streaming planner uses "medium") - _normalize_openai_message_content: updated warning message to reflect that exhaustion is now unexpected and points at the correct fix
fix: replace dead think=false with reasoning_effort=none for Ollama
Adds a comprehensive 'Thinking mode and reasoning_effort' section to docs/guides/local-llm.md covering: - Why think=true/false is silently ignored on /v1/chat/completions - reasoning_effort="none" as the correct parameter (Ollama 0.18+) - Measured token/time data: 4850 tokens/175s → 173 tokens/10.6s at 29k ctx - Parameter reference table for all reasoning_effort values - Which code paths use which setting and why - Verification curl command - CLI usage for interactive sessions - Ollama version requirement and upgrade instructions Also updates outdated config defaults throughout the doc: LOCAL_LLM_COMPLETION_TOKEN_CEILING 8192→32768, LOCAL_LLM_MAX_CONTEXT_CHARS 24000→60000, LOCAL_LLM_MAX_SYSTEM_CHARS 12000→20000.
…docs docs: document reasoning_effort=none for Ollama thinking mode control
…vars - LOCAL_LLM_COMPLETION_TOKEN_CEILING: 16384 → 32768 (matches config.py/docker-compose) - Update reasoning token comment: reasoning_effort=none eliminates CoT tokens entirely - Fix example model: qwen3:30b-a3b → qwen3.5:35b-a3b-q4_K_M (current recommended) - Add USE_LOCAL_LLM documentation (was in docker-compose but missing from example) - Add HOST_REPO_DIR documentation - Add multi-model override vars (LOCAL_LLM_BASE_URL_PLAN, MODEL_PLAN, BASE_URL_AGENT, MODEL_AGENT) - Rewrite section headers and comments for clarity throughout - Group WORKTREE_INDEX_ENABLED and AC_MIN_TURN_DELAY_SECS with better descriptions Also fix .env (local, gitignored): - LOCAL_LLM_MAX_CONTEXT_CHARS: 24000 → 60000 (matches docker-compose default) - Remove stale mlx-openai-server comment
fix: sync .env.example with current config defaults and document all vars
The reviewer's initial message already contains the full briefing header (OWNER, REPO, BRANCH, PR_NUMBER, ISSUE_NUMBER) plus the pre-loaded review context (git diff, mypy, pytest, issue). Instructing the agent to call task/briefing returns the exact same content and wastes one full turn. Replace 'All task context is in your task/briefing MCP prompt. Read it once' with an explicit instruction NOT to call any MCP prompt tool, and point the agent to the briefing header fields already visible in the initial message.
…-call fix: remove task/briefing MCP call from reviewer role prompt
…wer prohibition
The _RUNTIME_ENV_NOTE was telling every agent — including the reviewer — to
read ac://runs/{run_id}/context for full task context. This directive belongs
at the briefing layer (where the developer gets it when the issue body is not
directly injected), not in the runtime environment note which is about
Docker/git/tools mechanics.
Removing it from _RUNTIME_ENV_NOTE is the correct architectural fix: the
developer still gets directed to the resource via the briefing itself when
needed. The reviewer never needs it — the warmup pre-loads diff, mypy, pytest,
and the issue into messages[0] before the loop starts.
Also removes the now-unnecessary "do not call task/briefing" prohibition from
the reviewer role template. The prohibition was fighting the old instruction
that said to call it; with both the instruction and the runtime directive gone,
there is no ghost left to fight. Less prompt = less confusion.
fix: remove ac://runs context directive from runtime note; trim reviewer prohibition
The org chart's Launch button always called POST /api/dispatch/label.
That endpoint created a fresh worktree from dev and set task.branch to
the throw-away label branch — not the implementer's PR branch — so the
reviewer warmup computed a zero-diff against dev and the agent started
with no context.
Comprehensive fix across three layers:
1. agentception/readers/github.py — add find_pr_for_issue(issue_number,
repo) that locates the open PR by branch-name pattern (agent/issue-N)
first, then by closing keywords (closes/fixes/resolves #N) in the PR
body. Returns None on API failure rather than raising.
2. agentception/routes/api/dispatch.py — detect role=="reviewer" +
scope=="issue" early in dispatch_label_agent and delegate to the new
_dispatch_label_reviewer helper. The helper:
- looks up the PR via find_pr_for_issue (422 if none found)
- fetches the remote PR branch (422 if branch deleted)
- creates the worktree anchored to origin/<pr_branch> (409 if exists)
- sets run_id=review-{pr_number}, branch=<pr_branch> in the DB so
agent_loop's _run_reviewer_warmup checks out the right code
- passes pr_number to persist_agent_run_dispatch
All other roles/scopes continue through the existing label path.
3. agentception/services/agent_loop.py — fix the reviewer warmup fallback
from the deprecated feat/issue-{N} to agent/issue-{N}, and emit a
structured WARNING when task.branch is missing so mis-configured
dispatches are immediately visible in logs.
12 new tests cover every outcome: PR found by branch, PR found by body
keyword, all closing-keyword variants, branch-over-body priority, no PR
found, branch deleted, worktree exists, developer routes unchanged, and
the warmup fallback warning.
fix: route reviewer org-chart dispatch to PR branch, not dev
The UI slug 'pr-reviewer' never matched any backend check (role == 'reviewer'), so reviewer agents were dispatched as plain label workers with no warmup, no reviewer tool surface, and no PR branch anchoring. Renames every occurrence — the role is reviewer, not pr-reviewer: - org_designer.ts: slug, qa-coordinator rules, WORKER_RULES - agent.html: org-node emoji/label conditionals - transcripts.html, transcript_detail.html: badge colour maps Rebuilds app.js bundle.
…iewer fix: rename pr-reviewer → reviewer everywhere
Co-authored-by: AgentCeption Bot <agent@agentception.io>
Co-authored-by: AgentCeption Bot <agent@agentception.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Merges all recent fixes and features from dev into main: