Add direct-API and codex review runners with live logging by sergeyfast · Pull Request #19 · vmkteam/reviewer

sergeyfast · 2026-06-13T19:10:44Z

Summary

Add direct-API runner (--runner direct): drives an LLM through the Messages/Chat API directly (no CLI) with a narrow review tool set (read/grep/glob/git_diff/ast_*) plus a submit_review tool. New pkg/reviewer/direct package — agent loop with parallel tool dispatch (loop.go), diff + changed-file preload (preload.go), per-round usage/cost accounting, mid-run compaction, and a submit_review that always produces a filled review.json (no Step-2 retry needed).
Add two providers behind a LLMProvider interface, selected by --api-provider: native Anthropic (provider_anthropic.go, streaming via Messages.NewStreaming + Accumulate, rolling prompt-cache breakpoint, adaptive thinking preserved through Raw) and OpenAI-compatible (provider_openai.go, covers deepseek and openai-compat). Pricing/effort resolved in provider_factory.go, incl. DeepSeek V4 (deepseek-v4-pro/deepseek-v4-flash).
Add codex runner (--runner codex): wraps codex exec --json (runner/codex.go), parses the JSONL event stream, and estimates cost from published token rates since codex reports none; --model defaults to gpt-5.1-codex.
Extract all runners into pkg/reviewer/runner (claude/opencode moved, codex/direct new), all behind the ReviewRunner interface with compile-time assertions. Controller now depends on runner.ReviewRunner.
Live event logging for every runner: claude switched to --output-format stream-json --verbose, and claude/opencode/codex/direct now surface tool calls live to the reviewctl log while still writing the full transcript to *-output.json(l).
Cross-runner --effort knob (low..max): Anthropic + codex (-c model_reasoning_effort) honor it; OpenAI-compatible providers ignore it. Direct-runner API key resolves per provider (REVIEW_API_KEY → ANTHROPIC_API_KEY/DEEPSEEK_API_KEY/OPENAI_API_KEY), see directKeyEnvs.
Reliability fixes surfaced by self-review: symlink sandbox escape and compaction role collision (fe1e643), bare-assistant-turn 400 from V4 (0c82d0c), recover around panicking tools in parallel dispatch (451f049), large-diff preload overflow with git_diff(path=...) hint (b0ae204), streaming to avoid HTTP read timeouts (ab06be4).
Frontend: render mermaid per-diagram and fall back to a small notice + raw source on invalid (often LLM-generated) syntax, instead of mermaid's full-page error graphic (MarkdownContent.vue).
Housekeeping: README documents the four runners; anthropic-sdk-go + go-openai promoted to direct deps; local-only DirectAPIRunner planning docs git-ignored.

Test plan

make fmt lint test — green
reviewctl review --runner direct --api-provider anthropic --model claude-opus-4-8 --effort xhigh — submits a filled review.json, live direct tool events in the log, transcript in direct-output.jsonl
reviewctl review --runner direct --api-provider deepseek --model deepseek-v4-pro — completes without the bare-assistant 400; cost/tokens recorded
reviewctl review --runner codex --model gpt-5.1-codex — parses JSONL, estimates cost, posts issues
reviewctl review --runner claude — stream-json parsed (last result line), live claude tool events, normal upload/comment flow intact
reviewctl review --runner opencode -m openrouter/deepseek/deepseek-v4-flash — live opencode tool events, review submitted
UI: a review with invalid mermaid shows the inline ⚠ fallback, not the full-page error

- Add pkg/reviewer/direct: in-process agent loop that calls the LLM API directly (DeepSeek/OpenAI-compatible) with a narrow review tool set — read_file, read_files, glob, grep, git_diff and ast_* — instead of shelling out to the claude/opencode CLIs - Stream the review output via set_group/add_issues/submit_review so each call stays small (a monolithic payload overflows DeepSeek's output cap and arrives as truncated JSON); tolerate stringified args - Wire DirectRunner (--runner direct) that fills the existing ClaudeResult/ReviewModelInfo; measure tokens and cost per round - Pre-load the diff and changed-file contents into the kickoff and dedup repeated reads to cut round-trips - Rebuild the ast-index automatically before review; ast_* tools degrade to grep when the binary is absent - Stream the session transcript to direct-output.jsonl and add it to the debug bundle - Fix MarkdownContent.vue to render mermaid per block with a fallback on syntax errors instead of the full-page error graphic - Add github.com/sashabaranov/go-openai dependency and vendor it

- Add anthropic-sdk-go provider (M1) with prompt caching, adaptive thinking and effort, replacing the not-implemented stub - Cache the whole message history via a rolling cache breakpoint so the preload and tool results are not re-billed every round (fresh input dropped from 222k to ~40 tokens on an Opus run) - Preserve signed thinking blocks across rounds by replaying the verbatim assistant turn (resp.ToParam) instead of rebuilding it from text and tool calls, avoiding lost reasoning continuity - Fix compaction emitting two consecutive user messages, which the Anthropic API rejects, by folding the marker into the head message - Default direct+anthropic to claude-opus-4-8 and effort xhigh, and thread --effort through to the claude CLI runner - Exclude vendor and generated trees from git_diff and the changed file preload so they do not swamp the review - Wipe a previous run's outputs (R*.md, review.json, session logs) before each review so the runner starts on a clean tree - Add tests for the above and vendor anthropic-sdk-go with its deps

- Tell the model to emit independent set_group/add_issues calls in the same step instead of one per step, splitting only to avoid output truncation (keeps the DeepSeek 8k-cap safety valve) - Collapses the output phase from ~7 rounds to ~3, so the growing conversation is re-billed (cache read) fewer times per run

- Price deepseek-v4-pro and deepseek-v4-flash; legacy deepseek-chat/ reasoner now price as V4 Flash (they alias it, deprecating 2026-07-24) - Fix openai provider sending a bare assistant message (no content, no tool calls) which DeepSeek rejects with HTTP 400; skip it, as the anthropic provider already does. deepseek-v4-pro emits such turns

Both issues surfaced by dogfooding the review runner (Opus and DeepSeek-V4-pro flagged the symlink escape independently). - resolveInRoot now resolves symlinks and re-checks the target stays in root; a symlink inside the tree pointing outside (e.g. evil.txt -> /etc/passwd) previously passed the lexical check and leaked host file contents into the review output - compactMessages now resumes the tail on an assistant turn, so a mid-history user message (e.g. a submit nudge) landing on the cut boundary no longer collides with the kept user head into two consecutive user messages, which the Anthropic API rejects with 400 - Add regression tests for both

- Emit "system" and "user" events at run start so direct-output.jsonl is a full input/output log; it previously recorded only model output and tool I/O, omitting the system contract and the preloaded task

- Switch the anthropic provider from Messages.New to NewStreaming + Accumulate; a heavy high/xhigh round (adaptive thinking + long output) could exceed the non-streaming HTTP read timeout - Usage, stop reason, content and ToParam replay are unchanged — the accumulated message is equivalent to the non-streamed one

- List the names of changed files that overflow the preload budget instead of just a count, so the model can read_files them directly without a discovery round - Add a path argument to git_diff so the model can pull the full diff of a single file, bypassing the 300k whole-diff clip — restoring the diff view for files the preload could not inline - Add validPath/pathspec tests

- Add ExecCodexRunner: shells out to `codex exec --json` in the workspace-write sandbox, aggregates the JSONL stream (thread/turn/ item events) into ClaudeResult and estimates cost from tokens since codex reports none; modeled on a codex-exec agent - Wire --runner codex and the RunnerCodex identifier - Stream significant events to the runner log live: codex tool commands and per-turn usage, plus the direct runner's tool calls and per-round usage, via an optional per-line stdout tee in runExec - Add codex parser/argv/cost tests

- Move ExecClaudeRunner/ExecOpenCodeRunner/ExecCodexRunner/DirectRunner, the shared ClaudeResult types and the ReviewRunner interface out of the overloaded ctl package into pkg/reviewer/runner, mirroring a flat agent layout - ctl (Controller/Config/recovery) and main now import the runner package - Pure move, no logic change

Fix valid review-54 finding A1: an arbitrary OpenAI-compatible endpoint had to put its key in DEEPSEEK_API_KEY because directKeyEnv only split anthropic vs everything else. - Add a provider-agnostic REVIEW_API_KEY override; openai-compat now falls back to OPENAI_API_KEY and deepseek to DEEPSEEK_API_KEY - Match the provider name case-insensitively (aligns with NewProvider, also closing the Anthropic-vs-anthropic key mismatch) - Add cmd/reviewctl tests

- Stream significant opencode events to the runner log as they arrive (tool calls with a short detail, per-step usage at debug), matching the codex and direct runners, via the runExec per-line stdout tee

The research/spec docs are kept on disk for reference but removed from history and ignored so they are not committed.

- dispatchParallel now recovers a panicking tool handler into an error result instead of crashing the review process (flagged by the opencode review); the sibling tools in the turn still complete - Pin gpt-5.1-codex as the codex default model so its token-based cost is estimated instead of reported as 0; override with --model

- go mod tidy: anthropic-sdk-go and go-openai are imported directly, no longer flagged // indirect - The direct loop now sums provider Complete time into DurationAPIMs instead of reporting total wall-clock (which includes local tool time)

- Switch the claude runner to --output-format stream-json --verbose so its NDJSON events can be surfaced live: tool calls are logged as they arrive, matching the codex/opencode/direct runners - ParseClaudeResult now decodes a stream of newline-delimited objects (keeping the last "result"), in addition to the single-object and JSON-array forms; the result line carries the same fields - Validated against a real `claude --output-format stream-json` sample

- List all four runners (claude/opencode/codex/direct) and the direct runner's --api-provider/--effort flags and REVIEW_API_KEY env var

- Reorder test imports to localmodule-before-default per the gci config (config_test.go, ctl_test.go) - Pre-allocate the preloaded slice in the direct preload (prealloc) - Reuse directSubtypeSuccess/directSubtypeError for runner result subtypes instead of "success"/"error" literals (goconst) - Add codexExecCmd, gitNoPager and gitDiffCmd consts to drop duplicated command-argument string literals (goconst)

sergeyfast added 18 commits June 12, 2026 23:01

Log the kickoff input in the direct session transcript

15690c1

- Emit "system" and "user" events at run start so direct-output.jsonl is a full input/output log; it previously recorded only model output and tool I/O, omitting the system contract and the preloaded task

Log opencode tool calls live

a72ba1f

- Stream significant opencode events to the runner log as they arrive (tool calls with a short detail, per-step usage at debug), matching the codex and direct runners, via the runExec per-line stdout tee

Ignore local-only DirectAPIRunner planning docs

50f10df

The research/spec docs are kept on disk for reference but removed from history and ignored so they are not committed.

Mark direct deps direct and track real API time

fc35284

- go mod tidy: anthropic-sdk-go and go-openai are imported directly, no longer flagged // indirect - The direct loop now sums provider Complete time into DurationAPIMs instead of reporting total wall-clock (which includes local tool time)

Document codex and direct runners in the README

68394fc

- List all four runners (claude/opencode/codex/direct) and the direct runner's --api-provider/--effort flags and REVIEW_API_KEY env var

sergeyfast merged commit 9f91653 into master Jun 13, 2026
3 checks passed

sergeyfast deleted the feat/direct-api-runner branch June 13, 2026 19:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add direct-API and codex review runners with live logging#19

Add direct-API and codex review runners with live logging#19
sergeyfast merged 18 commits into
masterfrom
feat/direct-api-runner

sergeyfast commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sergeyfast commented Jun 13, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant