Add direct-API and codex review runners with live logging#19
Merged
Conversation
- Add pkg/reviewer/direct: in-process agent loop that calls the LLM API directly (DeepSeek/OpenAI-compatible) with a narrow review tool set — read_file, read_files, glob, grep, git_diff and ast_* — instead of shelling out to the claude/opencode CLIs - Stream the review output via set_group/add_issues/submit_review so each call stays small (a monolithic payload overflows DeepSeek's output cap and arrives as truncated JSON); tolerate stringified args - Wire DirectRunner (--runner direct) that fills the existing ClaudeResult/ReviewModelInfo; measure tokens and cost per round - Pre-load the diff and changed-file contents into the kickoff and dedup repeated reads to cut round-trips - Rebuild the ast-index automatically before review; ast_* tools degrade to grep when the binary is absent - Stream the session transcript to direct-output.jsonl and add it to the debug bundle - Fix MarkdownContent.vue to render mermaid per block with a fallback on syntax errors instead of the full-page error graphic - Add github.com/sashabaranov/go-openai dependency and vendor it
- Add anthropic-sdk-go provider (M1) with prompt caching, adaptive thinking and effort, replacing the not-implemented stub - Cache the whole message history via a rolling cache breakpoint so the preload and tool results are not re-billed every round (fresh input dropped from 222k to ~40 tokens on an Opus run) - Preserve signed thinking blocks across rounds by replaying the verbatim assistant turn (resp.ToParam) instead of rebuilding it from text and tool calls, avoiding lost reasoning continuity - Fix compaction emitting two consecutive user messages, which the Anthropic API rejects, by folding the marker into the head message - Default direct+anthropic to claude-opus-4-8 and effort xhigh, and thread --effort through to the claude CLI runner - Exclude vendor and generated trees from git_diff and the changed file preload so they do not swamp the review - Wipe a previous run's outputs (R*.md, review.json, session logs) before each review so the runner starts on a clean tree - Add tests for the above and vendor anthropic-sdk-go with its deps
- Tell the model to emit independent set_group/add_issues calls in the same step instead of one per step, splitting only to avoid output truncation (keeps the DeepSeek 8k-cap safety valve) - Collapses the output phase from ~7 rounds to ~3, so the growing conversation is re-billed (cache read) fewer times per run
- Price deepseek-v4-pro and deepseek-v4-flash; legacy deepseek-chat/ reasoner now price as V4 Flash (they alias it, deprecating 2026-07-24) - Fix openai provider sending a bare assistant message (no content, no tool calls) which DeepSeek rejects with HTTP 400; skip it, as the anthropic provider already does. deepseek-v4-pro emits such turns
Both issues surfaced by dogfooding the review runner (Opus and DeepSeek-V4-pro flagged the symlink escape independently). - resolveInRoot now resolves symlinks and re-checks the target stays in root; a symlink inside the tree pointing outside (e.g. evil.txt -> /etc/passwd) previously passed the lexical check and leaked host file contents into the review output - compactMessages now resumes the tail on an assistant turn, so a mid-history user message (e.g. a submit nudge) landing on the cut boundary no longer collides with the kept user head into two consecutive user messages, which the Anthropic API rejects with 400 - Add regression tests for both
- Emit "system" and "user" events at run start so direct-output.jsonl is a full input/output log; it previously recorded only model output and tool I/O, omitting the system contract and the preloaded task
- Switch the anthropic provider from Messages.New to NewStreaming + Accumulate; a heavy high/xhigh round (adaptive thinking + long output) could exceed the non-streaming HTTP read timeout - Usage, stop reason, content and ToParam replay are unchanged — the accumulated message is equivalent to the non-streamed one
- List the names of changed files that overflow the preload budget instead of just a count, so the model can read_files them directly without a discovery round - Add a path argument to git_diff so the model can pull the full diff of a single file, bypassing the 300k whole-diff clip — restoring the diff view for files the preload could not inline - Add validPath/pathspec tests
- Add ExecCodexRunner: shells out to `codex exec --json` in the workspace-write sandbox, aggregates the JSONL stream (thread/turn/ item events) into ClaudeResult and estimates cost from tokens since codex reports none; modeled on a codex-exec agent - Wire --runner codex and the RunnerCodex identifier - Stream significant events to the runner log live: codex tool commands and per-turn usage, plus the direct runner's tool calls and per-round usage, via an optional per-line stdout tee in runExec - Add codex parser/argv/cost tests
- Move ExecClaudeRunner/ExecOpenCodeRunner/ExecCodexRunner/DirectRunner, the shared ClaudeResult types and the ReviewRunner interface out of the overloaded ctl package into pkg/reviewer/runner, mirroring a flat agent layout - ctl (Controller/Config/recovery) and main now import the runner package - Pure move, no logic change
Fix valid review-54 finding A1: an arbitrary OpenAI-compatible endpoint had to put its key in DEEPSEEK_API_KEY because directKeyEnv only split anthropic vs everything else. - Add a provider-agnostic REVIEW_API_KEY override; openai-compat now falls back to OPENAI_API_KEY and deepseek to DEEPSEEK_API_KEY - Match the provider name case-insensitively (aligns with NewProvider, also closing the Anthropic-vs-anthropic key mismatch) - Add cmd/reviewctl tests
- Stream significant opencode events to the runner log as they arrive (tool calls with a short detail, per-step usage at debug), matching the codex and direct runners, via the runExec per-line stdout tee
The research/spec docs are kept on disk for reference but removed from history and ignored so they are not committed.
- dispatchParallel now recovers a panicking tool handler into an error result instead of crashing the review process (flagged by the opencode review); the sibling tools in the turn still complete - Pin gpt-5.1-codex as the codex default model so its token-based cost is estimated instead of reported as 0; override with --model
- go mod tidy: anthropic-sdk-go and go-openai are imported directly, no longer flagged // indirect - The direct loop now sums provider Complete time into DurationAPIMs instead of reporting total wall-clock (which includes local tool time)
- Switch the claude runner to --output-format stream-json --verbose so its NDJSON events can be surfaced live: tool calls are logged as they arrive, matching the codex/opencode/direct runners - ParseClaudeResult now decodes a stream of newline-delimited objects (keeping the last "result"), in addition to the single-object and JSON-array forms; the result line carries the same fields - Validated against a real `claude --output-format stream-json` sample
- List all four runners (claude/opencode/codex/direct) and the direct runner's --api-provider/--effort flags and REVIEW_API_KEY env var
- Reorder test imports to localmodule-before-default per the gci config (config_test.go, ctl_test.go) - Pre-allocate the preloaded slice in the direct preload (prealloc) - Reuse directSubtypeSuccess/directSubtypeError for runner result subtypes instead of "success"/"error" literals (goconst) - Add codexExecCmd, gitNoPager and gitDiffCmd consts to drop duplicated command-argument string literals (goconst)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--runner direct): drives an LLM through the Messages/Chat API directly (no CLI) with a narrow review tool set (read/grep/glob/git_diff/ast_*) plus asubmit_reviewtool. Newpkg/reviewer/directpackage — agent loop with parallel tool dispatch (loop.go), diff + changed-file preload (preload.go), per-round usage/cost accounting, mid-run compaction, and asubmit_reviewthat always produces a filledreview.json(no Step-2 retry needed).LLMProviderinterface, selected by--api-provider: native Anthropic (provider_anthropic.go, streaming viaMessages.NewStreaming+Accumulate, rolling prompt-cache breakpoint, adaptive thinking preserved throughRaw) and OpenAI-compatible (provider_openai.go, coversdeepseekandopenai-compat). Pricing/effort resolved inprovider_factory.go, incl. DeepSeek V4 (deepseek-v4-pro/deepseek-v4-flash).--runner codex): wrapscodex exec --json(runner/codex.go), parses the JSONL event stream, and estimates cost from published token rates since codex reports none;--modeldefaults togpt-5.1-codex.pkg/reviewer/runner(claude/opencode moved, codex/direct new), all behind theReviewRunnerinterface with compile-time assertions. Controller now depends onrunner.ReviewRunner.claudeswitched to--output-format stream-json --verbose, and claude/opencode/codex/direct now surface tool calls live to the reviewctl log while still writing the full transcript to*-output.json(l).--effortknob (low..max): Anthropic + codex (-c model_reasoning_effort) honor it; OpenAI-compatible providers ignore it. Direct-runner API key resolves per provider (REVIEW_API_KEY→ANTHROPIC_API_KEY/DEEPSEEK_API_KEY/OPENAI_API_KEY), seedirectKeyEnvs.fe1e643), bare-assistant-turn 400 from V4 (0c82d0c),recoveraround panicking tools in parallel dispatch (451f049), large-diff preload overflow withgit_diff(path=...)hint (b0ae204), streaming to avoid HTTP read timeouts (ab06be4).MarkdownContent.vue).anthropic-sdk-go+go-openaipromoted to direct deps; local-only DirectAPIRunner planning docs git-ignored.Test plan
make fmt lint test— greenreviewctl review --runner direct --api-provider anthropic --model claude-opus-4-8 --effort xhigh— submits a filledreview.json, livedirect toolevents in the log, transcript indirect-output.jsonlreviewctl review --runner direct --api-provider deepseek --model deepseek-v4-pro— completes without the bare-assistant 400; cost/tokens recordedreviewctl review --runner codex --model gpt-5.1-codex— parses JSONL, estimates cost, posts issuesreviewctl review --runner claude— stream-json parsed (lastresultline), liveclaude toolevents, normal upload/comment flow intactreviewctl review --runner opencode -m openrouter/deepseek/deepseek-v4-flash— liveopencode toolevents, review submitted