Skip to content

Agent-Review Core: config + index + learning layers#1858

Open
dr-bizz wants to merge 40 commits into
mainfrom
review-config-layer
Open

Agent-Review Core: config + index + learning layers#1858
dr-bizz wants to merge 40 commits into
mainfrom
review-config-layer

Conversation

@dr-bizz

@dr-bizz dr-bizz commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Agent-Review Core: config + index + learning layers

Upgrades the agent-review reviewer toward Greptile/CodeRabbit-class capabilities by extracting a clean, file-based, UI-ready review core under .claude/review/. Claude Code remains the orchestrator (no separate API billing); the debate/consensus stages are unchanged.

What's included (3 phases, TDD, 51 passing tests)

Phase A — Declarative config layer

  • config.yml + config.schema.json (ajv-validated) + rules/*.md
  • Risk scoring, the 7 agents + glob/content triggers, per-path rules, profile dial, exclusions — all migrated from the prose code-review.md into declarative config
  • Engine modules: loadConfig, scoreRisk, selectAgents, resolveRules, detectSpecial, plan.cjs

Phase B — Persistent codebase index (impact analysis)

  • File-level import graph for cross-file impact analysis ("this change affects N callers")
  • resolveImport, buildGraph, queryImpact, indexStore (gitignored, HEAD-keyed cache), impact.cjs
  • Replaces the command's grep-based Stage 1B; feeds dependents to the Architecture + Data Integrity agents

Phase C — Feedback-learning loop (approval-gated)

  • Capture per-finding accepted/dismissed outcomes → mine recurring patterns → human approves → suppress noise + inject rules
  • findingSignature, mineLearnings, applyLearnings, learningsStore + review:feedback/review:learn scripts
  • Shared knowledge committed (feedback.jsonl, learnings.yml); transients gitignored

Implementation notes

  • CommonJS engine (Yarn 4 PnP stable path); tested via a single-process node:test runner (yarn test:review) — 51 tests pass
  • New devDeps: yaml, minimatch, ajv (tooling only)
  • Design specs + plans under docs/superpowers/
  • lint:ts failures in CI are pre-existing app codegen gaps (the .claude/review engine is outside tsconfig scope)

🤖 Built with Claude Code (brainstorm → spec → plan → subagent-driven implementation).

dr-bizz added 29 commits June 23, 2026 12:08
@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Bundle sizes [mpdx-react]

Compared against 63a8a1c

Route Size (gzipped) Diff
/accountLists/[accountListId]/hrTools/partnerReminders 322.84 KB -1.07 KB
/accountLists/[accountListId]/hrTools/staffSavingFund/transfers 327.7 KB -1.73 KB
/accountLists/[accountListId]/reports/mpgaIncomeExpenses 392.08 KB -1.38 KB
/accountLists/[accountListId]/reports/staffExpense 358.81 KB -1.46 KB
/accountLists/[accountListId]/hrTools/mpdSupervisorReport no change removed
Dynamic import Size (gzipped) Diff
src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/MPGA/DynamicMPGA.tsx -> ./MPGA no change removed
src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/MonthlySummary/DynamicMonthlySummary.tsx -> ./MonthlySummary no change removed
src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/Payroll/DynamicPayroll.tsx -> ./Payroll no change removed
src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/Quarterly/DynamicQuarterly.tsx -> ./Quarterly no change removed
src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/StaffExpenseReport/DynamicStaffExpenseReport.tsx -> ./StaffExpenseReport no change removed

@dr-bizz

dr-bizz commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

🤖 AI multi-agent dogfood review — the reviewer reviewing its own PR. Engine pre-flight selected the panel; 4 substantive specialists (security, architecture, testing, standards) ran on the engine/config/command code. 22 findings (severities modest, mostly 4–6). Advisory — and a deliberate self-test.


PR #1858 Dogfood Review: Agent-Review Engine (Reviewer Reviewing Itself)

Executive summary

The agent-review engine is a well-built, unusually well-tested piece of multi-repo review tooling: cleanly layered pure-core modules, 58 passing node:test cases, consistent CommonJS, escaped regexes, and shell-free execFileSync subprocess calls (no command injection, no ReDoS, no secret leakage found). The substantive issues are not in the pure scoring core but at the seams: two parallel entry points and three arg parsers drift, the refactored command gates engine features by grepping raw YAML instead of the parsed config, and yarn review run hardcodes a scope that inflates its preflight risk relative to the real review path. Most security findings are lower-severity hardening items; the one structural trust-boundary concern is the auto-fix design (LLM-emitted shell run via apply_all.sh).

BLOCKING

1. Review command runs model-emitted bash (sed auto-fixes) via apply_all.sh (sev 5, security)
.claude/commands/agent-review.md:403-412, 1284-1297 — Agents write sed -i scripts derived from attacker-influenceable diff/file content to /tmp/automated_fixes/*.sh, then apply_all.sh auto-loops bash "$fix" over them. Untrusted diff -> generated shell -> executed shell. Fix: gate each fix behind explicit human confirmation, constrain edits to the changed-file set, prefer git apply of structured patches over free-form sed, never auto-run; document the scripts as untrusted.

2. Agent over-selection from trigger keywords inside the reviewer's own rule docs (sev 6, dogfood — see meta section)
data-integrity, ux, financial agents auto-selected because content:mutation / content:<Formik / content:amount triggers appear in .claude/review/rules/*.md prose, not code. Fix: exclude .claude/review/rules/** and doc markdown from content-trigger scanning, or scan only added code lines.

3. yarn review run hardcodes scope=multi_feature, diverging from the real risk score (sev 6, architecture)
.claude/review/cli.cjs:108buildPlan is called with scope:'multi_feature' (1.3x multiplier) while agent-review.md Stage 0 and plan.cjs:35 default to single_feature (1.0x). The same diff can land in a different LOW/MEDIUM/HIGH/CRITICAL band depending on entry point, contradicting "engine is the single source of truth" (agent-review.md:172). Fix: default run to single_feature, or accept --scope and thread it through; centralize the default in one constant.

4. Two parallel CLI dispatchers with duplicated logic and divergent yarn wiring (sev 5, architecture)
.claude/review/cli.cjs:24-37, 69-95 — Flows exist both in cli.cjs and in require.main blocks of learningsStore.cjs/indexStore.cjs/impact.cjs/plan.cjs; package.json wires some scripts to cli.cjs and others to the per-module CLIs (two arg conventions). emit/filter learning steps are reachable only via the module CLI, never the run orchestrator. Fix: make cli.cjs the only dispatcher, reduce module require.main blocks to thin shims, point all yarn scripts at cli.cjs, and extract one shared changedFiles/parseArgs.

5. Command gates engine features by grepping raw config.yml instead of parsed config (sev 5, architecture)
.claude/commands/agent-review.md:292, 1194grep -q "enabled: true" is unscoped (matches both index: and learning: blocks), and grep -q "learning:" matches key presence not learning.enabled. The engine already exposes parsed cfg.index.enabled/cfg.learning.enabled (cli.cjs:110). A formatting change silently breaks gating. Fix: surface gate decisions from the engine (e.g. yarn review config get index.enabled, or indexEnabled/learningEnabled booleans in plan JSON) and branch on those.

6. plan.cjs parseArgs is buggy on value-less flags and untested (sev 4, testing)
.claude/review/engine/plan.cjs:17-21 — Fixed 2-stride loop mis-pairs every arg after any boolean flag (parseArgs(['--verbose','--config','c.yml']) yields {verbose:'--config'}, drops the config value). impact.cjs has a second, correct, tested parseArgs; plan.test.cjs never tests its own. Latent regression: any future --no-impact-style flag corrupts parsing silently. Fix: consolidate on the impact.cjs parser (or a shared helper) and add a plan test asserting a boolean flag doesn't consume the next token.

7. User-controlled --base ref interpolated into git args without a leading-dash guard (sev 4, security)
.claude/review/cli.cjs:30, 102-103--base value flows into git diff <base>...HEAD via execFileSync (argv array, so no injection today), but is unvalidated; any future refactor passing b as its own arg exposes argument injection (e.g. --upload-pack). Fix: validate against /^[A-Za-z0-9._\/~^-]+$/, reject leading -, and prefer the explicit git diff <a> <b> / -- form.

8. flag() consumes the following flag token as a value (sev 4, standards; related to #6/#7)
.claude/review/cli.cjs:22flag() returns argv[i+1] unconditionally, so yarn review run --base --no-launch sets base='--no-launch', and a trailing --base returns undefined silently. Fix: return undefined when the next token is missing or starts with --, and surface a usage error. (Same fragility class as plan.cjs/impact.cjs parsers, #6.)

9. Index path is hardcoded, ignoring config.index.path (sev 4, standards)
.claude/review/cli.cjs:17,36,110config.yml defines index.path and the schema models it, but cli.cjs hardcodes join(RD,'index'); changing the config path silently builds/reads the index at the wrong location. The config field is dead. Fix: derive the path from cfg.index.path (with default fallback) in loadIndex()/run, or drop the field from config+schema.

IMPORTANT

10. run re-implements linesChanged parsing instead of the exported helper (sev 5, standards; DRY)
.claude/review/cli.cjs:104-106 — Inline insertions/deletions regexes duplicate plan.cjs's already-exported linesChangedFromStat(). Fix: import and call linesChangedFromStat(stat) (already importing buildPlan from the same module).

11. Numeric CLI flags accept NaN with no validation, fail silently (sev 5, standards)
.claude/review/cli.cjs:77 (--min-support), impact.cjs:30-31 (--max-depth/--max-nodes) — Number('abc')===NaN flows into mineLearnings where ?? 3 doesn't catch NaN, so every total < NaN is false and zero proposals mine with no error. Fix: guarded parse — reject non-positive-integer with a usage error and exit 1.

12. loadConfig's throw-on-invalid path is never tested (sev 3, testing)
.claude/review/engine/loadConfig.test.cjs:31-36 — Only the pure validateConfig() is asserted; loadConfig()'s disk-read, YAML-parse-then-validate, and throw new Error('Invalid review config...') formatting are uncovered. Fix: write a temp invalid config to a tmpdir and assert.throws(() => loadConfig(...), /Invalid review config/); add a malformed-YAML case.

13. CLI command dispatch largely untested (sev 3, testing)
.claude/review/cli.cjs:50-134 — Only help/empty/unknown branches covered; config/index/impact/feedback/learn/learnings/approve/reject/run arg-handling and exit codes (e.g. feedback/approve/reject with no arg returning 1; --no-launch) are not. Fix: drive main() for the no-arg exit-1 branches (pure, no git needed); for repo-dependent commands point ROOT at a tmp git repo or make git/path constants injectable.

14. learningsStore CLI and file-I/O helpers untested (sev 3, testing)
.claude/review/engine/learningsStore.cjs:57-93loadLearnings/saveLearnings, loadFeedback/appendFeedback, and the whole --emit/--ingest/--mine/--rules/--filter CLI block are uncovered; the full emit->ingest->mine->approve->filter loop is never exercised end-to-end. Fix: tmpdir round-trip tests (save/load, append/load) plus one emit->ingest->mine integration test asserting a proposal is produced.

15. Config/learnings rule-doc paths read and injected into prompts without a path-traversal guard (sev 3, security)
.claude/commands/agent-review.md:275-300 — Rule paths from config.yml/learnings are typed as plain strings (no schema pattern); a ../../.env-style entry would be read and injected into a prompt. Trust boundary is human-ratified config, but there's no defense-in-depth. Fix: constrain rule paths in config.schema.json to ^rules/[A-Za-z0-9._-]+\.md$ and verify each resolves within .claude/review/rules/ before reading.

16. Bad/shallow git state crashes commands with a raw error (sev 3, architecture)
.claude/review/cli.cjs:24-33, 102-103 — When merge-base and HEAD~1 both fail (shallow clone, single commit, detached HEAD), the git diff throws and the top-level catch prints raw git stderr — inconsistent with the friendly loadConfig/usage messaging. Fix: wrap the diff/merge-base calls and emit a directed message ("could not determine a diff base; pass --base <ref>") with non-zero exit.

MINOR

17. run mode inferred by exclusion, not validated (sev 3, standards/security — flagged by standards+security)
.claude/review/cli.cjs:100 (and 99-119) — Any non-flag token becomes mode and is forwarded verbatim into claude -p "/agent-review ${mode}"; a typo silently launches an unknown mode and the preflight summary disagrees with the launched intent. Fix: validate against ['quick','standard','deep'] and warn/reject otherwise.

18. Lazy require() of fs/os mid-run-case breaks the require convention (sev 3, standards)
.claude/review/cli.cjs:97-98require('node:fs')/require('node:os') mid-function; no lazy-load benefit in a tiny CLI. Fix: add writeFileSync to the top-level node:fs destructure and hoist node:os.

19. JSON.parse on per-line feedback / arbitrary --in files without try/catch (sev 2, security)
.claude/review/engine/learningsStore.cjs:48, 66, 88 — One malformed feedback.jsonl line throws an unhandled SyntaxError and halts the learning pipeline. Fix: wrap per-line parse in try/catch to skip+count malformed lines; validate parsed shape. (Robustness overlap with testing finding #14.)

20. Unvalidated mode flows into the launched claude prompt (sev 2, security) — same root cause as #17; covered there.

21. Empty-diff / zero-files edge uncovered for scoreRisk/selectAgents (sev 2, testing)
.claude/review/engine/scoreRisk.test.cjs:33-63 — Behavior is correct today (score 0/LOW; only always-on agents) but unpinned. Fix: one assertion each pinning the no-op case.

22. detectSpecial negative/boundary cases untested (sev 2, testing)
.claude/review/engine/detectSpecial.test.cjs:10-32 — Only positive detections asserted; missing deletion-only package.json (must NOT flag new_dependency), lockfile+pkg together (must NOT flag lockfile_only_change), next.config without security keywords, and an adversarial-package-name escapeRe assertion. Fix: add negative-assertion cases per guard plus one escapeRe test with regex metachars.

23. Language/alias support hardcoded across resolveImport and indexStore (sev 2, architecture)
.claude/review/engine/resolveImport.cjs:4-5 (+ indexStore.cjs:7 INDEX_RE) — ALIASES/EXTS and INDEX_RE must be hand-kept consistent; adding a tsconfig @/ mapping or new extension means editing engine source, breaking the otherwise config-driven model. Fix: promote alias roots/extensions into config.yml's index: block (or read tsconfig paths) and have both modules consume them.

Dogfood / self-review meta-findings

Running the reviewer's own engine on this PR surfaced two configuration/scoping bugs that the engine's design invites when the thing under review is the engine itself (or any repo where rule/doc prose overlaps with code-trigger vocabulary):

  • Risk scored 275/CRITICAL — inflated (sev ~5). The config marks ALL of .claude/** as critical (+3 each), so each of the ~37 engine files counts as critical and is then amplified by the cross-cutting multiplier. The real risk of a self-contained tooling PR is far lower; the band is meaningless here. Fix: scope the .claude/** critical pattern to harness-integrity files only (commands/, rules/, settings), not the engine source.
  • Agent over-selection from rule-doc prose (sev ~6). data-integrity (content:mutation), ux (content:<Formik), and financial (content:amount) were auto-selected because those trigger keywords appear inside the reviewer's own rules/*.md documentation text, not in any code. Fix: exclude .claude/review/rules/** and doc markdown from content-trigger scanning, or scan only added code lines. (This is the same root cause as BLOCKING contacts #2.)

Both meta-findings share a theme the specialists corroborated: the engine's file-set and trigger definitions are not context-aware (they treat doc markdown like source) and its critical-path config is coarse (.claude/** wholesale) — consistent with the architecture finding (#23) that the file-set/trigger model is hardcoded and not config-discriminating, and with the architecture concern that scope/scoring inputs drift across entry points (#3).

What's solid

  • Pure core is clean and correct. No findings in scoreRisk/selectAgents/resolveRules/detectSpecial/queryImpact — well-bounded, plain-args-in/data-out, IO isolated into loadConfig/indexStore/learningsStore.
  • Security posture is strong. All subprocess calls use execFileSync with argv arrays (no shell, no command injection); package-name regex is properly escaped; import/diff regexes are linear (no ReDoS); no secret leakage in logs. Index cache is gitignored and HEAD-keyed; learnings.yml is committed (correct human-ratified trust model).
  • Unusually well-tested for review tooling. 58 node:test cases, one .test.cjs per module, real assertions with good edge coverage in the highest-value modules (scoreRisk exclusions/rounding, queryImpact truncation, indexStore cache reuse + head-mismatch rebuild, findingSignature normalization, mineLearnings support thresholds, learningsStore status-preservation). Verified the PnP single-process runner correctly propagates non-zero exit codes on Node 24, so yarn test:review fails CI as intended.
  • Clean standards + layering. Consistent CommonJS ('use strict', node: prefixes, named exports), clean naming, no console/debugger/TODO noise, informative errors, deps correctly declared. buildPlan is a tidy composition seam; the agent-review.md refactor genuinely consumes the engine instead of duplicating risk math inline.

dr-bizz added 4 commits June 25, 2026 14:07
- content triggers skip prose (.md), the reviewer's own config/rule definition files,
  and package-manager artifacts (.yarn/.pnp/lockfiles) — fixes agent over-selection
- scope .claude/** critical pattern to harness-integrity files; engine source is ordinary
  code (risk de-inflated 275->82 on this PR)
- plan/impact parseArgs no longer swallow the token after a boolean flag; flag() rejects
  flag-as-value; NaN guards on --min-support/--max-depth/--max-nodes
- review run: scope default single_feature (+ --scope), index.path from config,
  linesChangedFromStat reused, mode validated, --base ref validated, friendly git errors
- config get <dot.path> subcommand; loadFeedback skips malformed JSONL
- config.schema.json constrains rule paths; + regression tests (65 pass)
- #23 config-driven import graph: resolveImport/buildGraph/indexStore take aliases/
  extensions/roots from config.index (defaults preserved); schema + tests
- #4 single arg parser (engine/args.cjs) used by plan/impact/indexStore/learningsStore
  (no more 4 divergent parsers); review:* scripts repointed to the cli.cjs dispatcher;
  cli index --force
- #5 command gates index/learning via 'review config get', not raw grep of config.yml
- #1 auto-fix apply_all.sh now DRY-RUNS untrusted model-generated scripts; applies only
  with --yes
- 68 tests pass
Resolve code-review.md conflict; prettier-format the review engine files.

Claude-Session: https://claude.ai/code/session_01Q2jTiajZG4YmVb6ovhmn9V
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant