Agent-Review Core: config + index + learning layers by dr-bizz · Pull Request #1858 · CruGlobal/mpdx-react

dr-bizz · 2026-06-23T19:01:52Z

Agent-Review Core: config + index + learning layers

Upgrades the agent-review reviewer toward Greptile/CodeRabbit-class capabilities by extracting a clean, file-based, UI-ready review core under .claude/review/. Claude Code remains the orchestrator (no separate API billing); the debate/consensus stages are unchanged.

What's included (3 phases, TDD, 51 passing tests)

Phase A — Declarative config layer

config.yml + config.schema.json (ajv-validated) + rules/*.md
Risk scoring, the 7 agents + glob/content triggers, per-path rules, profile dial, exclusions — all migrated from the prose code-review.md into declarative config
Engine modules: loadConfig, scoreRisk, selectAgents, resolveRules, detectSpecial, plan.cjs

Phase B — Persistent codebase index (impact analysis)

File-level import graph for cross-file impact analysis ("this change affects N callers")
resolveImport, buildGraph, queryImpact, indexStore (gitignored, HEAD-keyed cache), impact.cjs
Replaces the command's grep-based Stage 1B; feeds dependents to the Architecture + Data Integrity agents

Phase C — Feedback-learning loop (approval-gated)

Capture per-finding accepted/dismissed outcomes → mine recurring patterns → human approves → suppress noise + inject rules
findingSignature, mineLearnings, applyLearnings, learningsStore + review:feedback/review:learn scripts
Shared knowledge committed (feedback.jsonl, learnings.yml); transients gitignored

Implementation notes

CommonJS engine (Yarn 4 PnP stable path); tested via a single-process node:test runner (yarn test:review) — 51 tests pass
New devDeps: yaml, minimatch, ajv (tooling only)
Design specs + plans under docs/superpowers/
lint:ts failures in CI are pre-existing app codegen gaps (the .claude/review engine is outside tsconfig scope)

🤖 Built with Claude Code (brainstorm → spec → plan → subagent-driven implementation).

github-actions · 2026-06-23T19:05:33Z

Bundle sizes [mpdx-react]

Compared against 63a8a1c

Route	Size (gzipped)	Diff
`/accountLists/[accountListId]/hrTools/partnerReminders`	322.84 KB	-1.07 KB
`/accountLists/[accountListId]/hrTools/staffSavingFund/transfers`	327.7 KB	-1.73 KB
`/accountLists/[accountListId]/reports/mpgaIncomeExpenses`	392.08 KB	-1.38 KB
`/accountLists/[accountListId]/reports/staffExpense`	358.81 KB	-1.46 KB
`/accountLists/[accountListId]/hrTools/mpdSupervisorReport`	no change	removed

Dynamic import	Size (gzipped)	Diff
`src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/MPGA/DynamicMPGA.tsx -> ./MPGA`	no change	removed
`src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/MonthlySummary/DynamicMonthlySummary.tsx -> ./MonthlySummary`	no change	removed
`src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/Payroll/DynamicPayroll.tsx -> ./Payroll`	no change	removed
`src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/Quarterly/DynamicQuarterly.tsx -> ./Quarterly`	no change	removed
`src/components/HrTools/MpdSupervisorReport/StaffDetailsTabs/StaffExpenseReport/DynamicStaffExpenseReport.tsx -> ./StaffExpenseReport`	no change	removed

…back/learn/learnings/approve/reject)

dr-bizz · 2026-06-25T17:56:47Z

🤖 AI multi-agent dogfood review — the reviewer reviewing its own PR. Engine pre-flight selected the panel; 4 substantive specialists (security, architecture, testing, standards) ran on the engine/config/command code. 22 findings (severities modest, mostly 4–6). Advisory — and a deliberate self-test.

PR #1858 Dogfood Review: Agent-Review Engine (Reviewer Reviewing Itself)

Executive summary

The agent-review engine is a well-built, unusually well-tested piece of multi-repo review tooling: cleanly layered pure-core modules, 58 passing node:test cases, consistent CommonJS, escaped regexes, and shell-free execFileSync subprocess calls (no command injection, no ReDoS, no secret leakage found). The substantive issues are not in the pure scoring core but at the seams: two parallel entry points and three arg parsers drift, the refactored command gates engine features by grepping raw YAML instead of the parsed config, and yarn review run hardcodes a scope that inflates its preflight risk relative to the real review path. Most security findings are lower-severity hardening items; the one structural trust-boundary concern is the auto-fix design (LLM-emitted shell run via apply_all.sh).

BLOCKING

1. Review command runs model-emitted bash (sed auto-fixes) via apply_all.sh (sev 5, security)
.claude/commands/agent-review.md:403-412, 1284-1297 — Agents write sed -i scripts derived from attacker-influenceable diff/file content to /tmp/automated_fixes/*.sh, then apply_all.sh auto-loops bash "$fix" over them. Untrusted diff -> generated shell -> executed shell. Fix: gate each fix behind explicit human confirmation, constrain edits to the changed-file set, prefer git apply of structured patches over free-form sed, never auto-run; document the scripts as untrusted.

2. Agent over-selection from trigger keywords inside the reviewer's own rule docs (sev 6, dogfood — see meta section)
data-integrity, ux, financial agents auto-selected because content:mutation / content:<Formik / content:amount triggers appear in .claude/review/rules/*.md prose, not code. Fix: exclude .claude/review/rules/** and doc markdown from content-trigger scanning, or scan only added code lines.

3. yarn review run hardcodes scope=multi_feature, diverging from the real risk score (sev 6, architecture)
.claude/review/cli.cjs:108 — buildPlan is called with scope:'multi_feature' (1.3x multiplier) while agent-review.md Stage 0 and plan.cjs:35 default to single_feature (1.0x). The same diff can land in a different LOW/MEDIUM/HIGH/CRITICAL band depending on entry point, contradicting "engine is the single source of truth" (agent-review.md:172). Fix: default run to single_feature, or accept --scope and thread it through; centralize the default in one constant.

4. Two parallel CLI dispatchers with duplicated logic and divergent yarn wiring (sev 5, architecture)
.claude/review/cli.cjs:24-37, 69-95 — Flows exist both in cli.cjs and in require.main blocks of learningsStore.cjs/indexStore.cjs/impact.cjs/plan.cjs; package.json wires some scripts to cli.cjs and others to the per-module CLIs (two arg conventions). emit/filter learning steps are reachable only via the module CLI, never the run orchestrator. Fix: make cli.cjs the only dispatcher, reduce module require.main blocks to thin shims, point all yarn scripts at cli.cjs, and extract one shared changedFiles/parseArgs.

5. Command gates engine features by grepping raw config.yml instead of parsed config (sev 5, architecture)
.claude/commands/agent-review.md:292, 1194 — grep -q "enabled: true" is unscoped (matches both index: and learning: blocks), and grep -q "learning:" matches key presence not learning.enabled. The engine already exposes parsed cfg.index.enabled/cfg.learning.enabled (cli.cjs:110). A formatting change silently breaks gating. Fix: surface gate decisions from the engine (e.g. yarn review config get index.enabled, or indexEnabled/learningEnabled booleans in plan JSON) and branch on those.

6. plan.cjs parseArgs is buggy on value-less flags and untested (sev 4, testing)
.claude/review/engine/plan.cjs:17-21 — Fixed 2-stride loop mis-pairs every arg after any boolean flag (parseArgs(['--verbose','--config','c.yml']) yields {verbose:'--config'}, drops the config value). impact.cjs has a second, correct, tested parseArgs; plan.test.cjs never tests its own. Latent regression: any future --no-impact-style flag corrupts parsing silently. Fix: consolidate on the impact.cjs parser (or a shared helper) and add a plan test asserting a boolean flag doesn't consume the next token.

7. User-controlled --base ref interpolated into git args without a leading-dash guard (sev 4, security)
.claude/review/cli.cjs:30, 102-103 — --base value flows into git diff <base>...HEAD via execFileSync (argv array, so no injection today), but is unvalidated; any future refactor passing b as its own arg exposes argument injection (e.g. --upload-pack). Fix: validate against /^[A-Za-z0-9._\/~^-]+$/, reject leading -, and prefer the explicit git diff <a> <b> / -- form.

8. flag() consumes the following flag token as a value (sev 4, standards; related to #6/#7)
.claude/review/cli.cjs:22 — flag() returns argv[i+1] unconditionally, so yarn review run --base --no-launch sets base='--no-launch', and a trailing --base returns undefined silently. Fix: return undefined when the next token is missing or starts with --, and surface a usage error. (Same fragility class as plan.cjs/impact.cjs parsers, #6.)

9. Index path is hardcoded, ignoring config.index.path (sev 4, standards)
.claude/review/cli.cjs:17,36,110 — config.yml defines index.path and the schema models it, but cli.cjs hardcodes join(RD,'index'); changing the config path silently builds/reads the index at the wrong location. The config field is dead. Fix: derive the path from cfg.index.path (with default fallback) in loadIndex()/run, or drop the field from config+schema.

IMPORTANT

10. run re-implements linesChanged parsing instead of the exported helper (sev 5, standards; DRY)
.claude/review/cli.cjs:104-106 — Inline insertions/deletions regexes duplicate plan.cjs's already-exported linesChangedFromStat(). Fix: import and call linesChangedFromStat(stat) (already importing buildPlan from the same module).

11. Numeric CLI flags accept NaN with no validation, fail silently (sev 5, standards)
.claude/review/cli.cjs:77 (--min-support), impact.cjs:30-31 (--max-depth/--max-nodes) — Number('abc')===NaN flows into mineLearnings where ?? 3 doesn't catch NaN, so every total < NaN is false and zero proposals mine with no error. Fix: guarded parse — reject non-positive-integer with a usage error and exit 1.

12. loadConfig's throw-on-invalid path is never tested (sev 3, testing)
.claude/review/engine/loadConfig.test.cjs:31-36 — Only the pure validateConfig() is asserted; loadConfig()'s disk-read, YAML-parse-then-validate, and throw new Error('Invalid review config...') formatting are uncovered. Fix: write a temp invalid config to a tmpdir and assert.throws(() => loadConfig(...), /Invalid review config/); add a malformed-YAML case.

13. CLI command dispatch largely untested (sev 3, testing)
.claude/review/cli.cjs:50-134 — Only help/empty/unknown branches covered; config/index/impact/feedback/learn/learnings/approve/reject/run arg-handling and exit codes (e.g. feedback/approve/reject with no arg returning 1; --no-launch) are not. Fix: drive main() for the no-arg exit-1 branches (pure, no git needed); for repo-dependent commands point ROOT at a tmp git repo or make git/path constants injectable.

14. learningsStore CLI and file-I/O helpers untested (sev 3, testing)
.claude/review/engine/learningsStore.cjs:57-93 — loadLearnings/saveLearnings, loadFeedback/appendFeedback, and the whole --emit/--ingest/--mine/--rules/--filter CLI block are uncovered; the full emit->ingest->mine->approve->filter loop is never exercised end-to-end. Fix: tmpdir round-trip tests (save/load, append/load) plus one emit->ingest->mine integration test asserting a proposal is produced.

15. Config/learnings rule-doc paths read and injected into prompts without a path-traversal guard (sev 3, security)
.claude/commands/agent-review.md:275-300 — Rule paths from config.yml/learnings are typed as plain strings (no schema pattern); a ../../.env-style entry would be read and injected into a prompt. Trust boundary is human-ratified config, but there's no defense-in-depth. Fix: constrain rule paths in config.schema.json to ^rules/[A-Za-z0-9._-]+\.md$ and verify each resolves within .claude/review/rules/ before reading.

16. Bad/shallow git state crashes commands with a raw error (sev 3, architecture)
.claude/review/cli.cjs:24-33, 102-103 — When merge-base and HEAD~1 both fail (shallow clone, single commit, detached HEAD), the git diff throws and the top-level catch prints raw git stderr — inconsistent with the friendly loadConfig/usage messaging. Fix: wrap the diff/merge-base calls and emit a directed message ("could not determine a diff base; pass --base <ref>") with non-zero exit.

MINOR

17. run mode inferred by exclusion, not validated (sev 3, standards/security — flagged by standards+security)
.claude/review/cli.cjs:100 (and 99-119) — Any non-flag token becomes mode and is forwarded verbatim into claude -p "/agent-review ${mode}"; a typo silently launches an unknown mode and the preflight summary disagrees with the launched intent. Fix: validate against ['quick','standard','deep'] and warn/reject otherwise.

18. Lazy require() of fs/os mid-run-case breaks the require convention (sev 3, standards)
.claude/review/cli.cjs:97-98 — require('node:fs')/require('node:os') mid-function; no lazy-load benefit in a tiny CLI. Fix: add writeFileSync to the top-level node:fs destructure and hoist node:os.

19. JSON.parse on per-line feedback / arbitrary --in files without try/catch (sev 2, security)
.claude/review/engine/learningsStore.cjs:48, 66, 88 — One malformed feedback.jsonl line throws an unhandled SyntaxError and halts the learning pipeline. Fix: wrap per-line parse in try/catch to skip+count malformed lines; validate parsed shape. (Robustness overlap with testing finding #14.)

20. Unvalidated mode flows into the launched claude prompt (sev 2, security) — same root cause as #17; covered there.

21. Empty-diff / zero-files edge uncovered for scoreRisk/selectAgents (sev 2, testing)
.claude/review/engine/scoreRisk.test.cjs:33-63 — Behavior is correct today (score 0/LOW; only always-on agents) but unpinned. Fix: one assertion each pinning the no-op case.

22. detectSpecial negative/boundary cases untested (sev 2, testing)
.claude/review/engine/detectSpecial.test.cjs:10-32 — Only positive detections asserted; missing deletion-only package.json (must NOT flag new_dependency), lockfile+pkg together (must NOT flag lockfile_only_change), next.config without security keywords, and an adversarial-package-name escapeRe assertion. Fix: add negative-assertion cases per guard plus one escapeRe test with regex metachars.

23. Language/alias support hardcoded across resolveImport and indexStore (sev 2, architecture)
.claude/review/engine/resolveImport.cjs:4-5 (+ indexStore.cjs:7 INDEX_RE) — ALIASES/EXTS and INDEX_RE must be hand-kept consistent; adding a tsconfig @/ mapping or new extension means editing engine source, breaking the otherwise config-driven model. Fix: promote alias roots/extensions into config.yml's index: block (or read tsconfig paths) and have both modules consume them.

Dogfood / self-review meta-findings

Running the reviewer's own engine on this PR surfaced two configuration/scoping bugs that the engine's design invites when the thing under review is the engine itself (or any repo where rule/doc prose overlaps with code-trigger vocabulary):

Risk scored 275/CRITICAL — inflated (sev ~5). The config marks ALL of .claude/** as critical (+3 each), so each of the ~37 engine files counts as critical and is then amplified by the cross-cutting multiplier. The real risk of a self-contained tooling PR is far lower; the band is meaningless here. Fix: scope the .claude/** critical pattern to harness-integrity files only (commands/, rules/, settings), not the engine source.
Agent over-selection from rule-doc prose (sev ~6). data-integrity (content:mutation), ux (content:<Formik), and financial (content:amount) were auto-selected because those trigger keywords appear inside the reviewer's own rules/*.md documentation text, not in any code. Fix: exclude .claude/review/rules/** and doc markdown from content-trigger scanning, or scan only added code lines. (This is the same root cause as BLOCKING contacts #2.)

Both meta-findings share a theme the specialists corroborated: the engine's file-set and trigger definitions are not context-aware (they treat doc markdown like source) and its critical-path config is coarse (.claude/** wholesale) — consistent with the architecture finding (#23) that the file-set/trigger model is hardcoded and not config-discriminating, and with the architecture concern that scope/scoring inputs drift across entry points (#3).

What's solid

Pure core is clean and correct. No findings in scoreRisk/selectAgents/resolveRules/detectSpecial/queryImpact — well-bounded, plain-args-in/data-out, IO isolated into loadConfig/indexStore/learningsStore.
Security posture is strong. All subprocess calls use execFileSync with argv arrays (no shell, no command injection); package-name regex is properly escaped; import/diff regexes are linear (no ReDoS); no secret leakage in logs. Index cache is gitignored and HEAD-keyed; learnings.yml is committed (correct human-ratified trust model).
Unusually well-tested for review tooling. 58 node:test cases, one .test.cjs per module, real assertions with good edge coverage in the highest-value modules (scoreRisk exclusions/rounding, queryImpact truncation, indexStore cache reuse + head-mismatch rebuild, findingSignature normalization, mineLearnings support thresholds, learningsStore status-preservation). Verified the PnP single-process runner correctly propagates non-zero exit codes on Node 24, so yarn test:review fails CI as intended.
Clean standards + layering. Consistent CommonJS ('use strict', node: prefixes, named exports), clean naming, no console/debugger/TODO noise, informative errors, deps correctly declared. buildPlan is a tidy composition seam; the agent-review.md refactor genuinely consumes the engine instead of duplicating risk math inline.

- content triggers skip prose (.md), the reviewer's own config/rule definition files, and package-manager artifacts (.yarn/.pnp/lockfiles) — fixes agent over-selection - scope .claude/** critical pattern to harness-integrity files; engine source is ordinary code (risk de-inflated 275->82 on this PR) - plan/impact parseArgs no longer swallow the token after a boolean flag; flag() rejects flag-as-value; NaN guards on --min-support/--max-depth/--max-nodes - review run: scope default single_feature (+ --scope), index.path from config, linesChangedFromStat reused, mode validated, --base ref validated, friendly git errors - config get <dot.path> subcommand; loadFeedback skips malformed JSONL - config.schema.json constrains rule paths; + regression tests (65 pass)

- #23 config-driven import graph: resolveImport/buildGraph/indexStore take aliases/ extensions/roots from config.index (defaults preserved); schema + tests - #4 single arg parser (engine/args.cjs) used by plan/impact/indexStore/learningsStore (no more 4 divergent parsers); review:* scripts repointed to the cli.cjs dispatcher; cli index --force - #5 command gates index/learning via 'review config get', not raw grep of config.yml - #1 auto-fix apply_all.sh now DRY-RUNS untrusted model-generated scripts; applies only with --yes - 68 tests pass

… change

Resolve code-review.md conflict; prettier-format the review engine files. Claude-Session: https://claude.ai/code/session_01Q2jTiajZG4YmVb6ovhmn9V

dr-bizz added 29 commits June 23, 2026 12:08

docs: competitive research, spec, and plan for review config layer

94ae1d6

docs: revise plan for CommonJS engine + Yarn PnP constraints

9ef58de

feat(review): scaffold config engine deps, runner + JSON schema

8a4f986

feat(review): config loader + ajv validation

b0e119c

feat(review): config-driven risk scorer

3bfac9d

feat(review): config-driven agent selection

8f73e31

feat(review): rule resolver (agent + path rules)

d5c60ce

feat(review): deterministic special-pattern detection

2c8c0bf

feat(review): plan CLI entry assembling risk + agents + rules

1044a06

feat(review): author config.yml migrated from code-review.md

52f64b6

feat(review): migrate prose focus-areas into rules/*.md

433cc43

refactor(review): drive risk + agent selection from config engine

eeecbe4

chore(review): supersede code-review.md with pointer to review core

3365548

build(review): commit PnP map for yaml/minimatch/ajv (Zero-Install)

b9faba0

docs: spec for agent-review index layer (Gap 2 / Phase B)

8d019c9

docs: implementation plan for agent-review index layer (Gap 2)

e7d618b

feat(review): import specifier resolver for index graph

25cb249

feat(review): file-level import graph builder

eda56d5

feat(review): transitive impact query over import graph

7622a98

feat(review): HEAD-keyed import-graph cache (indexStore)

bbc2d37

feat(review): impact CLI emitting dependents report

0e4579e

feat(review): wire impact analysis into agent-review Stage 1B

5a4719a

docs: spec for agent-review learning layer (Gap 3 / Phase C)

a4df948

docs: implementation plan for agent-review learning layer (Gap 3)

0bc25ce

feat(review): stable finding signature for learning

22b21f1

feat(review): mine feedback into proposed learnings

ecff08e

feat(review): apply approved learnings (suppress + rule injection)

2fd74c3

feat(review): learnings store + emit/ingest/mine/rules/filter CLI

08f52f3

feat(review): enable learning layer + wire feedback/learn into command

ca8489e

dr-bizz added 7 commits June 23, 2026 15:18

docs: spec for agent-review CLI (Phase D)

266509a

docs: implementation plan for agent-review CLI (Phase D)

d4ad437

feat(review): pure CLI helpers (status, list, preflight)

4398e6d

feat(review): unified review CLI dispatcher (config/index/impact/feed…

07fd592

…back/learn/learnings/approve/reject)

feat(review): review run pre-flight + claude -p launch

e374025

docs: spec for agent-review plugin distribution (Phase E)

0e9a16a

docs: implementation plan for agent-review plugin distribution (Phase E)

71ea930

dr-bizz added 4 commits June 25, 2026 14:07

ci: run review engine tests (yarn test:review) only when review files…

bbdbc5e

… change

Merge origin/main into review-config-layer

57e5e55

Resolve code-review.md conflict; prettier-format the review engine files. Claude-Session: https://claude.ai/code/session_01Q2jTiajZG4YmVb6ovhmn9V

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Agent-Review Core: config + index + learning layers#1858

Agent-Review Core: config + index + learning layers#1858
dr-bizz wants to merge 40 commits into
mainfrom
review-config-layer

dr-bizz commented Jun 23, 2026

Uh oh!

github-actions Bot commented Jun 23, 2026 •

edited

Loading

Uh oh!

dr-bizz commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

dr-bizz commented Jun 23, 2026