Skip to content

Feature Request: Enrich Stage 10 cross-patch context to reduce false positives #249

Description

@m0ck1ng

Problem

Sashiko's review pipeline reviews each patch in isolation. Stages 1–7 have zero visibility into sibling patches. Stage 10 is the only stage with cross-patch awareness. It has access to git tools (read files, diff, show, grep, etc.) and could theoretically inspect subsequent patches' code. However, the only cross-patch information injected into its prompt is the subject lines of subsequent patches.

This creates a systematic class of false positives: findings on patch N that are actually resolved or implemented by a later patch M in the same series. Three common patterns:

  1. Stub + Implementation split: Patch 1 introduces a function stub; patch 3 implements it. Review of patch 1 reports "missing implementation" — which is intentional, not a bug.

  2. Transient bug + Fix: Patch 3 introduces a race condition during refactoring; patch 5 fixes it. Review of patch 3 reports the bug — which doesn't exist in the patchset's final state. This is noise for maintainers.

  3. Incremental construction: A series builds up a feature step by step (parse → validate → integrate). Each intermediate patch appears "incomplete" because the next patch hasn't been applied yet. Isolated review treats every incremental step as buggy.

Stage 10 is designed to catch these cases — it has a validation step that checks whether concerns are addressed by subsequent patches, and it has git tools to inspect the code. But with only subject lines as starting cues, its catch rate is effectively zero (1.6%, see data below). The issue is not that Stage 10 lacks tools; it's that subject lines don't provide enough signal for the LLM to know what to look up with those tools.

Data

Analysis of recent 1,271 reviewed patchsets (≥4 patches, with findings) from the sashiko.dev API:

Metric Value
Total findings 13,790
Cross-patch FP candidates 3,534 (25.6% of all findings)
Patchsets affected 195 (15.3% of analyzed patchsets)
Non-preexisting FPs 2,386 (67.5% of candidates — AI flagged them as newly introduced)
Stage 10 cross-patch catch rate 57 / 3,534 (1.6%)

Severity distribution: 79.6% of cross-patch FP candidates are High or Critical — the findings that waste the most maintainer time.

Patch distance: 54.7% of FPs are within ±2 patches; 24.7% are in adjacent patches.

Examples

  1. PS #26528 (BPF, 5 patches): Patch 2 parses BTF type tags in kernel/bpf/btf.c, review reports 4 High + 1 Critical findings — all about missing arena pointer support (function stops processing type chain too early, unrecognized tags cause validation failure). Patch 3's subject is "bpf: Allow subprogs to return arena pointers" — it implements exactly the feature Patch 2 "lacks". All 5 findings are false positives from the patchset perspective.

  2. PS #28441 (V3D DRM, 14 patches): Patch 3 decouples DMA fence lock from queue lock, review reports a Critical UAF in v3d_overflow_mem_work(). Patch 4 replaces the lock implementation entirely, likely fixing the race. The subject "drm/v3d: Replace spin_lock_irqsave() with spin_lock()" doesn't clearly indicate a fix — but the commit message would. The Critical finding is noise for maintainers.

  3. PS #28386 (i915, 8 patches): Incremental VBT parsing series where each patch in intel_bios.c adds a piece of functionality. Patch 5 reports 2 High findings (OOB read), Patch 6 reports 3 more (OOB read, missing bounds check), Patch 7 has findings too. Each patch's bounds check issue is resolved by the next. The developer split the work logically (parse → validate → integrate); isolated review punishes each step for being incomplete.

Proposed Solution

Enrich the information available to Stage 10's cross-patch validation step. Two complementary approaches, ordered by increasing richness:

Approach 1: Subject → Commit Message

Currently, calculate_series_range() in src/worker/prompts.rs provides sibling patch info via git log --reverse --format=%s. Change to include the commit message body:

git log --reverse --format="Subject: %s\n\n%b"

What this adds: Commit messages typically explain why a change is made and what it does. For many patches, the body explicitly mentions "fix the race condition introduced earlier" or "implement the feature stub from the previous patch."

Implementation: Single-line change in calculate_series_range(). Minimal token cost increase — commit messages are typically 5–20 lines.

Approach 2: Pre-generate patchset-wide summaries, inject into Stage 10

Add a pre-processing step that runs before the per-patch review pipeline (Stages 1–7): for every patch in the patchset, generate a short summary via a lightweight LLM call. These summaries are stored as patchset metadata, then injected into Stage 10 as global context.

Pipeline flow changes from:

[for each patch] → Stages 1–7 (parallel) → Stage 8–11 (sequential)

to:

[for each patch] → Summary generation → Stages 1–7 (parallel) → Stage 8–11 (sequential, with summaries injected into Stage 10)

Summary format:

Patch N/M: <files> — [adds|removes|modifies] <description of what the patch does>. Notably does not <description of what the patch deliberately leaves for later>.

The "notably does not" field is critical — it makes the patch's deliberate incompleteness explicit, which is exactly the signal Stage 10 needs to identify cross-patch FPs.

What this adds: A structured, patchset-wide representation of each patch's intent — specifically including what it doesn't do. This directly addresses the three cross-patch FP patterns:

  • "Stub + implementation": summary says "does not implement the handler" → Stage 10 sees a later patch's summary says "implements the handler"
  • "Transient bug + fix": summary says "refactors lock, does not update irqsave callers" → Stage 10 sees a later patch's summary says "replaces spin_lock_irqsave with spin_lock"
  • "Incremental construction": summary says "adds parsing, does not validate bounds" → Stage 10 sees a later patch's summary says "adds bounds validation for parsed values"

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions