Feature Request: Enrich Stage 10 cross-patch context to reduce false positives

## Problem

Sashiko's review pipeline reviews each patch in isolation. **Stages 1–7 have zero visibility into sibling patches.** Stage 10 is the only stage with cross-patch awareness. It has access to git tools (read files, diff, show, grep, etc.) and could theoretically inspect subsequent patches' code. However, the only cross-patch information *injected into its prompt* is the **subject lines** of subsequent patches.

This creates a systematic class of false positives: findings on patch N that are actually resolved or implemented by a later patch M in the same series. Three common patterns:

1. **Stub + Implementation split**: Patch 1 introduces a function stub; patch 3 implements it. Review of patch 1 reports "missing implementation" — which is intentional, not a bug.

2. **Transient bug + Fix**: Patch 3 introduces a race condition during refactoring; patch 5 fixes it. Review of patch 3 reports the bug — which doesn't exist in the patchset's final state. This is noise for maintainers.

3. **Incremental construction**: A series builds up a feature step by step (parse → validate → integrate). Each intermediate patch appears "incomplete" because the next patch hasn't been applied yet. Isolated review treats every incremental step as buggy.

Stage 10 is designed to catch these cases — it has a validation step that checks whether concerns are addressed by subsequent patches, and it has git tools to inspect the code. But with only subject lines as starting cues, its catch rate is effectively zero (1.6%, see data below). The issue is not that Stage 10 lacks tools; it's that subject lines don't provide enough signal for the LLM to know *what to look up* with those tools.

## Data

Analysis of recent 1,271 reviewed patchsets (≥4 patches, with findings) from the sashiko.dev API:

| Metric | Value |
|--------|-------|
| Total findings | 13,790 |
| Cross-patch FP candidates | 3,534 (**25.6%** of all findings) |
| Patchsets affected | 195 (**15.3%** of analyzed patchsets) |
| Non-preexisting FPs | 2,386 (67.5% of candidates — AI flagged them as newly introduced) |
| Stage 10 cross-patch catch rate | 57 / 3,534 (**1.6%**) |

**Severity distribution**: 79.6% of cross-patch FP candidates are High or Critical — the findings that waste the most maintainer time.

**Patch distance**: 54.7% of FPs are within ±2 patches; 24.7% are in adjacent patches.

## Examples

1. **[PS #26528](https://sashiko.dev/#/patchset/26528)** (BPF, 5 patches): Patch 2 parses BTF type tags in `kernel/bpf/btf.c`, review reports 4 High + 1 Critical findings — all about missing arena pointer support (function stops processing type chain too early, unrecognized tags cause validation failure). Patch 3's subject is "bpf: Allow subprogs to return arena pointers" — it implements exactly the feature Patch 2 "lacks". All 5 findings are false positives from the patchset perspective.

2. **[PS #28441](https://sashiko.dev/#/patchset/28441)** (V3D DRM, 14 patches): Patch 3 decouples DMA fence lock from queue lock, review reports a Critical UAF in `v3d_overflow_mem_work()`. Patch 4 replaces the lock implementation entirely, likely fixing the race. The subject "drm/v3d: Replace spin_lock_irqsave() with spin_lock()" doesn't clearly indicate a fix — but the commit message would. The Critical finding is noise for maintainers.

3. **[PS #28386](https://sashiko.dev/#/patchset/28386)** (i915, 8 patches): Incremental VBT parsing series where each patch in `intel_bios.c` adds a piece of functionality. Patch 5 reports 2 High findings (OOB read), Patch 6 reports 3 more (OOB read, missing bounds check), Patch 7 has findings too. Each patch's bounds check issue is resolved by the next. The developer split the work logically (parse → validate → integrate); isolated review punishes each step for being incomplete.

## Proposed Solution

Enrich the information available to Stage 10's cross-patch validation step. Two complementary approaches, ordered by increasing richness:

### Approach 1: Subject → Commit Message

Currently, `calculate_series_range()` in `src/worker/prompts.rs` provides sibling patch info via `git log --reverse --format=%s`. Change to include the commit message body:

```
git log --reverse --format="Subject: %s\n\n%b"
```

**What this adds**: Commit messages typically explain *why* a change is made and *what* it does. For many patches, the body explicitly mentions "fix the race condition introduced earlier" or "implement the feature stub from the previous patch."

**Implementation**: Single-line change in `calculate_series_range()`. Minimal token cost increase — commit messages are typically 5–20 lines.

### Approach 2: Pre-generate patchset-wide summaries, inject into Stage 10

Add a **pre-processing step** that runs *before* the per-patch review pipeline (Stages 1–7): for every patch in the patchset, generate a short summary via a lightweight LLM call. These summaries are stored as patchset metadata, then injected into Stage 10 as global context.

**Pipeline flow changes from**:

```
[for each patch] → Stages 1–7 (parallel) → Stage 8–11 (sequential)
```

**to**:

```
[for each patch] → Summary generation → Stages 1–7 (parallel) → Stage 8–11 (sequential, with summaries injected into Stage 10)
```

**Summary format**:

> Patch N/M: `<files>` — [adds|removes|modifies] `<description of what the patch does>`. Notably does **not** `<description of what the patch deliberately leaves for later>`.

The "notably does not" field is critical — it makes the patch's deliberate incompleteness explicit, which is exactly the signal Stage 10 needs to identify cross-patch FPs.

**What this adds**: A structured, patchset-wide representation of each patch's intent — specifically including what it *doesn't* do. This directly addresses the three cross-patch FP patterns:
- "Stub + implementation": summary says "does not implement the handler" → Stage 10 sees a later patch's summary says "implements the handler"
- "Transient bug + fix": summary says "refactors lock, does not update irqsave callers" → Stage 10 sees a later patch's summary says "replaces spin_lock_irqsave with spin_lock"
- "Incremental construction": summary says "adds parsing, does not validate bounds" → Stage 10 sees a later patch's summary says "adds bounds validation for parsed values"


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature Request: Enrich Stage 10 cross-patch context to reduce false positives #249

Problem

Data

Examples

Proposed Solution

Approach 1: Subject → Commit Message

Approach 2: Pre-generate patchset-wide summaries, inject into Stage 10

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Metric	Value
Total findings	13,790
Cross-patch FP candidates	3,534 (25.6% of all findings)
Patchsets affected	195 (15.3% of analyzed patchsets)
Non-preexisting FPs	2,386 (67.5% of candidates — AI flagged them as newly introduced)
Stage 10 cross-patch catch rate	57 / 3,534 (1.6%)

Uh oh!

Feature Request: Enrich Stage 10 cross-patch context to reduce false positives #249

Description

Problem

Data

Examples

Proposed Solution

Approach 1: Subject → Commit Message

Approach 2: Pre-generate patchset-wide summaries, inject into Stage 10

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions