From 49fe07dca719debface8963a70f87e402a7ebba5 Mon Sep 17 00:00:00 2001
From: Justin McLean <justin@classsoftware.com>
Date: Mon, 29 Jun 2026 16:50:07 +1000
Subject: [PATCH] update specs and implementation plan

---
 tools/spec-loop/IMPLEMENTATION_PLAN.md        | 479 ++++++++++++++----
 tools/spec-loop/specs/adapters.md             |  20 +
 tools/spec-loop/specs/adoption-and-setup.md   |  13 +
 .../specs/meta-and-quality-tooling.md         |  48 +-
 .../spec-loop/specs/organization-adapters.md  |  10 +
 tools/spec-loop/specs/pr-management-family.md |  10 +-
 tools/spec-loop/specs/privacy-llm-gate.md     |  12 +
 tools/spec-loop/specs/project-agnosticism.md  |  13 +
 .../specs/release-management-lifecycle.md     |  12 +
 9 files changed, 501 insertions(+), 116 deletions(-)

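Note on work item 18 (branch-name confidentiality validation): a minimal shell sketch of the deterministic check it describes, assuming branch names arrive as plain strings. The function names `branch_name_is_neutral` and `check_branch_names` are hypothetical and are not part of `tools/skill-and-tool-validator`. The blocked-term pattern covers only the CVE-ID, `security`, `vulnerability`, and `advisory` cases listed in the item, because tracker-private title fragments depend on project data.

```bash
#!/usr/bin/env bash
# Hypothetical sketch for work item 18; not the validator's real implementation.
# Blocked terms mirror the plan text: CVE IDs, "security", "vulnerability",
# "advisory". Tracker-private title fragments are NOT covered here.

# Succeed (status 0) when the branch name contains none of the blocked terms.
branch_name_is_neutral() {
  local name="$1"
  # Case-insensitive extended regex; "vulnerab" also matches "vulnerabilities".
  if printf '%s\n' "$name" |
    grep -Eiq 'cve-[0-9]{4}-[0-9]{4,}|security|vulnerab|advisory'; then
    return 1
  fi
  return 0
}

# Report every argument as ok / REJECT; return 1 if any name leaks context.
check_branch_names() {
  local status=0 name
  for name in "$@"; do
    if branch_name_is_neutral "$name"; then
      printf 'ok       %s\n' "$name"
    else
      printf 'REJECT   %s\n' "$name"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_branch_names fix-parser-overflow fix-CVE-2024-12345` prints an `ok` line for the first name and a `REJECT` line for the second, then returns non-zero, which is what would let a pre-commit hook or CI step fail on a leaking name. A fixed pattern list keeps the check deterministic and easy to review.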
diff --git a/tools/spec-loop/IMPLEMENTATION_PLAN.md b/tools/spec-loop/IMPLEMENTATION_PLAN.md
index afad7032..77929679 100644
--- a/tools/spec-loop/IMPLEMENTATION_PLAN.md
+++ b/tools/spec-loop/IMPLEMENTATION_PLAN.md
@@ -30,9 +30,10 @@ one PR** (the branch-per-feature constraint).
   `pairing-multi-agent-review` (three independent axis passes; eval
   suites present); `docs/modes.md` Agentic Pairing row reflects 2 skills /
   `experimental`. Spec: [`specs/pairing-mode.md`](specs/pairing-mode.md).
-- **Agentic Mentoring — both skills shipped** — `pr-management-mentor` and
-  `good-first-issue-author` (eval suites present); `docs/modes.md`
-  Agentic Mentoring row reflects 2 skills / `experimental`.
+- **Agentic Mentoring — four skills shipped** — `pr-management-mentor`,
+  `good-first-issue-author`, `mentoring-welcome`, and
+  `contributor-to-committer` (eval suites present); `docs/modes.md`
+  Agentic Mentoring row reflects 4 skills / `experimental`.
   Spec: [`specs/mentoring-mode.md`](specs/mentoring-mode.md).
 - **Contributor skills** — `contributor-nomination`,
   `contributor-activity-sweep`, and `committer-onboarding` shipped with
@@ -53,43 +54,58 @@ one PR** (the branch-per-feature constraint).
   has `pyproject.toml`, `src/`, and a `tests/` directory with pytest
   coverage for the sandbox profiles and clean-env wrapper.
   Spec: [`specs/agent-isolation-sandbox.md`](specs/agent-isolation-sandbox.md).
-- **Eval coverage — complete** — 60 skill eval suites exist in
-  `tools/skill-evals/evals/`, covering all skills including the full
-  setup-family (setup, setup-isolated-setup-doctor,
+- **Eval coverage — complete** — every current `skills/*/SKILL.md` has a
+  matching eval suite in `tools/skill-evals/evals/`; the eval catalogue also
+  includes non-skill smoke suites such as `non-asf-profile-smoke`. Coverage
+  includes the full setup-family (setup, setup-isolated-setup-doctor,
   setup-isolated-setup-install, setup-isolated-setup-update,
   setup-isolated-setup-verify, setup-override-upstream,
   setup-shared-config-sync).
-- **Release-management — first four skills shipped** —
-  `release-vote-draft`, `release-announce-draft`, `release-vote-tally`,
-  and `release-verify-rc` landed with eval suites (formerly planned work
-  items 1–2 plus two follow-ups). Six `release-*` skills remain; see
+- **Release-management family complete** — all ten `release-*` skills landed
+  with eval suites; no release-management skill remains proposed. See
   [`specs/release-management-lifecycle.md`](specs/release-management-lifecycle.md).
 - **Agentic Triage — general-issue family filled out** — `issue-stale-sweep`,
   `issue-deduplicate`, and `issue-backlog-stats` shipped with eval suites
-  (formerly planned work item 3 plus its deferred siblings).
+  (formerly planned general-issue triage work plus its deferred siblings).
   Spec: [`specs/triage-mode.md`](specs/triage-mode.md).
-- **Agentic Mentoring — first-contribution welcome shipped** — `mentoring-welcome`
-  landed with an eval suite (formerly planned work item 4).
-  Spec: [`specs/mentoring-mode.md`](specs/mentoring-mode.md).
+- **Contributor-to-committer readiness shipped** — the mentoring-family
+  `contributor-to-committer` readiness tracker landed with an eval suite and
+  is documented in the contributor-growth and mentoring family docs.
+  Spec: [`specs/contributor-growth.md`](specs/contributor-growth.md).
 - **Project-agnosticism — ASF-coupling advisory lint shipped** — the SOFT
   ASF-coupling category landed in `tools/skill-and-tool-validator`
   (formerly planned work item 5), and `drafting-mode.md` Known Gaps is
   synced to the shipped drafting skills (formerly planned work item 6).
-- **Project-agnosticism — capability-flag vocabulary enumerated** — the
+- **Project-agnosticism — capability-flag vocabulary and wiring advanced** — the
   contributor/committer-intake (ICLA vs DCO), security-intake, and
   CVE-allocation option sets and defaults are enumerated as
   `projects/_template/committer-onboarding-config.md`,
   `security-intake-config.md`, and `cve-allocation-config.md`, following
-  the backend-flag precedent in `release-management-lifecycle.md`. Wiring
-  the skills to read these flags is tracked as work item 3.
+  the backend-flag precedent in `release-management-lifecycle.md`.
+  `security-issue-import`, `security-issue-sync`, and `committer-onboarding`
+  have begun reading those flags; remaining adopter-pilot feedback is tracked
+  in the specs, not as an immediate build item here.
   Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
-- **Repo-health — three-skill family shipped** — `ci-runner-audit`,
-  `workflow-security-audit`, and `dependency-audit` landed (read-only,
-  `experimental`). Spec: [`specs/repo-health-family.md`](specs/repo-health-family.md).
-- **New proposed specs awaiting their first build item** —
-  [`specs/reviewer-routing.md`](specs/reviewer-routing.md) (Agentic Triage) and
-  [`specs/skill-reconciler.md`](specs/skill-reconciler.md) (infra) are
-  documented spec-first; their build items are below.
+- **Repo-health family complete** — `ci-runner-audit`,
+  `workflow-security-audit`, `dependency-audit`, `license-compliance-audit`,
+  and `flaky-test-triage` landed (read-only, `experimental`).
+  Spec: [`specs/repo-health-family.md`](specs/repo-health-family.md).
+- **Reviewer routing shipped** — `reviewer-routing` landed with an eval suite,
+  completing the spec's first build item. Remaining work is spec / docs
+  cleanup to reflect the shipped state, plus later adopter-pilot feedback.
+  Spec: [`specs/reviewer-routing.md`](specs/reviewer-routing.md).
+- **Skill reconciler shipped** — `skill-reconciler` landed with an eval suite,
+  implementing the cross-project comparison workflow. Follow-on gaps are the
+  optional deterministic structural-diff helper and source-tag auto-pairing,
+  both deferred. Spec: [`specs/skill-reconciler.md`](specs/skill-reconciler.md).
+- **Project-agnosticism cleanup shipped** — high-confidence ASF-coupling
+  advisories, criteria-source advisories, and action-inventory advisories were
+  cleared from the relevant skills; organization metadata, governance
+  vocabulary, disclosure-governance flags, and source-control abstraction work
+  also landed. Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
+- **Good-first-issue sweep implemented off main** — `origin/good-first-issue-sweep`
+  carries the `good-first-issue-sweep` skill and eval suite. It is tracked as
+  in-flight below until that PR lands on `main`.
 
 ---
 
@@ -100,15 +116,14 @@ Do not duplicate them.
 
 | Branch slug | PR | Description |
 |---|---|---|
-| `non-asf-profile-fixture` | open | Non-ASF adopter profile under `projects/_template/` + `non-asf-profile-smoke` eval (former work item 3) |
+| `good-first-issue-sweep` | open | `good-first-issue-sweep` skill + eval suite; keep out of the build queue until the PR lands or is explicitly abandoned. |
 
-The previous in-flight batch (spec-validator SPDX / path-existence /
-Known-gaps checks, the `spec-validate` pre-commit hook, the SOFT
-eval-coverage check, `pr-management-quick-merge`, the security-tracker
-dashboard pytest suite, the loop incremental-sync and CLI-UX changes, the
-markdownlint Node bump, the AGENTS.md slim, and the modes / mentoring /
-setup-status doc syncs) has all merged and is reflected in the code and
-in **What's been built** above.
+The previous in-flight batch (non-ASF adopter profile fixture,
+reviewer-routing, skill-reconciler, release-management completion,
+repo-health completion, high-confidence ASF-coupling cleanup, criteria-source /
+action-inventory advisory cleanup, organization metadata, governance vocabulary,
+and adapter-discovery docs) has merged to `main` and is reflected in
+**What's been built** above.
 
 ---
 
@@ -117,72 +132,337 @@ in **What's been built** above.
 Priority order. Each maps to one branch and one PR. Branch names are
 slugs, not numbers (numbering implies an order the specs don't carry).
 
-1. **First reviewer-routing skill: reviewer-routing.**
-   `specs/reviewer-routing.md` is `proposed` with zero implemented
-   skills, and review-cycle latency is one of the two priorities MISSION
-   names. Add an Agentic Triage-family skill `reviewer-routing` that takes an open
-   issue or PR and proposes a primary reviewer (and optional backup) from
-   the project's configured roster, scored on roster eligibility for the
-   touched area, git-history familiarity with the changed paths, and the
-   reviewer's current open-review load. Read-only / propose-then-confirm:
-   it never assigns or requests review. An unresolved roster yields an
-   explicit `NO ELIGIBLE REVIEWER` signal, never a fabricated handle.
-   Include an eval suite with an adversarial case asserting an injected
-   "assign to X" line in a PR body is ignored.
+1. **Sync shipped-state specs after the recent merge train.**
+   Several specs still carry pre-merge language even though the code has
+   shipped. Update `specs/reviewer-routing.md` and
+   `specs/skill-reconciler.md` so their **Where it lives** and **Known gaps**
+   sections describe the shipped skills instead of saying "proposed, not
+   implemented"; update `specs/overview.md` so reviewer routing and the
+   reconciler are listed as `experimental`; refresh
+   `specs/meta-and-quality-tooling.md`'s shipped-skill/eval count; and verify
+   `specs/project-agnosticism.md` / `specs/issue-management-family.md` no longer
+   advertise already-cleared gaps (high-confidence ASF-coupling backlog,
+   unwired governance-member terminology, missing issue-management rows in
+   `docs/modes.md`).
+   Validation:
+   ```bash
+   uv run --project tools/spec-status-index spec-status --ready
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   ```
+   Branch `spec-shipped-state-sync`.
+
+2. **Post-merge sync for good-first-issue-sweep.**
+   Once the `good-first-issue-sweep` PR lands on `main`, remove it from the
+   in-flight table and sync every shipped-state surface: flip
+   `specs/good-first-issue-sweep.md` from `proposed` to `experimental`,
+   update `specs/overview.md`, add the skill to `docs/modes.md` and the
+   mentoring / contributor-growth family docs, and update the eval-coverage
+   counts if they are still numeric. This item is intentionally blocked until
+   the PR lands; do not duplicate the branch implementation.
+   Validation:
+   ```bash
+   test -f skills/good-first-issue-sweep/SKILL.md
+   test -d tools/skill-evals/evals/good-first-issue-sweep
+   uv run --project tools/spec-status-index spec-status --ready
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   ```
+   Spec: [`specs/good-first-issue-sweep.md`](specs/good-first-issue-sweep.md).
+   Branch `good-first-issue-sweep-post-merge-sync`.
+
+3. **Clear the mechanical SOFT validator warnings.**
+   Handle the current non-judgement soft warnings that have obvious local
+   remedies: add the missing Privacy-LLM gate preflight to
+   `reviewer-routing`, add an explicit bounded `--limit` to the
+   `security-issue-import` `gh issue list` call, and replace the
+   `release-prepare` inline `--body "..."` usage with a `--body-file` flow.
+   Leave ASF-coupling warnings out of this item; those require human
+   judgement and are tracked separately below.
    Validation:
    ```bash
    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/reviewer-routing/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/security-issue-import/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-prepare/
+   ```
+   Branch `mechanical-soft-warning-cleanup`.
+
+4. **Low-confidence ASF-coupling judgement pass.**
+   The high-confidence coupling backlog is clear, but the validator still
+   reports low-confidence `asf-coupling` warnings such as bare governance
+   terms (`PMC`) and contributor-intake terms (`ICLA`). Review each warning in
+   context and classify it as one of three outcomes: convert to a placeholder,
+   route through an existing capability flag, or explicitly keep as an
+   ASF-default example. The output should be a narrow set of skill/doc edits
+   plus a short note in `specs/project-agnosticism.md` explaining which
+   residual warnings are intentionally advisory.
+   Validation:
+   ```bash
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/committer-onboarding/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/contributor-nomination/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-promote/
+   ```
+   Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
+   Branch `low-confidence-asf-coupling-pass`.
+
+5. **Add an adopter-pilot feedback harness.**
+   Many experimental family specs now share the same real gap: no adopter has
+   run the full skill family end-to-end. Add a lightweight pilot-report
+   template and helper (or a documented `tools/` command if that better matches
+   existing tooling) that records the skill run, target repo/profile, blocked
+   preflights, false positives, confirmation points, privacy/adapter notes, and
+   proposed spec updates. Wire the template into the relevant experimental
+   family docs so pilot evidence is captured consistently without turning it
+   into a continuous monitor.
+   Validation:
+   ```bash
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   uv run --project tools/spec-validator --group dev pytest
    ```
-   Spec: [`specs/reviewer-routing.md`](specs/reviewer-routing.md).
-   Branch `reviewer-routing`.
+   Spec: [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+   Branch `adopter-pilot-feedback-harness`.
 
-2. **Cross-project skill reconciler: skill-reconciler.**
-   `specs/skill-reconciler.md` is `proposed` with no implementation. Add
-   a meta/infra-family skill `skill-reconciler` that compares two
-   near-duplicate skills (two `source`-tagged copies, e.g. an ASF and a
-   non-ASF variant) and emits a structured diff plus a reconciliation
-   proposal, labelling every difference `ALLOWED`, `DRIFT`, or
-   `SAFETY-BASELINE`. Read-only: it proposes, it never rewrites either
-   skill (convergence is a separate confirmed `write-skill` /
-   `optimize-skill` edit). A safety-baseline divergence is always a
-   must-fix and never folded into allowed-divergence noise. First cut may
-   take two explicit paths rather than auto-pairing by `source` tag.
-   Include an eval suite with a case where the two copies diverge only on
-   the safety baseline and the reconciler must flag it.
+6. **Expand organization-adapter smoke coverage.**
+   The non-ASF profile smoke test proves one issue-management path. Extend
+   smoke coverage across at least three organization-sensitive surfaces:
+   security intake (`security-intake-config.md` / disclosure-governance
+   flags), release backend selection (`release-management-config.md`), and
+   contributor governance (`committer-onboarding-config.md`). The goal is not
+   new product behaviour; it is executable confidence that organization
+   defaults and project overrides work outside an ASF-shaped profile.
+   Validation:
+   ```bash
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/non-asf-profile-smoke/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/security-issue-import/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-prepare/
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/committer-onboarding/
+   ```
+   Spec: [`specs/organization-adapters.md`](specs/organization-adapters.md).
+   Branch `organization-adapter-smoke-expansion`.
+
+7. **Add a dedicated pr-management-code-review eval suite.**
+   `specs/pr-management-family.md` still calls out that
+   `pr-management-code-review` lacks a dedicated eval suite. Add
+   `tools/skill-evals/evals/pr-management-code-review/` with focused cases for
+   selector resolution, review-risk classification, AI-generated-code signal
+   handling, prompt-injection-in-PR-content handling, and the final review
+   handoff. Keep the suite read-only: it should assert the review findings and
+   handoff shape, not require live GitHub writes.
+   Validation:
+   ```bash
+   uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/pr-management-code-review/
+   ```
+   Spec: [`specs/pr-management-family.md`](specs/pr-management-family.md).
+   Branch `pr-management-code-review-evals`.
+
+8. **Extract the skill-reconciler safety-baseline checklist.**
+   The shipped `skill-reconciler` recognizes safety-baseline divergence from
+   prose patterns. Extract the baseline clauses into one canonical checklist
+   file that both humans and tooling can reference: untrusted content is never
+   instructions, collaborator / identity-resolution caveats are preserved, and
+   confidentiality posture is not weakened. Update `skill-reconciler` to cite
+   that checklist and add eval coverage proving a divergence in any checklist
+   item is classified as `SAFETY-BASELINE`.
    Validation:
    ```bash
    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/skill-reconciler/
    ```
    Spec: [`specs/skill-reconciler.md`](specs/skill-reconciler.md).
-   Branch `skill-reconciler`.
-
-3. **Clear the high-confidence ASF-coupling advisory backlog.**
-   The SOFT ASF-coupling lint (shipped, see **What's been built**) still
-   flags ~62 high-confidence couplings, almost all in the
-   release-management skills: hardcoded ASF dist-tree paths (`dist/dev/`,
-   `dist/release/`), `svn mv` / `svn commit` / `svn checkout`
-   distribution commands, and the literal `announce@apache.org` list. The
-   capability-flag vocabulary and `release-management-config.md`'s backend
-   flags (`release-dist-backend`, `release-announce-backend`) already
-   exist; this item wires the release skills (`release-rc-cut`,
-   `release-promote`, `release-archive-sweep`, `release-keys-sync`,
-   `release-prepare`, `release-verify-rc`, `release-vote-draft`,
-   `release-vote-tally`, `release-announce-draft`) plus
-   `security-issue-sync` to read those flags / use the `<announce-list>`
-   placeholder instead of hardcoding ASF specifics, regressing no
-   behaviour for the ASF default profile. Low-confidence advisories (bare
-   `PMC`, `ICLA`, `incubator`) are out of scope: the SOFT lint leaves
-   those to contributor self-judgement. Done when the validator reports
-   zero high-confidence asf-coupling warnings.
+   Branch `skill-reconciler-safety-baseline-checklist`.
+
+9. **Add adapter authoring smoke validation.**
+   Adapter discovery and authoring docs have landed; add a validator or smoke
+   fixture verifying that each tool / adapter README declares the required authoring
+   fields: capability, prerequisites, privacy / credential handling, operations,
+   and config keys. Keep this as an advisory or narrowly scoped hard check based
+   on existing docs so legacy adapters can be brought into compliance
+   deliberately rather than through unrelated churn.
    Validation:
    ```bash
    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
-   uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-promote/
+   uv run --project tools/spec-validator --group dev pytest
+   ```
+   Spec: [`specs/adapters.md`](specs/adapters.md).
+   Branch `adapter-authoring-smoke-validation`.
+
+10. **Add generated consistency checks for `docs/modes.md`.**
+    `docs/modes.md` is a high-traffic index, and recent work has repeatedly
+    needed manual count / skill-list syncs after new skills landed. Add a
+    validator check (or a small generated-consistency helper invoked by the
+    validator) that compares the mode tables against live `skills/*/SKILL.md`
+    frontmatter: each shipped skill appears in the expected mode section, status
+    counts match the frontmatter, and no removed skill remains listed. Keep the
+    first version focused on detection; rewriting the doc can remain a separate
+    human-confirmed update.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    ```
+    Spec: [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+    Branch `modes-doc-consistency-check`.
+
+11. **Normalize tool README Prerequisites sections.**
+    Tool README prerequisites are now part of the authoring contract, but older
+    tool docs may still vary in section shape and required credential / runtime
+    detail. Sweep `tools/*/README.md` for the Prerequisites section, normalize
+    the expected headings and wording where the existing tool behaviour is
+    clear, and tighten the validator only after the tree is brought into
+    compliance. Keep adapter-specific privacy / credential checks in the
+    adapter-authoring smoke item above; this item covers the general README
+    prerequisite contract.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    ```
+    Spec: [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+    Branch `tool-readme-prerequisites-consistency`.
+
+12. **Tighten skill frontmatter schema validation.**
+    Strengthen the validator's frontmatter contract for `mode`, `status`,
+    `capability`, `organization`, and `source`: modes and statuses must be from
+    the documented vocabulary; organizations must exist under `organizations/`;
+    multi-capability skills must use a YAML list consistently; and every shipped
+    experimental skill must have a matching eval suite unless it is explicitly
+    exempted with a documented reason. Keep the first pass focused on fields the
+    current tree can satisfy after local cleanup.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    ```
+    Spec: [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+    Branch `skill-frontmatter-schema-tightening`.
+
+13. **Add project-template drift checks.**
+    Add a validator or smoke tool that compares `projects/_template/` with
+    `projects/non-asf-example/` for structural drift: required config files are
+    present, documented keys exist in both profiles when applicable, template-only
+    keys are either copied or intentionally explained, and organization-inherited
+    defaults do not hide missing adopter-required values. The check should catch
+    stale template docs without forcing the example to mirror ASF-specific values.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/non-asf-profile-smoke/
-   ```
+    ```
-   Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
+    Spec: [`specs/project-agnosticism.md`](specs/project-agnosticism.md).
-   Branch `asf-coupling-cleanup`.
+    Branch `project-template-drift-check`.
+
+14. **Add override-file contract tests.**
+    Document and test the `.apache-magpie-overrides/<skill>.md` contract: override
+    files are additive project guidance, agent-readable Markdown, and never a
+    replacement for the framework safety / confidentiality baseline. Add a
+    validator or smoke fixture that flags override text attempting to weaken the
+    baseline and confirms a clean override can be discovered and surfaced to a
+    skill without editing the upstream skill body.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    ```
+    Spec: [`specs/adoption-and-setup.md`](specs/adoption-and-setup.md).
+    Branch `override-file-contract-tests`.
+
+15. **Add capability taxonomy coverage checks.**
+    Validate that every `capability` declared in skill frontmatter and tool
+    READMEs is documented in `docs/labels-and-capabilities.md`, and that every
+    capability in the taxonomy maps to at least one skill/tool or is explicitly
+    marked reserved / future. The check should catch misspellings and stale
+    taxonomy rows without requiring every capability to have both a skill and a
+    tool implementation.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    ```
+    Spec: [`specs/meta-and-quality-tooling.md`](specs/meta-and-quality-tooling.md).
+    Branch `capability-taxonomy-coverage-check`.
+
+16. **Define the release audit report schema.**
+    `release-audit-report` exists, but downstream review would benefit from a
+    structured audit-record schema. Add a template/schema for the required audit
+    fields (release version, RC artefacts, vote thread, tally outcome, promotion
+    revision, announcement URL, archive state, and any follow-up notes), update
+    the skill to reference it, and add eval fixtures that reject incomplete audit
+    records while preserving the human-reviewed nature of the report.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-audit-report/
+    ```
+    Spec: [`specs/release-management-lifecycle.md`](specs/release-management-lifecycle.md).
+    Branch `release-audit-report-schema`.
+
+17. **Add mail-adapter privacy-boundary tests.**
+    Add smoke tests or validator fixtures for Gmail, PonyMail, `mail-archive`,
+    and any `mail-source` adapter path, proving that private mail content is
+    redacted, summarized, or routed through the Privacy-LLM gate before it enters
+    model-facing skill context. The test should treat fetched mail as external
+    data and include at least one prompt-injection-in-email fixture to preserve
+    the repository's data-not-instructions rule.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/spec-validator --group dev pytest
+    ```
+    Spec: [`specs/adapters.md`](specs/adapters.md).
+    Branch `mail-adapter-privacy-boundary-tests`.
+
+18. **Add branch-name confidentiality validation.**
+    Add a validator check or deterministic helper that scans generated
+    branch-name examples in skills/docs and rejects embargo-breaking terms: CVE IDs,
+    `security`, `vulnerability`, `advisory`, and tracker-private title fragments.
+    Align the check with the existing security-fix workflow guidance so public
+    branch names stay neutral before disclosure.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-and-tool-validator --group dev pytest
+    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/security-issue-fix/
+    ```
+    Spec: [`specs/privacy-llm-gate.md`](specs/privacy-llm-gate.md).
+    Branch `branch-name-confidentiality-validation`.
+
+19. **Add the deterministic structural-diff helper for skill-reconciler.**
+    The shipped `skill-reconciler` reasons over a prose comparison report.
+    Add the optional `tools/` helper sketched in
+    `specs/skill-reconciler.md`: parse two skill trees into a normalized
+    structural diff (frontmatter, section headings, step inventory,
+    placeholder inventory, and linked support files) so the skill can ground
+    `ALLOWED` / `DRIFT` / `SAFETY-BASELINE` decisions in a deterministic
+    object. Keep the reconciler read-only; the helper emits data only.
+    Include unit tests for frontmatter-only, section-order, placeholder, and
+    support-file divergences, plus one safety-baseline fixture that proves the
+    helper preserves the clauses the skill must classify.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/skill-reconciler/
+    uv run --project tools/skill-reconciler-diff --group dev pytest
+    ```
+    Spec: [`specs/skill-reconciler.md`](specs/skill-reconciler.md).
+    Branch `skill-reconciler-structural-diff`.
+
+20. **Add source-tag auto-pairing to skill-reconciler.**
+    The first implementation takes two explicit paths. Extend the skill so
+    a maintainer can ask it to discover near-duplicate skills by `source`
+    tag / capability metadata and present a bounded candidate pair list
+    before running the comparison. Preserve explicit-path mode as the
+    default and require confirmation before comparing any discovered pair,
+    so the skill remains read-only and predictable.
+    Validation:
+    ```bash
+    uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-validate
+    uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/skill-reconciler/
+    ```
+    Spec: [`specs/skill-reconciler.md`](specs/skill-reconciler.md).
+    Branch `skill-reconciler-source-pairing`.
 
 ---
 
@@ -197,36 +477,23 @@ slugs, not numbers (numbering implies an order the specs don't carry).
   it would skip the proof MISSION requires.
 - When a build iteration creates a new skill, its eval suite is part of
   that same work item — not a separate one.
-- **Release-management family:** the first four skills (`release-vote-draft`,
-  `release-announce-draft`, `release-vote-tally`, `release-verify-rc`)
-  have shipped and are recorded in **What's been built**. The remaining
-  six (`release-prepare`, `release-keys-sync`, `release-rc-cut`,
-  `release-promote`, `release-archive-sweep`, `release-audit-report`)
-  should be planned in subsequent passes now that the first four have
-  established the skill-authoring patterns for this family.
+- **Release-management family:** all ten skills have shipped and are recorded
+  in **What's been built**. Further release-management work should come from
+  adopter-pilot evidence or newly accepted specs, not from the old "remaining
+  six skills" queue.
 - **Agentic Triage contributor-growth gaps** (PMC-member nomination,
   emeritus-committer handling, contributor offboarding) noted in
   `triage-mode.md` Known Gaps are intentionally deferred: they are
   vague enough that a spec-RFC conversation is more appropriate than
   a direct build item.
-- **Project-agnosticism:** the ASF-coupling advisory lint has shipped
-  (recorded in **What's been built**); the non-ASF adopter profile
-  fixture is in flight (PR open, see **In-flight**). The capability-flag
-  vocabulary for contributor/committer intake (ICLA vs DCO), security
-  intake, and CVE allocation has been enumerated and shipped to main
-  (recorded in **What's been built**), following the backend-flag
-  precedent set by `release-management-lifecycle.md` (distribution /
-  approval / announcement backends). The remaining follow-on is wiring
-  the skills to read those flags (work item 3) — an engineering task, not
-  a spec-authoring one. The SOFT ASF-coupling lint still reports ~62
-  high-confidence couplings (hardcoded `dist/` paths, `svn` commands,
-  `announce@apache.org`), almost all in the release-management skills,
-  which is the measurable backlog work item 3 clears.
+- **Project-agnosticism:** the ASF-coupling advisory lint, the non-ASF adopter
+  profile fixture, the capability-flag vocabulary, and the high-confidence
+  coupling cleanup have shipped. Remaining low-confidence advisories (for
+  example bare governance terms that may be legitimate ASF defaults) stay
+  human-judgement items unless a future spec turns them into a hard rule.
 - **General-issue dedupe and backlog dashboard** (`triage-mode.md` Known
   Gaps) have shipped (`issue-deduplicate`, `issue-backlog-stats`) alongside
   `issue-stale-sweep`; see **What's been built**. No longer planned items.
-- **Repo-health family** has shipped its first three members
-  (`ci-runner-audit`, `workflow-security-audit`, `dependency-audit`) under
-  its own [`specs/repo-health-family.md`](specs/repo-health-family.md);
-  remaining candidates (license / NOTICE compliance, flaky-test detection)
-  are deferred to a subsequent pass.
+- **Repo-health family** has shipped all five designed members under
+  [`specs/repo-health-family.md`](specs/repo-health-family.md). No additional
+  repo-health skill is planned until adopter-pilot runs produce a concrete gap.
diff --git a/tools/spec-loop/specs/adapters.md b/tools/spec-loop/specs/adapters.md
index 744e5e41..36e589ae 100644
--- a/tools/spec-loop/specs/adapters.md
+++ b/tools/spec-loop/specs/adapters.md
@@ -57,6 +57,15 @@ by swapping the adapter, not the skill.
   ([privacy-llm-gate.md](privacy-llm-gate.md)) before any LLM read.
 - **Write-back is confirm-before-apply** and routed through the sandbox's
   `ask` gate ([agent-isolation-sandbox.md](agent-isolation-sandbox.md)).
+- **Adapter READMEs are contracts.** Every adapter README declares the
+  capability it provides, prerequisites, credential/privacy handling,
+  supported operations, and adopter config keys. These fields let a
+  validator distinguish an intentional adapter surface from undocumented
+  shell prose.
+- **Private mail is hostile input.** Gmail, PonyMail, `mail-archive`, and
+  `mail-source` content is external data, never instructions. Tests for
+  mail adapters should include prompt-injection text in fetched mail and
+  prove it is carried as report data only after redaction/gating.
 
 ## Out of scope
 
@@ -69,6 +78,10 @@ by swapping the adapter, not the skill.
    adapter + placeholder.
 2. Mail adapters draft only and redact before LLM read.
 3. Each adapter ships with its own tests.
+4. Adapter READMEs declare capability, prerequisites,
+   privacy/credential handling, operations, and config keys.
+5. Mail-adapter tests prove private fetched content crosses the
+   Privacy-LLM/redaction boundary before model-facing skill context.
 
 ## Validation
 
@@ -86,3 +99,10 @@ done
   ASF coupling across the catalogue, and the capability-flag mechanism for
   workflow branches that no adapter resolves, live in
   [project-agnosticism.md](project-agnosticism.md).
+- **Adapter authoring smoke validation is missing.** The docs define the
+  expected README contract, but no validator currently checks that each
+  adapter declares capability, prerequisites, privacy/credential handling,
+  operations, and config keys.
+- **Mail-adapter privacy tests are thin.** The redaction contract exists,
+  but adapter-level fixtures should prove that private mail and embedded
+  prompt-injection attempts do not enter model-facing context untreated.
diff --git a/tools/spec-loop/specs/adoption-and-setup.md b/tools/spec-loop/specs/adoption-and-setup.md
index 2a27ad9e..615ee649 100644
--- a/tools/spec-loop/specs/adoption-and-setup.md
+++ b/tools/spec-loop/specs/adoption-and-setup.md
@@ -61,6 +61,12 @@ gitignored skill symlinks, and committed agent-readable override files.
 - **Overrides are agent-readable Markdown** under
   `.apache-magpie-overrides/`, consulted at runtime and merged before
   default behaviour ([pairing/correctability is the model]).
+- **Overrides are additive and never invert authority.** An override may
+  supply adopter-specific process details, paths, labels, or wording, but
+  it must not replace or weaken the framework's safety, confidentiality,
+  privacy, or external-content-as-data baseline. If an override conflicts
+  with those baseline rules, the framework rule wins and the conflict is
+  surfaced.
 
 ## Out of scope
 
@@ -73,6 +79,9 @@ gitignored skill symlinks, and committed agent-readable override files.
 1. Adoption commits only the bootstrap skill + lock/override scaffold.
 2. The committed lock re-installs the same version on a fresh clone.
 3. Drift between local and committed locks is surfaced with an upgrade.
+4. Override files can be discovered and surfaced to skills without
+   editing upstream skill bodies, and override text cannot weaken the
+   safety/confidentiality baseline.
 
 ## Validation
 
@@ -86,3 +95,7 @@ uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-valid
 - `stable`; gaps appear as new agent targets to add to the registry
   ([`agents.md`](../../../skills/setup/agents.md)) or new override
   surfaces — recorded by the plan pass.
+- **Override-file contract tests are missing.** The docs describe
+  agentic overrides, but no smoke fixture proves that clean overrides are
+  additive or that an override attempting to relax safety/confidentiality
+  rules is flagged rather than applied.
diff --git a/tools/spec-loop/specs/meta-and-quality-tooling.md b/tools/spec-loop/specs/meta-and-quality-tooling.md
index 5d1fb604..e18a9fb7 100644
--- a/tools/spec-loop/specs/meta-and-quality-tooling.md
+++ b/tools/spec-loop/specs/meta-and-quality-tooling.md
@@ -64,6 +64,18 @@ trustworthy as it grows.
   heuristic/text tools with no model calls — reproducible in CI.
 - **Hard vs soft rules.** The validator fails on missing frontmatter or
   broken links; advisories are warnings unless `--strict`.
+- **Schema-backed metadata.** Skill frontmatter, tool README capability
+  declarations, and family/index docs are treated as machine-checkable
+  contracts. New checks should prefer clear enum/list validation over
+  prose inference when the repository already declares the vocabulary.
+- **Generated-index consistency.** Human-facing catalogue pages such as
+  `docs/modes.md` may stay hand-written, but validator checks should
+  compare their skill lists and counts against live `skills/*/SKILL.md`
+  frontmatter so documentation drift is visible before review.
+- **Pilot evidence is structured.** Experimental-family pilot reports
+  should capture the same minimal fields every time: skill/family,
+  target repo/profile, blocked preflights, false positives, confirmation
+  points, privacy/adapter notes, and proposed spec changes.
 
 ## Out of scope
 
@@ -76,6 +88,15 @@ trustworthy as it grows.
 1. `skill-and-tool-validate` enforces required frontmatter + link integrity.
 2. `list-skills` generates its index from live frontmatter.
 3. Each meta tool ships with its own tests.
+4. Frontmatter values for `mode`, `status`, `capability`,
+   `organization`, and `source` are validated against documented
+   vocabularies; an `organization` value fails unless a matching
+   directory exists under `organizations/`.
+5. Capabilities declared in skill frontmatter and tool READMEs are
+   present in `docs/labels-and-capabilities.md`; taxonomy entries with no
+   implementation are explicitly marked reserved or future.
+6. `docs/modes.md` skill lists and shipped counts are checked against
+   live skill frontmatter.
 
 ## Validation
 
@@ -86,10 +107,23 @@ uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-valid
 
 ## Known gaps
 
-- **Eval coverage is complete.** All 44 shipped skills have a matching
-  suite in `tools/skill-evals/evals/`; the soft eval-coverage check in
-  `skill-and-tool-validator` (check #8) warns when a newly added skill
-  has no suite, keeping coverage complete going forward.
-- Other gaps appear as new quality checks worth adding — recorded by the
-  plan pass. The spec validator (analogous to the skill validator) and
-  the ASF-coupling advisory lint are two recent additions to this surface.
+- **Eval coverage is complete.** Every shipped skill has a matching suite
+  in `tools/skill-evals/evals/`; the soft eval-coverage check in
+  `skill-and-tool-validator` warns when a newly added skill has no suite,
+  keeping coverage complete going forward.
+- **Frontmatter validation is still shallow.** Current validation covers
+  required fields, but the next pass should make `mode`, `status`,
+  `capability`, `organization`, and `source` combinations explicit and
+  test-backed.
+- **Capability taxonomy drift is not yet checked.** The validator should
+  catch misspelled or undocumented capability values, and should surface
+  taxonomy rows that no skill/tool implements unless they are marked
+  reserved.
+- **`docs/modes.md` is manually synced.** The plan tracks a generated
+  consistency check so mode tables and shipped counts cannot silently
+  drift from skill frontmatter.
+- **Tool README prerequisites vary.** A prerequisites consistency pass
+  should normalize older tool READMEs before tightening the validator.
+- **Pilot evidence has no common shape.** Experimental-family specs all
+  need adopter evidence, but there is no standard pilot-report template
+  or helper yet.
diff --git a/tools/spec-loop/specs/organization-adapters.md b/tools/spec-loop/specs/organization-adapters.md
index 4b6db859..841b8acb 100644
--- a/tools/spec-loop/specs/organization-adapters.md
+++ b/tools/spec-loop/specs/organization-adapters.md
@@ -73,6 +73,10 @@ identical "ASF default" values in its own `project.md`.
 - The `organizations/ASF/organization.md` keys mirror the namespaces of
   the project manifest's *Security workflow configuration* section so
   resolution is mechanical.
+- Organization smoke coverage should exercise more than one family. At
+  minimum, a non-ASF profile must be able to drive security intake
+  backend selection, release backend selection, and contributor-governance
+  defaults without editing skill bodies.
 
 ## Out of scope
 
@@ -94,6 +98,8 @@ identical "ASF default" values in its own `project.md`.
   first hit wins; no skill branches on the organization.
 - A new organization can be authored from `organizations/_template/` with
   no skill edits.
+- Smoke fixtures cover security intake, release backend, and contributor
+  governance defaults for at least one non-ASF profile.
 
 ## Validation
 
@@ -121,3 +127,7 @@ resolves to the baseline.
   convention).
 - The family-level `organization:` scope (replacing `asf: true/false`)
   and the external-adapter discovery index are separate follow-ups.
+- **Smoke coverage is narrow.** The non-ASF profile smoke currently proves
+  one path. The next coverage pass should exercise security intake,
+  release backend selection, and contributor governance so organization
+  defaults are tested across the surfaces most likely to drift.
diff --git a/tools/spec-loop/specs/pr-management-family.md b/tools/spec-loop/specs/pr-management-family.md
index 4a4f6d3d..a6278909 100644
--- a/tools/spec-loop/specs/pr-management-family.md
+++ b/tools/spec-loop/specs/pr-management-family.md
@@ -137,6 +137,9 @@ is listed here for navigability since its domain is PR threads.
 4. `pr-management-stats` emits read-only tables without mutating any
    tracker or PR state.
 5. All family skills pass `skill-and-tool-validate` with no errors.
+6. `pr-management-code-review` has a dedicated eval suite covering
+   selector resolution, review-risk classification, AI-generated-code
+   signals, prompt injection in PR content, and the final review handoff.
 
 ## Validation
 
@@ -161,9 +164,10 @@ uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-valid
   code-review skill ships without a matching suite in
   `tools/skill-evals/evals/pr-management-code-review/`; the SOFT
   eval-coverage check in `skill-and-tool-validator` flags this. Adding a
-  step-level fixture set (at minimum an adversarial prompt-injection case
-  and a typical APPROVE output) is the next concrete quality improvement
-  for this family.
+  step-level fixture set is the next concrete quality improvement for
+  this family. Minimum coverage: selector resolution, review-risk
+  classification, AI-generated-code signal handling, prompt injection in
+  PR body/comments, and a typical APPROVE / REQUEST_CHANGES handoff.
 - **Stale-PR handling is built into `pr-management-triage`.** Dedicated
   stale sweeps (`stale-draft`, `inactive-open`, `stale-review-ping`) run
   as Step 5 of the triage flow and can be invoked standalone via
diff --git a/tools/spec-loop/specs/privacy-llm-gate.md b/tools/spec-loop/specs/privacy-llm-gate.md
index 35a9436c..be984bf8 100644
--- a/tools/spec-loop/specs/privacy-llm-gate.md
+++ b/tools/spec-loop/specs/privacy-llm-gate.md
@@ -57,6 +57,12 @@ artefact for leakage before emission.
   and any project-declared private string; failures stop the flow.
 - **Audit log is privacy-aware** — references hashed identifiers, never
   raw PII.
+- **Public branch names are public artefacts.** Generated branch names,
+  commit-message examples, PR-body templates, changelog snippets, and
+  release-note text must avoid embargo-breaking security terms before
+  disclosure. In particular, pre-disclosure public branch names must not
+  contain CVE IDs, `security`, `vulnerability`, `advisory`, or
+  tracker-private title fragments.
 
 ## Out of scope
 
@@ -71,6 +77,8 @@ artefact for leakage before emission.
    map local (0600, gitignored).
 3. The scrub catches CVE IDs / reporter names / list addresses before any
    public write.
+4. Generated public branch-name examples are scrubbed for CVE IDs and
+   embargoed security framing before use.
 
 ## Validation
 
@@ -82,3 +90,7 @@ uv run --project tools/privacy-llm --group dev pytest
 
 - `stable`; gaps surface as new PII patterns or new public-emission
   surfaces not yet covered by the scrub — caught as drift by the plan pass.
+- **Branch-name confidentiality validation is missing.** Security-fix
+  workflows already require neutral branch names, but no deterministic
+  check scans skill/docs examples for CVE IDs or embargoed terms in
+  generated branch names.
diff --git a/tools/spec-loop/specs/project-agnosticism.md b/tools/spec-loop/specs/project-agnosticism.md
index 95b6e914..269dd2bb 100644
--- a/tools/spec-loop/specs/project-agnosticism.md
+++ b/tools/spec-loop/specs/project-agnosticism.md
@@ -108,6 +108,11 @@ The three mechanisms, in order of preference:
 - **Advisory, not paternalistic.** The audit surfaces candidate coupling
   for a maintainer to judge; some ASF strings are legitimate (examples,
   the ASF default profile, ASF-specific docs). It does not auto-rewrite.
+- **Template and example profiles stay comparable.** `projects/_template/`
+  is the adopter contract; `projects/non-asf-example/` is the proof that
+  a non-ASF adopter can satisfy that contract. Required files and config
+  keys should be structurally comparable, with omissions explained rather
+  than silently drifting.
 
 ## Out of scope
 
@@ -129,6 +134,9 @@ The three mechanisms, in order of preference:
    `<project-config>` flag, not on skill edits.
 3. The ASF profile runs the catalogue unchanged (default-valued flags),
    and a non-ASF profile can be declared without editing any skill body.
+4. The template profile and non-ASF example expose the same required
+   config surfaces, except where the example documents an intentional
+   omission or an organization-inherited default.
 
 ## Validation
 
@@ -173,3 +181,8 @@ uv run --project tools/skill-and-tool-validator --group dev skill-and-tool-valid
   catalogue (bare `PMC`, `ICLA`, `announce@apache.org`) is surfaced by the
   advisory lint (check #10 in `skill-and-tool-validator`) for human
   judgement.
+- **Template/profile drift is not mechanically checked.** The non-ASF
+  example is now a real smoke fixture, but no validator compares its file
+  and key surface against `projects/_template/`. A drift check should
+  catch missing required files, stale documented keys, and hidden
+  organization-default assumptions.
diff --git a/tools/spec-loop/specs/release-management-lifecycle.md b/tools/spec-loop/specs/release-management-lifecycle.md
index 4d6b1002..ce18efdd 100644
--- a/tools/spec-loop/specs/release-management-lifecycle.md
+++ b/tools/spec-loop/specs/release-management-lifecycle.md
@@ -119,6 +119,11 @@ code lands.
   state-changing lane, requires evidence from Release Managers and binding
   voters that the process is healthier (fewer stalled RCs, shorter
   time-to-`[ANNOUNCE]`, fewer reverted promotions).
+- **Audit records are structured.** `release-audit-report` output should
+  follow a schema/template with required fields for release version, RC
+  artefacts, vote thread, tally outcome, promotion revision, announcement
+  URL, archive state, and follow-up notes. Missing required fields are a
+  report finding, not silently omitted prose.
 
 ## Out of scope
 
@@ -139,6 +144,9 @@ code lands.
 3. No skill in the family signs, imports, promotes, sends, or merges on
    autopilot; the key-holding and publishing steps emit paste-ready
    recipes only.
+4. `release-audit-report` validates the audit record against the
+   required-field schema and flags incomplete lifecycle evidence before
+   proposing an audit-log PR.
 
 ## Validation
 
@@ -171,3 +179,7 @@ uv run --project tools/skill-evals skill-eval tools/skill-evals/evals/release-an
   cut a full release through the family yet, so the RM/binding-voter
   evidence window that would justify default-on or a state-changing lane
   has no data behind it.
+- **Release audit record schema is prose-only.** The audit-report skill
+  exists, but there is no structured schema/template that downstream
+  review can validate. The plan tracks a schema and eval fixtures for
+  incomplete records.