diff --git a/AGENTS.md b/AGENTS.md index dac2d40..4e5cb36 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -57,9 +57,12 @@ Run the loop (binary builds to `target/debug/surf`; see `CONTRIBUTING.md` for bu 3. `surf check` — if you touched code a hub anchors, it will report `DIVERGED`. Re-read the claim. If the prose **still holds**, `surf verify` re-seals it (writes the new hash); if the prose is **now false**, fix the prose first, then verify. -4. Added public behavior? Add a hub claim for it — the under-coverage warning flags public - functions with no claim. When you update a hub, update its *prose* to stay accurate, not just - the hash. +4. Added public behavior? First reach for an *existing* system claim: extend its prose, or add + the new symbol as another site under its multi-site `at:` list. Write a brand-new claim only + when the behavior is genuinely its own. A hub is an onboarding doc, not a per-function log — + the under-coverage warning lists undocumented symbols, but consolidating them into one coarse, + multi-anchor claim beats one claim per function (`surf lint` will nudge a claim-log the other + way). When you update a hub, update its *prose* to stay accurate, not just the hash. 5. Record user-facing changes in [`CHANGELOG.md`](./CHANGELOG.md) under `[Unreleased]`. 6. Hit a *notable* dogfooding moment? Log it in [`docs/dogfood-log.md`](./docs/dogfood-log.md). This is the repo eating its own dogfood, so it produces good material — capture it while it's diff --git a/CHANGELOG.md b/CHANGELOG.md index 5f70f4a..cd02928 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,7 +7,20 @@ project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
## [Unreleased] +### Added +- **`surf lint` consolidation nudges (#142).** Two advisory warnings push hubs away from the + "claim-log" shape (one claim per function) and toward onboarding docs: a *claim-log* warning + when a hub has several claims and never once uses a multi-site `at:` list, and a *thin-prose* + warning when a multi-claim hub's body is too sparse to onboard a reader. Both are non-blocking + (exit 0), mirroring the existing over-/under-anchoring nudges. + ### Changed +- **`surf suggest` reframed (#142).** Human output now reads as a list of *undocumented symbols*, + not a list of claims to write: it groups symbols by file and emits a multi-site `at:` skeleton + with prose-first scaffolding, steering authors toward coarse, system-level claims. `--format + json` is unchanged. +- **`surf new` scaffold** ships a prose-first template (`## How it works` / `## Boundary` + headings and a multi-anchor example claim) so a fresh hub is shaped like an onboarding doc. - **Hash recipe v2 (member-access names verbatim).** The canonical hash now keeps the property/field component of a member-access expression verbatim instead of alpha-renaming it, so re-pointing an anchored span at a *different* external symbol — `PointsTier.TIER_1` → diff --git a/docs/dogfood-log.md b/docs/dogfood-log.md index d25e21d..62ebaf5 100644 --- a/docs/dogfood-log.md +++ b/docs/dogfood-log.md @@ -6,6 +6,39 @@ did about it, the lesson.* Keep it honest; the failures are the interesting part --- +## 2026-06-29 — With the new claim-log nudges, `surf lint` reported 22 warnings on our own hubs + +**Context:** #142 argues the CLI's in-loop signals (`surf suggest`, `lint_under_coverage`) teach +agents to write *claim-logs* — one claim per function, near-1:1 symbol→claim, no prose — because +nothing rewarded consolidation. We added the symmetric counter-pressure: a *claim-log* warning +(several claims, never a multi-site `at:`) and a *thin-prose* warning (multi-claim hub, stub body). 
+
+**What happened:** the moment they ran, `surf lint` reported **22 warnings on our own hubs** —
+0 errors, exit 0. Notably *zero* of our 17 hubs had ever used a multi-site `at:` list, and
+`cli-check.md` (the example the issue calls out as too thin) tripped both new warnings. The repo
+that ships the tool was itself the thing the issue describes.
+
+**Why it's a good story:** it's the cleanest possible confirmation of the issue's thesis — the
+authors of Surface, dogfooding daily, still drifted into per-symbol logging because the loop only
+ever nudged toward *more* coverage, never toward *fewer, coarser* claims. The fix isn't "write
+better docs"; it's adding the missing signal. The warnings are advisory (exit 0) by design, so
+they nudge without blocking — but 22 of them is a loud, honest nudge.
+
+**Lesson / open question:** advisory-but-loud is the right register for a stylistic nudge, but
+22 warnings risk being tuned out. Open question: should consolidation be a single per-hub summary
+line rather than one warning per offending hub, and is the multi-site `at:` count the best single
+proxy for "this author thinks in systems, not symbols"?
+
+**Follow-up (same day):** we then ate the dogfood — refactored the 6 flagged hubs in the same PR.
+Adding real body prose to the 5 thin ones was free (bodies aren't hashed, so no re-verify), and
+`cli-git` got the repo's *first* multi-site claim: one invariant ("every git query degrades to
+None; the verdict never depends on it") sealed across all five helpers, which let us trim the
+per-function boilerplate. Writing it surfaced the same thing the AGENTS.md entry did — consolidating
+forced us to name the shared contract explicitly. New-warning count on our hubs: 6 → 0. (The 16
+`under-coverage` warnings are a separate, older itch.)
+ +--- + ## 2026-06-17 — Making AGENTS.md a hub caught AGENTS.md lying about itself **Context:** We documented that `AGENTS.md`/`CLAUDE.md` *can* double as a hub (any file the `hubs` diff --git a/docs/guides/authoring-hubs.md b/docs/guides/authoring-hubs.md index 205c2ff..05d8b49 100644 --- a/docs/guides/authoring-hubs.md +++ b/docs/guides/authoring-hubs.md @@ -37,6 +37,39 @@ Prose a human (or agent) reads to understand this domain. Where hubs live is configured by the `hubs` glob in `surf.toml` (default `hubs/*.md`); keep them central or co-locate them with code (`["**/_hub.md"]`). +## A hub is an onboarding doc + +The most common failure mode is writing a hub like a **claim-log**: one claim per function, each +restating what a single symbol does, with a thin heading and no real prose. That's a changelog of +symbols, not a briefing — and it makes the verify loop a rubber-stamp, because nothing connects the +claims to a *system*. + +A good hub is the opposite: **prose first**, documenting a system, with a handful of **coarse +claims** that each seal one behavior across the places it actually lives. 
+ +| | Claim-log (avoid) | Onboarding doc (aim for) | +| --- | --- | --- | +| **Claims** | one per symbol, near 1:1 | one per *behavior*, often spanning 2–3 sites | +| **`at:`** | a single symbol each | multi-site lists for system-level invariants | +| **Body** | a thin `#` heading | the key distinction, a `## How it works`, a Boundary note | +| **Reads as** | "what each function does" | "how this system works and what must stay true" | + +Concretely, a good claim describes *a behavior of the system* and seals every span that behavior +depends on: + +```yaml +- claim: commission is the only multi-level payout — it walks the referral graph up to three + ancestors, pays REFERRAL_COMMISSION_RATES[tier][level], and skips self-edges + at: + - backend/referral-commission.service.ts > ReferralCommissionService > buildCommissionRecords + - packages/constants/ReferralCommission.ts > REFERRAL_COMMISSION_RATES # one invariant, two sites +``` + +Write the prose a reader needs to onboard — the single most important distinction, how the pieces +fit (`## sections`, tables), and a **Boundary** note on what the gate does *not* cover — then anchor +the invariants with as few claims as the behavior allows. `surf lint` nudges the other way when a +hub drifts into claim-log shape (see below). + ## Bootstrapping with `surf suggest` Authoring claims by hand is the main adoption cost. To get a head start, point `surf suggest` at @@ -47,10 +80,13 @@ starter hub: surf suggest "src/**/*.ts" # or --format json for tooling ``` -It only suggests — it never writes a file or stamps a hash. Paste the block into a hub (or -`surf new `), write a real claim sentence for each anchor you keep, delete the rest, then -`surf verify`. Treat it as a checklist of undocumented surface, not a mandate to anchor everything -(see granularity below). +It only suggests — it never writes a file or stamps a hash. 
The output is a **list of undocumented +symbols, not a list of claims to write**: it groups the symbols by file and emits a multi-site +`at:` skeleton so the default shape steers you toward coarse, consolidated claims. Paste it into a +hub (or `surf new `), then **group related symbols into a few system-level claims** — write +real prose, list the sites each behavior spans under one `at:`, and delete what you don't need +before `surf verify`. Treat it as a checklist of undocumented surface, not a mandate to write one +claim per symbol (see [a hub is an onboarding doc](#a-hub-is-an-onboarding-doc) and granularity below). ## The anchor grammar @@ -73,12 +109,16 @@ src/service.ts > TokenService > rotate `src/api.ts > handler@2`. Python `@overload` sets are the exception: consecutive stubs plus their implementation resolve as *one* symbol, so the bare name works and the hash covers every signature. -- **Multiple sites** — an `at:` list combines its sites into one hash, so the claim is stale if - *any* listed span changes: +- **Multiple sites (the default for a system claim)** — a real invariant usually lives in more + than one place. An `at:` list combines its sites into one hash, so the claim is stale if *any* + listed span changes. Reach for this **first**: one coarse claim sealing a behavior across the + 2–3 places it lives is the shape of a good hub — not one claim per symbol. ```yaml - at: - - src/a.rs > foo - - src/b.rs > bar + - claim: a refresh token is accepted at most once — rotation issues a new one and the old is + rejected everywhere it's checked + at: + - src/auth/refresh.ts > rotateRefreshToken + - src/auth/refresh.ts > validateRefresh ``` Run `surf lint` to confirm every anchor resolves to exactly one symbol. Ambiguous or vanished @@ -100,6 +140,11 @@ This is the central tension (proposal §8): - **Too many anchors in one hub** — split the hub; a long verify list invites rubber-stamping. 
- **Uncovered public function** — a public function in a file the hub already anchors has no claim. Either add one, or accept it as intentionally undocumented. +- **Claim-log shape** — a hub with several claims that *never* use a multi-site `at:` reads as one + claim per symbol. Consolidate related claims into fewer coarse ones (see + [a hub is an onboarding doc](#a-hub-is-an-onboarding-doc)). +- **Thin prose** — a multi-claim hub whose body is a stub. A hub is an onboarding doc; add prose + that frames the system, not just claims that anchor its symbols. Rule of thumb: anchor the **smallest symbol whose logic the sentence is actually about.** diff --git a/hubs/cli-check.md b/hubs/cli-check.md index 66370cc..fd1fa7d 100644 --- a/hubs/cli-check.md +++ b/hubs/cli-check.md @@ -32,7 +32,15 @@ refs: [] # surf check -`check_claim` is the verdict; the git helpers in [`cli-git.md`](./cli-git.md) only feed the -advisory `old_code`/`magnitude` in the `--format json` report. Any divergence makes `run` exit -non-zero (the CI-blocking signal). `Scope` narrows which claims `check_workspace` evaluates when -`--base`/`--files` are given. +`check` is the gate — the one command CI runs. **The distinction to hold onto:** the verdict is +*purely a function of anchored code and stored hashes*. It reads no git, so the same tree always +produces the same answer; the git helpers in [`cli-git.md`](./cli-git.md) only feed the advisory +`old_code`/`magnitude` in the `--format json` report and never change pass/fail. + +`check_claim` is the per-claim verdict; `check_workspace` walks every hub, and `Scope` narrows +which claims it evaluates when `--base` or `--files` is given — opt-in and intersective, falling +back to a full check rather than checking nothing. Any divergence (including a hub whose +frontmatter won't parse — the gate fails closed) makes `run` exit non-zero. 
+ +**Boundary:** green means "nothing anchored changed since last sign-off," not "the prose is true"; +that confirmation is [`surf verify`](./cli-verify.md)'s job, not the gate's. diff --git a/hubs/cli-git.md b/hubs/cli-git.md index 43c4fb3..b1e7b30 100644 --- a/hubs/cli-git.md +++ b/hubs/cli-git.md @@ -2,46 +2,52 @@ summary: Best-effort git queries for scoping and rename-following — advisory only, the gate never depends on them. anchors: - claim: > - changed_files returns the workspace-root-relative paths changed between the merge base of - base..HEAD and the working tree (via git diff --relative), used to diff-scope the check — - so the set intersects workspace-relative anchors even when the workspace is a repo - subdirectory. A missing merge base (shallow clone) falls back to diffing the ref directly; - if git can't answer at all it returns None. + Every query here is best-effort and advisory: each returns None/empty when git can't answer + (no repo, bad ref, shallow clone), so surf degrades to a full, git-free check rather than + failing. The deterministic verdict never depends on any of them. + at: + - surf-cli/src/git.rs > changed_files + - surf-cli/src/git.rs > show + - surf-cli/src/git.rs > renamed_to + - surf-cli/src/git.rs > log_stream + - surf-cli/src/git.rs > list_files_at + hash: 2:874f501ad8f1 + - claim: > + changed_files returns workspace-root-relative paths changed between the merge base of + base..HEAD and the working tree (git diff --relative), so the set intersects + workspace-relative anchors even when the workspace is a repo subdirectory; a missing merge + base (shallow clone) falls back to diffing the ref directly. at: surf-cli/src/git.rs > changed_files hash: 2:e395bff5410d - - claim: > - show returns the contents of a file at a git ref (git show :), used to recover - the previous source for advisory old_code/magnitude. None when unavailable — the verdict is - unchanged either way. 
- at: surf-cli/src/git.rs > show - hash: 2:ea9143b47615 - - claim: > - renamed_to asks git's rename detection (diff --name-status --find-renames HEAD) for the new - path a file moved to, letting lint warn and verify --follow re-point instead of hard-blocking. - Best-effort: a pure mv with no content match may show as delete+add and not be detected, and - None means git couldn't pair the rename — the deterministic verdict never depends on it. - at: surf-cli/src/git.rs > renamed_to - hash: 2:a51ff4adba72 - claim: > log_stream returns the whole history window in one git spawn: every reachable commit (newest first, children before parents) with its parents and its first-parent name-status diff. Merges are included with --diff-merges=first-parent so surf stats can propagate hub state - through them, and --no-renames keeps a rename reading as delete+add. None when git can't - answer. + through them, and --no-renames keeps a rename reading as delete+add. at: surf-cli/src/git.rs > log_stream hash: 2:c5d2fccc872e - claim: > - list_files_at lists every tracked file at a commit (ls-tree -r --name-only), used to find the - hub set as it existed at a past commit. None when git can't answer. - at: surf-cli/src/git.rs > list_files_at - hash: 2:23c36e64fc4d + renamed_to asks git's rename detection (diff --name-status --find-renames HEAD) for the new + path a file moved to, letting lint warn and verify --follow re-point instead of hard-blocking. + Best-effort: a pure mv with no content match may show as delete+add and go undetected. + at: surf-cli/src/git.rs > renamed_to + hash: 2:a51ff4adba72 refs: [] --- # git helpers -A thin, best-effort wrapper over `git` via `std::process::Command` — no `git2` dependency. 
Every -function degrades to `None`/empty when git can't answer (no repo, bad ref, shallow clone), so the -gate stays deterministic and git-free: these only *enrich* `check` and let `lint`/`verify` -recognize a moved file ([`rename.md`](./rename.md) covers symbol renames; `renamed_to` covers the -file-rename case). +A thin wrapper over `git` via `std::process::Command` — no `git2` dependency. + +**The one distinction that matters:** these only *enrich* the gate; they never decide it. `check`'s +verdict is computed from anchored code alone, so a missing or broken git environment degrades the +gate gracefully (a full, git-free check) instead of failing closed on infrastructure. + +The five helpers split by job: `changed_files` diff-scopes `surf check --base`; `log_stream` and +`list_files_at` feed `surf stats` history; `show` recovers prior source for the advisory +`old_code`/`magnitude` enrichment in the JSON report; `renamed_to` powers file-rename recognition +in `lint`/`verify` (symbol renames are [`rename.md`](./rename.md)). The first claim seals the +contract they all share; the rest pin down the non-trivial mechanics. + +**Boundary:** nothing here is part of the deterministic verdict, and none of these mutate the repo — +they only read git state. diff --git a/hubs/cli-workspace.md b/hubs/cli-workspace.md index bf8f2ed..65a982d 100644 --- a/hubs/cli-workspace.md +++ b/hubs/cli-workspace.md @@ -16,5 +16,15 @@ refs: [] # Workspace -`discover` is what makes `surf` runnable from any subdirectory; the resolved root is the base -every anchor path is joined against. +This is the I/O layer that sits over the pure config parser ([`config.md`](./config.md)): it finds +the project and turns the hub globs into concrete files, so every other command works in terms of a +resolved root rather than the caller's current directory. 
+ +`discover` is what makes `surf` runnable from any subdirectory — it walks up to the nearest +`surf.toml` (the same root-finding git and ruff use) and errors if none is found, so a stray +invocation outside a project fails loudly instead of silently governing nothing. The resolved root +is the base every anchor path is joined against, and `hub_paths` globs the configured patterns +relative to it (sorted and deduped) to enumerate the hubs. + +**Boundary:** discovery and enumeration only — it parses no hub bodies and resolves no anchors; +that is [`lint`](./cli-lint.md)/[`check`](./cli-check.md)'s job over the files this hands back. diff --git a/hubs/hash.md b/hubs/hash.md index 8466785..e317282 100644 --- a/hubs/hash.md +++ b/hubs/hash.md @@ -27,5 +27,17 @@ refs: [] # Canonical hashing -The fingerprint is computed over `emit`'s token stream, hashed with SHA-256 (12 hex). This is -the signal the gate compares; `Magnitude` alongside it is advisory only and never gates. +**The whole design in one line:** quiet on cosmetics, loud on logic. The fingerprint is computed +over `emit`'s canonical token stream, hashed with SHA-256 (12 hex). This is the only signal the +gate compares; `Magnitude` alongside it is advisory and never gates. + +"Canonical" is what makes the gate trustworthy: comments are dropped and identifiers are +alpha-renamed to positional placeholders, so a consistent rename or a reflow doesn't trip a claim, +while operators, keywords, and literal values stay verbatim, so a real logic edit does. The +exceptions exist because a name *is* the logic there — a Python decorator, and (v2) a +member-access name — so swapping one is caught even when it occurs once. A claim's hash is the +order-sensitive combination of its per-site hashes, which is what lets one multi-site claim go +stale when any of its spans changes. 
+ +**Boundary:** hashing decides *that* something changed, never *whether the prose is still true* — +that judgment is the human's at [`surf verify`](./cli-verify.md). diff --git a/hubs/hub-format.md b/hubs/hub-format.md index 53a2f84..719f5d1 100644 --- a/hubs/hub-format.md +++ b/hubs/hub-format.md @@ -19,5 +19,16 @@ covers: # Hub format -`parse_hub` is the contract everything binds to. Writes go through the line-level editor -(`set_anchor_hash` / `set_anchor_at`) rather than re-serializing, to keep diffs reviewable. +A hub is the unit every command reads and writes: a `---`-fenced YAML frontmatter block (the +machine-checkable `anchors`) followed by a markdown body (the prose a human or agent reads). +`parse_hub` is the contract everything else binds to — its shape is why `at:` can be a scalar or a +list, why `hash` is optional until verified, and why unknown fields are rejected (so a typo can't +masquerade as a new field) while the forward-declared `refs`/`covers` are accepted but inert. + +**The distinction that drives the design:** a human reviews every write, so edits must be +*surgical*. Writes go through the line-level editor (`set_anchor_hash` / `set_anchor_at`) rather +than re-serializing the frontmatter — re-serializing would reorder keys and reflow scalars, burying +the one changed line in a noisy diff. An unchanged hash rewrite is therefore byte-identical. + +**Boundary:** this module is pure parsing and text editing — it resolves no anchors and computes no +hashes; it only produces the structure [`lint`](./cli-lint.md)/[`check`](./cli-check.md) act on. diff --git a/surf-cli/src/lint.rs b/surf-cli/src/lint.rs index 094da8e..007402a 100644 --- a/surf-cli/src/lint.rs +++ b/surf-cli/src/lint.rs @@ -2,7 +2,8 @@ //! vanished anchors block; a symbol that was merely renamed (detected via stored-hash //! match, §6.4) only warns and points at `surf verify --follow`. It also emits advisory //! 
granularity warnings (§8): anchors that span (nearly) a whole file, hubs with too many -//! anchors, and exported symbols in an anchored file that no claim covers. +//! anchors, exported symbols in an anchored file that no claim covers, and the symmetric +//! consolidation nudges (#142) — a per-symbol "claim-log" and a thin-prose body. use crate::format::Format; use crate::workspace::Workspace; @@ -21,6 +22,12 @@ const COARSE_SPAN_FRACTION_PCT: usize = 75; const COARSE_MIN_FILE_LINES: usize = 15; /// Past this many anchors a hub invites rubber-stamping during a bulk `verify` (§8). const MAX_ANCHORS_PER_HUB: usize = 12; +/// At or above this many claims, a hub that never once uses a multi-site `at:` list reads as a +/// per-symbol "claim-log" rather than a system briefing — nudge toward consolidation (#142). +const CLAIM_LOG_MIN_CLAIMS: usize = 4; +/// An onboarding hub should average at least this many words of *body* prose per claim. Below it, +/// the prose lives in the frontmatter and the body is a stub — flag thin-prose (#142). +const MIN_PROSE_WORDS_PER_CLAIM: usize = 15; #[derive(Debug, PartialEq, Eq, Serialize)] #[serde(rename_all = "snake_case")] @@ -146,6 +153,8 @@ fn lint_workspace(ws: &Workspace) -> Result<Vec<Finding>> { } lint_covers(&rel, &hub, &mut findings); + lint_claim_log(&rel, &hub, &mut findings); + lint_thin_prose(&rel, &hub, &mut findings); if hub.frontmatter.anchors.len() > MAX_ANCHORS_PER_HUB { findings.push(Finding { @@ -428,6 +437,69 @@ fn lint_coarse_span( } } +/// §8/#142: the counter-pressure to under-coverage. A hub is an onboarding doc — prose-first, +/// with coarse claims that each seal one behavior across the several places it lives (a multi-site +/// `at:` list). When a hub accumulates many claims and *never once* consolidates with a multi-site +/// `at:`, it reads as a per-symbol "claim-log"; nudge toward fewer, multi-anchor claims. Advisory. 
+fn lint_claim_log(rel: &str, hub: &surf_core::Hub, findings: &mut Vec<Finding>) {
+    let claims = &hub.frontmatter.anchors;
+    if claims.len() < CLAIM_LOG_MIN_CLAIMS {
+        return;
+    }
+    let multi_site = claims.iter().filter(|c| c.at.sites().len() > 1).count();
+    if multi_site == 0 {
+        findings.push(Finding {
+            severity: Severity::Warn,
+            hub: rel.to_string(),
+            claim: String::new(),
+            at: String::new(),
+            message: format!(
+                "{} claims, all single-site — this reads as a per-symbol claim-log. A hub documents a system: consolidate related claims into fewer coarse ones, each listing every site it spans under one multi-site `at:`",
+                claims.len()
+            ),
+        });
+    }
+}
+
+/// §8/#142: a hub is an onboarding doc, not a frontmatter dump. Flag a multi-claim hub whose body
+/// prose is too thin to onboard a reader — when the prose lives in the `claim:` fields and the
+/// readable body is a stub. Advisory; single-claim hubs (short module notes) are exempt.
+fn lint_thin_prose(rel: &str, hub: &surf_core::Hub, findings: &mut Vec<Finding>) {
+    let claims = hub.frontmatter.anchors.len();
+    if claims < 2 {
+        return;
+    }
+    let words = prose_words(&hub.body);
+    if words < MIN_PROSE_WORDS_PER_CLAIM * claims {
+        findings.push(Finding {
+            severity: Severity::Warn,
+            hub: rel.to_string(),
+            claim: String::new(),
+            at: String::new(),
+            message: format!(
+                "thin prose: {words} words of body for {claims} claims — a hub is an onboarding doc, not a list of claims. Add prose framing the system (the key distinction, how the pieces fit, what it does *not* cover)"
+            ),
+        });
+    }
+}
+
+/// Words of readable body prose, excluding fenced code blocks (``` … ```), which carry no
+/// onboarding prose and would otherwise inflate the count.
+fn prose_words(body: &str) -> usize { + let mut count = 0; + let mut in_fence = false; + for line in body.lines() { + if line.trim_start().starts_with("```") { + in_fence = !in_fence; + continue; + } + if !in_fence { + count += line.split_whitespace().count(); + } + } + count +} + fn lint_under_coverage( ws: &Workspace, hub: &str, @@ -699,6 +771,90 @@ mod tests { ); } + #[test] + fn claim_log_warns_on_many_single_site_claims() { + // Four claims, each anchoring a single symbol, no multi-site `at:` — the per-symbol + // claim-log smell. A rich body keeps thin-prose quiet so only the granularity warning fires. + let mut src = String::new(); + let mut anchors = String::new(); + for i in 0..CLAIM_LOG_MIN_CLAIMS { + src.push_str(&format!("pub fn f{i}() {{}}\n")); + anchors.push_str(&format!(" - claim: c{i}\n at: src/m.rs > f{i}\n")); + } + let body: String = "prose ".repeat(200); + let hub = format!("---\nsummary: x\nanchors:\n{anchors}---\n# H\n\n{body}\n"); + let (_t, ws) = ws_with(&[("src/m.rs", src.as_str()), ("hubs/a.md", hub.as_str())]); + + let f = lint_workspace(&ws).unwrap(); + let warn = f + .iter() + .find(|x| x.message.contains("claim-log")) + .expect("expected a claim-log warning"); + assert_eq!(warn.severity, Severity::Warn); + } + + #[test] + fn claim_log_silent_when_a_claim_consolidates() { + // Same claim count, but one claim uses a multi-site `at:` — the hub consolidates, so the + // claim-log nudge stays quiet. 
+ let mut src = String::new(); + for i in 0..CLAIM_LOG_MIN_CLAIMS { + src.push_str(&format!("pub fn f{i}() {{}}\n")); + } + let mut anchors = String::from( + " - claim: pair\n at:\n - src/m.rs > f0\n - src/m.rs > f1\n", + ); + for i in 2..CLAIM_LOG_MIN_CLAIMS { + anchors.push_str(&format!(" - claim: c{i}\n at: src/m.rs > f{i}\n")); + } + let body: String = "prose ".repeat(200); + let hub = format!("---\nsummary: x\nanchors:\n{anchors}---\n# H\n\n{body}\n"); + let (_t, ws) = ws_with(&[("src/m.rs", src.as_str()), ("hubs/a.md", hub.as_str())]); + + let f = lint_workspace(&ws).unwrap(); + assert!( + !f.iter().any(|x| x.message.contains("claim-log")), + "consolidated hub should not warn: {f:?}" + ); + } + + #[test] + fn thin_prose_warns_on_stub_body() { + // Two claims, near-empty body — an onboarding doc with no onboarding prose. + let (_t, ws) = ws_with(&[ + ("src/m.rs", "pub fn a() {}\npub fn b() {}\n"), + ( + "hubs/a.md", + "---\nsummary: x\nanchors:\n - claim: a does a\n at: src/m.rs > a\n - claim: b does b\n at: src/m.rs > b\n---\n# H\n", + ), + ]); + let f = lint_workspace(&ws).unwrap(); + let warn = f + .iter() + .find(|x| x.message.contains("thin prose")) + .expect("expected a thin-prose warning"); + assert_eq!(warn.severity, Severity::Warn); + } + + #[test] + fn thin_prose_silent_with_real_body() { + // Two claims plus a real body — the onboarding prose a hub should carry; no warning. A + // fenced code block doesn't count toward prose, so the words are genuine. 
+ let body: String = "word ".repeat(40); + let hub = format!( + "---\nsummary: x\nanchors:\n - claim: a\n at: src/m.rs > a\n - claim: b\n at: src/m.rs > b\n---\n# H\n\n{body}\n" + ); + let (_t, ws) = ws_with(&[ + ("src/m.rs", "pub fn a() {}\npub fn b() {}\n"), + ("hubs/a.md", hub.as_str()), + ]); + let f = lint_workspace(&ws).unwrap(); + assert!( + !f.iter().any(|x| x.message.contains("thin prose")), + "a hub with a real body should not warn: {f:?}" + ); + } + #[test] fn covers_valid_globs_are_silent() { let (_t, ws) = ws_with(&[ diff --git a/surf-cli/src/new.rs b/surf-cli/src/new.rs index 5e9970e..1dbfb12 100644 --- a/surf-cli/src/new.rs +++ b/surf-cli/src/new.rs @@ -42,10 +42,25 @@ fn template(name: &str) -> String { s.push_str("refs: []\n"); s.push_str("---\n\n"); s.push_str(&format!("# {name}\n\n")); - s.push_str("TODO: prose. Add anchors in the frontmatter above, e.g.\n\n"); + s.push_str( + "TODO: prose first. A hub is an onboarding doc — frame the system, not each symbol.\n", + ); + s.push_str("Lead with the single most important distinction a reader needs.\n\n"); + s.push_str("## How it works\n\n"); + s.push_str("TODO: how the pieces fit together.\n\n"); + s.push_str("## Boundary\n\n"); + s.push_str("TODO: what this system does *not* cover.\n\n"); + s.push_str( + "Then add coarse claims in the frontmatter above. 
Prefer one claim per *behavior*,\n", + ); + s.push_str( + "listing every site it spans under one multi-site `at:` — not one claim per symbol:\n\n", + ); s.push_str(" anchors:\n"); - s.push_str(" - claim: the invariant, in prose\n"); - s.push_str(" at: path/to/file.rs > symbolName\n\n"); + s.push_str(" - claim: the invariant this behavior guarantees, in prose\n"); + s.push_str(" at:\n"); + s.push_str(" - path/to/file.rs > symbolName\n"); + s.push_str(" - path/to/other.rs > relatedSymbol\n\n"); s.push_str("then run `surf verify` to stamp the hash.\n"); s } diff --git a/surf-cli/src/suggest.rs b/surf-cli/src/suggest.rs index 68dca5c..7912687 100644 --- a/surf-cli/src/suggest.rs +++ b/surf-cli/src/suggest.rs @@ -168,25 +168,60 @@ fn print_human(suggestions: &[Suggestion]) { println!("surf suggest: no unanchored public symbols found."); return; } + // These are *undocumented symbols*, not a list of claims to write. The output below groups + // them by file and emits one multi-site `at:` skeleton per file, so the default shape models + // a good hub — coarse, system-level claims with multi-anchor `at:` lists — rather than the + // one-claim-per-function "claim-log" that a flat 1:1 list teaches (#142). + let files = group_by_file(suggestions); println!( - "# {} unanchored public symbol(s). Paste into a hub (or `surf new `), write the", - suggestions.len() + "# {} undocumented public symbol(s) across {} file(s) — these are SYMBOLS, not claims.", + suggestions.len(), + files.len() ); println!( - "# claims, then `surf verify`. These are suggestions — nothing was written or stamped." + "# A hub is an onboarding doc: prose first, with a few COARSE claims. Do NOT write one" ); + println!("# claim per symbol. Group related symbols into a system-level claim and list every"); + println!( + "# site it spans under one `at:` (a claim is stale if any listed span changes). 
Replace"
+    );
+    println!(
+        "# the skeleton below with real prose, delete what you don't need, then `surf verify`."
+    );
+    println!("# Nothing was written or stamped.");
     println!("---");
-    println!("summary: TODO one-line summary of this domain.");
+    println!(
+        "summary: TODO one line — what this system does and the single most important distinction."
+    );
     println!("anchors:");
-    for s in suggestions {
-        println!(
-            " - claim: TODO the invariant {} guarantees, in prose",
-            s.symbol
-        );
-        println!(" at: {}", s.at);
+    for (file, ats) in &files {
+        println!(" # {file}: group these into as few claims as the behavior allows.");
+        println!(" - claim: TODO one invariant this group guarantees, in prose");
+        println!(" at:");
+        for at in ats {
+            println!(" - {at}");
+        }
     }
     println!("refs: []");
     println!("---");
+    println!();
+    println!("# TODO title");
+    println!();
+    println!("TODO prose a human or agent reads to onboard: how the pieces fit, the key");
+    println!("distinction, and a Boundary note on what this system does *not* cover.");
+}
+
+/// Group suggestions by their source file, preserving the already-sorted order of both files and
+/// the anchors within each. Returns `(file, [at, ...])` pairs.
+fn group_by_file(suggestions: &[Suggestion]) -> Vec<(String, Vec<String>)> {
+    let mut files: Vec<(String, Vec<String>)> = Vec::new();
+    for s in suggestions {
+        match files.last_mut() {
+            Some((f, ats)) if *f == s.file => ats.push(s.at.clone()),
+            _ => files.push((s.file.clone(), vec![s.at.clone()])),
+        }
+    }
+    files
 }
 
 #[cfg(test)]
@@ -229,6 +264,34 @@ mod tests {
         assert_eq!(s[0].symbol, "b");
     }
 
+    #[test]
+    fn group_by_file_collapses_adjacent_symbols() {
+        // Two symbols in one file collapse to a single multi-site group; a third file is its own
+        // group — the shape that steers authors toward coarse, multi-anchor claims (#142).
+ let s = vec![ + Suggestion { + file: "a.rs".into(), + symbol: "x".into(), + at: "a.rs > x".into(), + }, + Suggestion { + file: "a.rs".into(), + symbol: "y".into(), + at: "a.rs > y".into(), + }, + Suggestion { + file: "b.rs".into(), + symbol: "z".into(), + at: "b.rs > z".into(), + }, + ]; + let g = group_by_file(&s); + assert_eq!(g.len(), 2); + assert_eq!(g[0].0, "a.rs"); + assert_eq!(g[0].1, vec!["a.rs > x", "a.rs > y"]); + assert_eq!(g[1].1, vec!["b.rs > z"]); + } + #[test] fn unsupported_files_are_skipped() { let (_t, ws) = ws_with(&[("notes.txt", "pub fn a() {}\n")]);