Merged
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -5,7 +5,7 @@ anchors:
surf lint blocks when AGENTS.md carries a surf:hubs block that does not link the configured
hubs directory, or when that directory does not exist; without the block it stays silent.
at: surf-cli/src/lint.rs > lint_agents_pointer
hash: 2:9a5f7d9fd0db
hash: 2:ac139b65f5f0
refs: []
---

22 changes: 15 additions & 7 deletions CHANGELOG.md
@@ -21,13 +21,21 @@ project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
json` is unchanged.
- **`surf new` scaffold** ships a prose-first template (`## How it works` / `## Boundary`
headings and a multi-anchor example claim) so a fresh hub is shaped like an onboarding doc.
- **Hash recipe v2 (member-access names verbatim).** The canonical hash now keeps the
property/field component of a member-access expression verbatim instead of alpha-renaming it,
so re-pointing an anchored span at a *different* external symbol — `PointsTier.TIER_1` →
`TIER_2`, `b.Del` → `b.Keep`, `ProbeColor.RED` → `GREEN` — changes the hash even when the name
occurs once. Previously these passed the gate silently while the claim's prose became false
(#140, the member-access slice of #77). Consistent local/parameter renames stay quiet, as
before. Covers TypeScript, Go, Rust, and Python.
- **Hash recipe v2 — the bound/free split (#77).** The canonical hash now alpha-renames only
*bound* identifiers (a symbol's own name, parameters, locals, loop/range/comprehension
variables, `with`/`catch` aliases, generic params, destructuring binders) and emits every
*free* identifier verbatim (external members, call targets, types, enum/constant references,
object keys, decorators). Re-pointing an anchored span at a *different* symbol is now loud even
when the name occurs once — `PointsTier.TIER_1` → `TIER_2`, `getHighest` → `getLowest`, a bare
`helper(x)` → `other(x)`, a parameter type `Foo` → `Bar`, an object key `{ alpha }` →
`{ beta }` — where before it passed the gate silently while the claim's prose became false.
Consistent local/parameter renames stay quiet, as before. This subsumes the earlier
member-access-only first cut and the Python decorator special case (#8). A new in-tree
differential harness (`surf-core/tests/differential_hash.rs`) gates the change: zero
benign-rename regressions, 100% catch on the semantic free-swap corpus across all four
languages. Binding detection is tree-sitter-only and fail-closed; the one accepted
approximation (match-arm pattern identifiers are treated as free) is documented in
[Hash recipes](docs/reference/hash-recipes.md). Covers TypeScript, Go, Rust, and Python.
- **Versioned stamps.** Stored hashes now carry their recipe: a v2 stamp is prefixed `2:`, a bare
12-hex stamp is an implicit v1. `surf check` verifies each stamp under its own recipe, so
existing v1 stamps keep passing (with a one-line nudge) until `surf verify` re-stamps them as
106 changes: 85 additions & 21 deletions docs/reference/hash-recipes.md
@@ -57,17 +57,51 @@ Walk the resolved span's syntax tree into tokens:

SHA-256 of the token stream, truncated to 12 hex.

**Known blind spot (#77):** because *every* identifier is alpha-renamed, re-pointing a span at a
different single-occurrence external symbol (`PointsTier.TIER_1` → `TIER_2`, `b.Del` → `b.Keep`)
yields a byte-identical stream — the claim's prose silently becomes false while the gate stays
green.

### v2 — member-access names verbatim (surf ≥ 0.7.0; `2:` prefix)
**Known blind spot (#77, closed by v2):** because *every* identifier is alpha-renamed, re-pointing a
span at a different single-occurrence external symbol (`PointsTier.TIER_1` → `TIER_2`, `b.Del` →
`b.Keep`) yields a byte-identical stream — the claim's prose silently becomes false while the gate
stays green. This is exactly what the v2 bound/free split fixes.
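
The blind spot can be reproduced with a minimal sketch. The `(kind, text)` token shape, the `id:N` placeholder format, and the newline-joined serialization are assumptions made for illustration; the recipe text only fixes that identifiers are alpha-renamed and that the stamp is SHA-256 truncated to 12 hex.

```python
import hashlib


def v1_canonicalize(tokens):
    """Alpha-rename every identifier to a positional placeholder (v1-style sketch)."""
    placeholders = {}
    stream = []
    for kind, text in tokens:
        if kind == "identifier":
            # The first occurrence of a name takes the next placeholder index.
            index = placeholders.setdefault(text, len(placeholders))
            stream.append(f"id:{index}")
        else:
            # Operators, keywords, and literal values are kept verbatim.
            stream.append(f"{kind}:{text}")
    return stream


def stamp(stream):
    """SHA-256 of the token stream, truncated to 12 hex characters."""
    digest = hashlib.sha256("\n".join(stream).encode("utf-8")).hexdigest()
    return digest[:12]


before = [("identifier", "PointsTier"), ("op", "."), ("identifier", "TIER_1")]
after = [("identifier", "PointsTier"), ("op", "."), ("identifier", "TIER_2")]

# Both spans canonicalize to the same stream, so the re-pointing is invisible.
assert stamp(v1_canonicalize(before)) == stamp(v1_canonicalize(after))
```

Because `TIER_1` and `TIER_2` each occur once, both take the same placeholder index and the two streams are byte-identical.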

### v2 — the bound/free split (surf ≥ 0.7.0; `2:` prefix)

v1 alpha-renames *every* identifier, which is its blind spot: an identifier occurring once maps to
the same placeholder no matter what it names, so re-pointing a span at a *different* single-occurrence
external symbol is byte-identical and silently passes.

v2 fixes this by splitting identifiers into **bound** and **free**:

- **Bound** — names *declared inside the hashed span*: the symbol's own name, parameters, locals,
loop/range/comprehension variables, `with`/`catch` aliases, generic parameters, and destructuring
binders. These are **alpha-renamed** exactly as in v1, so a consistent local rename still hashes
identically — rename tolerance (§6.1) is preserved.
- **Free** — everything else: external members, call targets, types, enum/constant references,
object/destructuring keys, decorator names, JSX tags. These are emitted **verbatim** (`kind:text`),
so re-pointing at a different symbol is loud *even when the name occurs once*.

This closes the #77 class in general, not just for member accesses: `PointsTier.TIER_1` → `TIER_2`,
`getHighest` → `getLowest`, a bare `helper(x)` → `other(x)`, a parameter type `Foo` → `Bar`, and an
object key `{ alpha }` → `{ beta }` all now change the hash. It also **subsumes** the two special
cases the older design carried — a decorator name (#8) and a member-access name (the #140 first cut)
are simply free identifiers now; no dedicated branch is needed for either. (The member-access
positions keep one dedicated check so they stay verbatim even when their text collides with a bound
local — `x` the parameter vs `obj.x` the field — since that position can never *be* the binding.)
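
A minimal sketch of the bound/free split, including the member-access collision rule, might look like the following. The `(kind, text, is_member)` token shape, the `id:N` placeholder format, and the explicit `bound` set are assumptions for illustration; the actual recipe derives bindings from syntax-tree positions rather than from a set of names.

```python
def v2_canonicalize(tokens, bound):
    """Sketch of the bound/free split over (kind, text, is_member) tokens.

    `bound` is a hypothetical input: the set of names declared inside the span.
    """
    placeholders = {}
    stream = []
    for kind, text, is_member in tokens:
        if kind == "identifier" and not is_member and text in bound:
            # Bound name: alpha-rename to a positional placeholder.
            index = placeholders.setdefault(text, len(placeholders))
            stream.append(f"id:{index}")
        else:
            # Free name, member-access name, operator, or literal: verbatim.
            stream.append(f"{kind}:{text}")
    return stream


def body(param, field):
    """Tokens for a span shaped like `f(param) { obj.<field> + param }`."""
    return [
        ("identifier", param, False),  # parameter declaration (bound)
        ("identifier", "obj", False),  # free: external object
        ("op", ".", False),
        ("identifier", field, True),   # member-access position
        ("op", "+", False),
        ("identifier", param, False),  # use of the parameter
    ]


original = v2_canonicalize(body("x", "x"), bound={"x"})
renamed = v2_canonicalize(body("y", "x"), bound={"y"})
repointed = v2_canonicalize(body("x", "z"), bound={"x"})

assert original == renamed     # consistent local rename stays quiet
assert original != repointed   # re-pointing the field is loud
```

The `is_member` check runs before the bound-name lookup, which is what keeps `obj.x` verbatim even though `x` is also a parameter in the same span.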

Binding detection is tree-sitter-only — there is no scope analysis — so it is **fail-closed**:
a position not positively recognized as a binding defaults to *free* (verbatim). The two error
directions are not symmetric: misclassifying bound→free is a *visible* false positive (a benign
rename trips the gate, a human sees it); free→bound is the *invisible* miss this whole recipe exists
to prevent. So when in doubt, free wins.

**Binding positions, per family** (the tables `surf-core/src/hash.rs` `bind_here` encodes):

| Family | Bound positions |
|---|---|
| Rust | `function_item`/`function_signature_item` name; `parameter`/`let_declaration`/`for_expression`/`let_condition` patterns; `closure_parameters`; `type_parameters` |
| TypeScript | function/method/signature names; `required_parameter`/`optional_parameter` patterns; `variable_declarator` name; `arrow_function` single param; `for_in_statement` left; `catch_clause` parameter; `type_parameters` |
| Python | `function_definition` name; `parameters`/`lambda_parameters` (default *values* excluded); `assignment`/`augmented_assignment`/`for_statement`/`for_in_clause` left; `with`/`as` targets |
| Go | function/method/`var`/`const`/type-parameter names; `parameter_declaration` names (incl. grouped `a, b int`); `short_var_declaration`/`range_clause` left |
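
One way to picture a fail-closed, table-driven check is a positive lookup with a default of *free*. This is only a sketch: the `(parent kind, field name)` keying and the field names `name` and `pattern` are assumptions loosely modelled on the Rust row above, not the contents of `bind_here`.

```python
# Hypothetical (parent node kind, field name) pairs treated as binding positions.
RUST_BINDING_POSITIONS = {
    ("function_item", "name"),
    ("function_signature_item", "name"),
    ("parameter", "pattern"),
    ("let_declaration", "pattern"),
    ("for_expression", "pattern"),
    ("let_condition", "pattern"),
}


def classify(parent_kind, field_name):
    """Return "bound" only for positively recognized positions, else "free"."""
    if (parent_kind, field_name) in RUST_BINDING_POSITIONS:
        return "bound"
    # Fail closed: anything unrecognized is emitted verbatim.
    return "free"


assert classify("let_declaration", "pattern") == "bound"
assert classify("match_arm", "pattern") == "free"
assert classify("some_future_node", "name") == "free"
```

A node kind the table has never seen falls through to `"free"`, so a grammar upgrade can at worst produce a visible false positive, never a silent miss.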

v1, plus one rule: the **property/field component of a member-access expression** is kept verbatim
(`kind:text`) instead of alpha-renamed. These positions name an *external* member, never a local
binding, so emitting them verbatim distinguishes "re-pointed at a different symbol" (loud) from
"renamed my own local" (still quiet — rename tolerance is preserved). Per family:
**Member-access positions kept verbatim even on a bound-name collision:**

| Family | Member-access position |
|---|---|
@@ -76,19 +110,49 @@ binding, so emitting them verbatim distinguishes "re-pointed at a different symb
| Rust | `field_identifier` as the `field` of a `field_expression` |
| Python | the `attribute` identifier of an `attribute` node |

Everything else is identical to v1, so v1 ≡ v2 minus this single rule — a member-access-free span
hashes the same under both. This closes the #77 blind spot for member accesses (every reported
reproduction). Re-pointing at a non-member free identifier — a bare `Enum::VARIANT` path, a renamed
imported function called by bare name — is **not** yet covered; that is the full bound/free split
tracked in [#77](https://github.com/Connorrmcd6/surface/issues/77).
**Accepted approximation (the residue).** Without scope analysis, a match-arm / pattern identifier
is indistinguishable from a unit-variant *reference* (`Some(x)` binds `x`; `None` references a
variant — same syntax). v2 leaves all such pattern identifiers **free**. Fail-closed cuts both
ways: a unit-variant swap in a match arm is *caught* (the safe direction), but renaming a match-arm
catch-all *binding* is also loud — an accepted false positive, not a bug. This is the one benign
edit class v2 does not keep silent; a future scope-aware pass could reclaim it. The limit is pinned
in `surf-core/tests/differential_hash.rs`.

## Version table

surf keeps an explicit table of every recipe ever shipped, so any stamp's recipe is always
identifiable and every dropped recipe errors with a remedy rather than a generic mismatch.

| Recipe | Stamp form | Shipped | Status | Remedy if rejected |
|---|---|---|---|---|
| v1 | bare 12-hex | surf ≤ 0.6.x | **supported** (N-1) until 0.8.0 | run `surf verify` to upgrade to v2 |
| v2 | `2:` + 12-hex | surf ≥ 0.7.0 | **current** | — |
| `N:` for unknown N | `N:` + hex | a newer surf | rejected (fails closed) | upgrade surf to a build that knows recipe N |

- **Identification never expires.** The prefix is plain data; any future surf can name the recipe of
any stamp even after the recipe's verification code is deleted. A bare hex stamp is, and always
will be, v1.
- **N-1 support, at most one legacy mode.** surf verifies the current recipe and exactly one back.
v1 compatibility ships in 0.7.0 and is **removed in 0.8.0**; after that a bare-hex stamp is a hard,
named error ("stamped by surf < 0.7 — re-stamp with `surf verify`, or check with surf 0.7.x
first"), never a silent DIVERGED. A legacy recipe is retained *only* while it is expressible as a
mode of the current code (v1 ≡ v2 with "every identifier bound" — one flag, no frozen copy). If a
future recipe cannot express its predecessor that cheaply, that is the signal to drop compat and
require stepping through an intermediate release.
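
The stamp forms in the table can be told apart from the text alone, as in this sketch. The function name, the error wording for a malformed stamp, and the decision to accept only `2:` as a known prefix are assumptions; the table only specifies that a bare 12-hex stamp is v1, `2:` is v2, and an unknown `N:` is rejected.

```python
import re

# Prefixed recipes this hypothetical build knows how to verify.
KNOWN_PREFIXED_RECIPES = {2}


def parse_stamp(stamp):
    """Return (recipe number, hex digest) for a stored stamp, or raise ValueError."""
    if re.fullmatch(r"[0-9a-f]{12}", stamp):
        # A bare 12-hex stamp is an implicit v1.
        return 1, stamp
    match = re.fullmatch(r"(\d+):([0-9a-f]+)", stamp)
    if match is None:
        raise ValueError(f"malformed stamp: {stamp!r}")
    recipe = int(match.group(1))
    if recipe not in KNOWN_PREFIXED_RECIPES:
        # Fail closed with a named remedy rather than a generic mismatch.
        raise ValueError(
            f"unknown recipe {recipe}: upgrade surf to a build that knows recipe {recipe}"
        )
    return recipe, match.group(2)


assert parse_stamp("9a5f7d9fd0db") == (1, "9a5f7d9fd0db")
assert parse_stamp("2:ac139b65f5f0") == (2, "ac139b65f5f0")
```

Identification depends only on the prefix, so it keeps working even in a build that no longer carries the verification code for an older recipe.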

## Policy (for maintainers)

- **Any** change to canonical output is a new recipe number — no exceptions. An innocent-looking
refactor of the tokenizer that changes one byte of output is silently a new recipe wearing an old
number, which corrupts every stamp in the wild. The golden fixtures in
`surf-core/tests/golden_hash.rs` pin each recipe's output (v1 and v2 digests for representative
symbols per language) precisely to make that break loud.
- A recipe is kept as a verification mode only while it is expressible as a flag over the current
code (v1 ≡ v2 with the member-access rule off — one branch, no frozen copy). The N-1 support
policy and the broader version-table governance are tracked in #77.
number, which corrupts every stamp in the wild. Two layers make that break loud:
- **Golden fixtures** (`surf-core/tests/golden_hash.rs`) pin each recipe's exact digest for
representative symbols per language — both v1 (frozen forever) and v2.
- **Differential harness** (`surf-core/tests/differential_hash.rs`) re-runs the v1-vs-v2 A/B on
every build: zero benign-rename regressions, 100% catch on the semantic (free-swap) corpus. Any
future recipe change reruns the same gate.
- The recipe's rules are **dogfooded**: claims in `hubs/hash.md` are anchored to the canonicalization
code itself (`emit`, `collect_bound`, `is_member_access_name`), so editing the tokenizer without
updating this contract turns surf's own gate red.
- The external git-history replay over real corpora (prometheus for Go, nansen-python-sdk for
Python, surface itself for Rust, a large public TS repo) named in #77 runs out-of-tree against
release binaries; the in-tree harness above is its always-on, deterministic counterpart.
16 changes: 8 additions & 8 deletions docs/reference/how-it-works.md
@@ -11,14 +11,14 @@ The gate runs in four steps.
`Type` alone is ambiguous, `Type > method` is unique. In Python the path also resolves
non-callables: module constants, type aliases, and class attributes.
2. **Canonicalize.** Walk that span's syntax tree into a token stream. Whitespace and comments
aren't in the tree, so they drop out for free; identifiers are alpha-renamed to positional
placeholders (a *consistent* rename yields the same tokens, swapping two names does not);
operators, keywords, and literal *values* are kept verbatim. Python decorators are part of the
span, and a decorator's *name* is kept verbatim — so swapping `@cache` for `@lru_cache`, or
`@staticmethod` for `@classmethod`, changes the hash. **Member-access names are kept verbatim
too** (`obj.foo`, `pkg.Bar`, `Enum.VARIANT`), so re-pointing a span at a *different* external
symbol — `PointsTier.TIER_1` → `TIER_2`, `b.Del` → `b.Keep` — changes the hash even when the
name occurs once. (This last rule is the **v2** recipe; see [Hash recipes](./hash-recipes.md).)
aren't in the tree, so they drop out for free; operators, keywords, and literal *values* are
kept verbatim. Identifiers split into two kinds: a **bound** name (the symbol's own name,
parameters, locals, loop/destructuring binders) is alpha-renamed to a positional placeholder,
so a *consistent* local rename yields the same tokens; a **free** name (external members, call
targets, types, enum/constant references, object keys, decorators) is kept verbatim, so
re-pointing a span at a *different* symbol — `PointsTier.TIER_1` → `TIER_2`, `getHighest` →
`getLowest`, `@cache` → `@lru_cache` — changes the hash even when the name occurs once. (This
bound/free split is the **v2** recipe; see [Hash recipes](./hash-recipes.md).)
3. **Hash.** SHA-256 of that stream, truncated to 12 hex. A list `at:` combines its sites into one
hash, so the claim is stale if *any* listed span changes.
4. **Compare** against the stamp stored in the frontmatter (written by `surf verify`). The stamp
2 changes: 1 addition & 1 deletion hubs/anchor.md
@@ -6,7 +6,7 @@ anchors:
a 1-based `@N` positional suffix for genuine name collisions. Empty/zero/missing parts
are typed parse errors.
at: surf-core/src/anchor.rs > parse_anchor
hash: 2:0f9a4f9d406d
hash: 2:5499582e3a55
refs: []
---

6 changes: 3 additions & 3 deletions hubs/cli-check.md
@@ -8,7 +8,7 @@ anchors:
a mismatch → Changed; a clean match is tagged with whether the stamp was still v1. The
verdict is deterministic and needs no git.
at: surf-cli/src/check.rs > check_claim
hash: 2:36cbbc039ab1
hash: 2:66e7b4149d60
- claim: >
Scoping is opt-in and intersective: with neither --base nor --files every claim is checked.
A claim is in scope when any of its anchored files matches each active filter — the --base
@@ -17,7 +17,7 @@ anchors:
records whether it ever matched an anchored file (tallied before the --base filter), so a
pattern that scopes the gate to nothing is detectable after the walk.
at: surf-cli/src/check.rs > Scope > includes
hash: 2:d459cc00d69b
hash: 2:64277175938c
- claim: >
The gate fails closed: a hub whose frontmatter won't parse yields an Unresolvable
divergence (blocking the run) rather than being silently skipped, so a frontmatter typo
@@ -26,7 +26,7 @@ anchors:
pattern matched nothing, so a typo'd --files can't read as a clean run) and a count of
clean anchors still stamped under v1, so run can nudge the one-time `surf verify` upgrade.
at: surf-cli/src/check.rs > check_workspace
hash: 2:d8957ecb971d
hash: 2:4f5890aca70c
refs: []
---

4 changes: 2 additions & 2 deletions hubs/cli-for.md
@@ -9,13 +9,13 @@ anchors:
versioned {version, path, matches} envelope (JSON), always exiting 0 whether or not anything
matched.
at: surf-cli/src/for_path.rs > run
hash: 2:4ef15aadc147
hash: 2:991c3bcc234c
- claim: >
find collects every claim whose anchored file equals the queried path (matched on path only —
no source parse), optionally narrowed to anchors whose first segment is the given symbol.
Malformed hubs are skipped rather than erroring, and results are sorted by hub then anchor.
at: surf-cli/src/for_path.rs > find
hash: 2:6eb52572ab68
hash: 2:5d4d45bdf364
refs: []
---

8 changes: 4 additions & 4 deletions hubs/cli-git.md
@@ -11,27 +11,27 @@ anchors:
- surf-cli/src/git.rs > renamed_to
- surf-cli/src/git.rs > log_stream
- surf-cli/src/git.rs > list_files_at
hash: 2:874f501ad8f1
hash: 2:95e280660c73
- claim: >
changed_files returns workspace-root-relative paths changed between the merge base of
base..HEAD and the working tree (git diff --relative), so the set intersects
workspace-relative anchors even when the workspace is a repo subdirectory; a missing merge
base (shallow clone) falls back to diffing the ref directly.
at: surf-cli/src/git.rs > changed_files
hash: 2:e395bff5410d
hash: 2:86115d32f1c7
- claim: >
log_stream returns the whole history window in one git spawn: every reachable commit (newest
first, children before parents) with its parents and its first-parent name-status diff.
Merges are included with --diff-merges=first-parent so surf stats can propagate hub state
through them, and --no-renames keeps a rename reading as delete+add.
at: surf-cli/src/git.rs > log_stream
hash: 2:c5d2fccc872e
hash: 2:a410122a0052
- claim: >
renamed_to asks git's rename detection (diff --name-status --find-renames HEAD) for the new
path a file moved to, letting lint warn and verify --follow re-point instead of hard-blocking.
Best-effort: a pure mv with no content match may show as delete+add and go undetected.
at: surf-cli/src/git.rs > renamed_to
hash: 2:a51ff4adba72
hash: 2:260267073598
refs: []
---

6 changes: 3 additions & 3 deletions hubs/cli-lint.md
@@ -7,7 +7,7 @@ anchors:
as does a file that git reports has moved. Block-level findings set a non-zero exit;
warnings alone keep exit 0.
at: surf-cli/src/lint.rs > lint_site
hash: 2:69018813a373
hash: 2:97f0946e74b0
- claim: >
Advisory granularity guidance (§8), never blocking: lint_under_coverage flags public
symbols — top-level functions and methods — in an already-anchored file that no claim
@@ -16,14 +16,14 @@ anchors:
uncovered symbol is reported once against the file's first anchoring hub. It runs only on
files whose anchors all resolved cleanly, so coverage nags never pile onto broken anchors.
at: surf-cli/src/lint.rs > lint_under_coverage
hash: 2:3ca608c27462
hash: 2:1a94fd3c8328
- claim: >
AGENTS.md enforcement is opt-in (§11.6): only when the file carries a surf:hubs marker
block does lint require it to link the configured hubs directory (which must exist),
blocking otherwise. It points agents at the directory to search — never enumerating
individual hubs, which would push an agent to read everything.
at: surf-cli/src/lint.rs > lint_agents_pointer
hash: 2:9a5f7d9fd0db
hash: 2:ac139b65f5f0
refs: []
---

2 changes: 1 addition & 1 deletion hubs/cli-reference.md
@@ -9,7 +9,7 @@ anchors:
flag, or changing a default, diverges this anchor — re-read docs/reference/commands.md
before sealing.
at: surf-cli/src/main.rs > Command
hash: 2:0d910ff4886d
hash: 2:1af394872add
refs: ["../docs/reference/commands.md"]
---

4 changes: 2 additions & 2 deletions hubs/cli-scaffold.md
@@ -5,12 +5,12 @@ anchors:
init writes surf.toml + creates hubs/ in the cwd, and is idempotent — an existing
surf.toml is left untouched.
at: surf-cli/src/init.rs > run
hash: 2:dd57e4e7c5d9
hash: 2:640471b94678
- claim: >
new derives the target directory from the literal prefix of the first hub glob, then
writes a hub with no anchors so it is lint-clean immediately; it refuses to overwrite.
at: surf-cli/src/new.rs > hub_dir
hash: 2:d921913bf7bf
hash: 2:b9bfc7ec0b86
refs: []
---
