feat(hash): full bound/free identifier split, v2 recipe (#77) #147
Merged
v1 alpha-renamed every identifier, so re-pointing a span at a different single-occurrence external symbol was byte-identical and passed the gate silently while the claim's prose became false. v2 now alpha-renames only *bound* names (the symbol's own name, params, locals, loop/range/comprehension vars, with/catch aliases, generic params, destructuring binders) and emits every *free* identifier verbatim, so swapping a member, call target, type, enum/const, object key, or decorator is loud even when it occurs once, while consistent local renames stay quiet.

This completes v2 into the recipe the original design intended: it subsumes the member-access-only first cut and the Python decorator special case (#8). 0.7.0 is unreleased, so v2 is redefined in place rather than adding a v3; v1 stays byte-frozen (golden fixtures confirm released stamps are safe).

Binding detection is tree-sitter-only and fail-closed: a position not positively recognized as a binding defaults to free. The one accepted approximation (match-arm pattern identifiers are left free) is documented and pinned.

A new in-tree differential harness gates the change (13 benign renames with zero regressions, 12 semantic free-swaps with 100% v2 catch / 0% v1), alongside re-pinned golden digests, the version-table governance in docs/reference/hash-recipes.md, and dogfood claims in hubs/hash.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Closes #77. Completes the v2 hash recipe into the bound/free split — the principled fix for the gate's highest-value blind spot.
v1 alpha-renamed every identifier, so a one-token change re-pointing a span at a different single-occurrence external symbol (`PointsTier.TIER_1` → `TIER_2`, `getHighest` → `getLowest`) produced an identical hash. `check` stayed green while the claim's prose silently became false.

What changed
- […] (`surf-core/src/hash.rs`). v2 alpha-renames only bound names (the symbol's own name, parameters, locals, loop/range/comprehension vars, `with`/`catch` aliases, generic params, destructuring binders) and emits every free identifier verbatim. Re-pointing at a different member, call target, type, enum/const, object key, or decorator is now loud even when it occurs once; consistent local renames stay quiet. Two-pass: `collect_bound`/`bind_here` (per-family binding tables), then `emit`. […] `x` the param vs `obj.x`.)
- […] (`surf-core/tests/differential_hash.rs`, new). In-tree v1-vs-v2 A/B across all four languages, kept in-tree so any future canonicalization change reruns the same gate.
- […] `docs/reference/hash-recipes.md`; updated `how-it-works.md` and `CHANGELOG.md`; dogfooded: claims in `hubs/hash.md` anchored to `collect_bound`/`is_member_access_name`, all hubs re-stamped.

The external git-history replay over real corpora (prometheus / nansen-python-sdk / a TS repo) named in the issue runs out-of-tree against release binaries; the in-tree harness is its deterministic, always-on counterpart.
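For readers who want the bound/free idea in miniature, here is a rough sketch of the two canonicalizations side by side. It is not the code in `surf-core/src/hash.rs`: the `Tok` enum, `placeholder`, `canon_v1`, `canon_v2`, and `sample` are hypothetical names invented for this illustration, the tokens are pre-labelled by hand instead of being classified from a tree-sitter parse, and binding is treated as one flat scope. It only shows the shape of "collect bound names, then emit".

```rust
use std::collections::HashMap;

/// Hypothetical, hand-labelled token model (assumption: the real recipe
/// derives these roles from a tree-sitter parse tree).
#[derive(Clone, Copy)]
enum Tok<'a> {
    Binder(&'a str), // declaration site: symbol name, param, local, ...
    Ident(&'a str),  // plain identifier use
    Member(&'a str), // name after `.`; never a binding
    Punct(&'a str),  // keywords and punctuation, emitted as-is
}

/// Map a name to a stable placeholder `_N`, numbered by first occurrence.
fn placeholder(table: &mut HashMap<String, usize>, name: &str) -> String {
    let next = table.len();
    let id = *table.entry(name.to_string()).or_insert(next);
    format!("_{id}")
}

/// v1-style sketch: every identifier is alpha-renamed, so swapping one
/// single-occurrence name for another leaves the output unchanged.
fn canon_v1(toks: &[Tok<'_>]) -> String {
    let mut table = HashMap::new();
    let out: Vec<String> = toks
        .iter()
        .map(|t| match *t {
            Tok::Binder(n) | Tok::Ident(n) | Tok::Member(n) => placeholder(&mut table, n),
            Tok::Punct(p) => p.to_string(),
        })
        .collect();
    out.join(" ")
}

/// v2-style sketch: pass 1 collects bound names, pass 2 emits.
/// Anything not positively recognized as bound is emitted verbatim.
fn canon_v2(toks: &[Tok<'_>]) -> String {
    // Pass 1: only binder positions contribute to the table.
    let mut bound: HashMap<String, usize> = HashMap::new();
    for t in toks {
        if let Tok::Binder(n) = *t {
            placeholder(&mut bound, n);
        }
    }
    // Pass 2: rename uses of bound names; free names and members stay loud.
    let out: Vec<String> = toks
        .iter()
        .map(|t| match *t {
            Tok::Binder(n) | Tok::Ident(n) => match bound.get(n) {
                Some(id) => format!("_{id}"),
                None => n.to_string(),
            },
            Tok::Member(n) => n.to_string(),
            Tok::Punct(p) => p.to_string(),
        })
        .collect();
    out.join(" ")
}

/// Tokens for the made-up snippet `fn tier(<param>) pick(<param>, PointsTier.<member>)`.
fn sample<'a>(param: &'a str, member: &'a str) -> Vec<Tok<'a>> {
    vec![
        Tok::Punct("fn"), Tok::Binder("tier"), Tok::Punct("("), Tok::Binder(param), Tok::Punct(")"),
        Tok::Ident("pick"), Tok::Punct("("), Tok::Ident(param), Tok::Punct(","),
        Tok::Ident("PointsTier"), Tok::Punct("."), Tok::Member(member), Tok::Punct(")"),
    ]
}
```

Under these assumptions, `canon_v1(&sample("user", "TIER_1"))` and `canon_v1(&sample("user", "TIER_2"))` come out identical, which is the blind spot described in the Summary. `canon_v2` distinguishes them, while still giving the same output for `sample("user", "TIER_1")` and `sample("u", "TIER_1")`, so a consistent parameter rename stays quiet.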
Verification
All gates run on this branch:
- `cargo fmt --all --check` → clean
- `cargo clippy --all-targets --all-features -- -D warnings` → 0 warnings
- `cargo test --all` → all pass (incl. golden + differential)
- `surf check` → all anchored spans match

🤖 Generated with Claude Code