Detect duplicate/divergent implementations: span hash can't tell a shared import from a local copy #23

Description

@Connorrmcd6

Summary

An anchored span's hash is blind to symbol resolution: a function that calls foo(...) hashes identically whether foo is imported from a shared module or re-declared as a byte-identical top-level function in the same file. As a result, an anchor on the caller can neither express nor guard the very common architectural invariant "this uses the shared implementation, not a local duplicate."

Duplicate-implementation drift (two copies of the same logic silently diverging over time) is one of the most common ways documented behavior goes stale — and it's exactly the case surface currently can't see.

Environment

  • surf 0.1.0, prebuilt binary surf-aarch64-apple-darwin (curl installer)
  • macOS (Apple Silicon, arm64)
  • Target: TypeScript (tree-sitter TS grammar)

Concrete repro (observed on a real codebase)

A server action claimEntry derives a value via findCurrentEvent(bootstrap.events). We anchored the caller:

- claim: "... derives the current gameweek via findCurrentEvent ..."
  at: "src/lib/invite/actions.ts > claimEntry"

  1. Source uses import { findCurrentEvent } from "@/lib/fpl/utils"; → surf verify stamps hash H.
  2. Replace the import with a byte-identical top-level copy in the same file:
    function findCurrentEvent(events: {...}[]): number { /* same body */ }
    surf check → GREEN (the caller's span hash is still H; the import line and the top-level
    function definition both sit outside claimEntry's span, and the call expression is unchanged).
  3. Only when the duplicate is declared inside claimEntry's body does the span change → DIVERGED.

So surface neither flags introducing a top-level duplicate nor removing one — both directions stay green.
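
The repro can be modelled in a few lines. This is a minimal sketch, not surf's implementation: it assumes the span hash covers only the text of the anchored function, and it substitutes a toy FNV-1a hash and a brace-matching `spanOf` helper (both hypothetical) for the real tree-sitter subtree hash. The `Bootstrap` and `Event` type names and the body of `findCurrentEvent` are placeholders.

```typescript
// Toy stand-in for a span hash: 32-bit FNV-1a over the span's text only.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical helper: return the text of the function declaration `name`
// by matching braces from the start of its body.
function spanOf(source: string, name: string): string {
  const start = source.indexOf(`function ${name}(`);
  if (start < 0) throw new Error(`no span named ${name}`);
  let depth = 0;
  for (let i = source.indexOf("{", start); i >= 0 && i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}" && --depth === 0) return source.slice(start, i + 1);
  }
  throw new Error(`unterminated span ${name}`);
}

const caller = [
  "function claimEntry(bootstrap: Bootstrap): number {",
  "  return findCurrentEvent(bootstrap.events);",
  "}",
].join("\n");

// Step 1: the shared util is imported.
const withImport = `import { findCurrentEvent } from "@/lib/fpl/utils";\n\n${caller}\n`;

// Step 2: the import is replaced by a top-level copy (placeholder body).
const withLocalCopy =
  "function findCurrentEvent(events: Event[]): number {\n  return events.length;\n}\n\n" +
  `${caller}\n`;

// Step 3: the copy is declared inside claimEntry's body.
const withNestedCopy = [
  "function claimEntry(bootstrap: Bootstrap): number {",
  "  function findCurrentEvent(events: Event[]): number { return events.length; }",
  "  return findCurrentEvent(bootstrap.events);",
  "}",
].join("\n");

const hImport = fnv1a(spanOf(withImport, "claimEntry"));
const hLocal = fnv1a(spanOf(withLocalCopy, "claimEntry"));
const hNested = fnv1a(spanOf(withNestedCopy, "claimEntry"));

console.log("import vs top-level copy, same hash:", hImport === hLocal);
console.log("import vs nested copy, same hash:", hImport === hNested);
```

Under these assumptions the first two hashes are equal because the span text is identical in both files; only the nested copy changes the text that gets hashed.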

Why it matters

  • "Uses the shared util / delegates to X / no copy-paste of this logic" is a frequent, high-value
    architectural claim, and it's silently unenforceable today.
  • It's the inverse of the gate's purpose: the prose can assert a dependency that the hash can't back up,
    which is the kind of false confidence the README's honesty section warns about.

Proposed directions (any one)

  1. Per-anchor dependency assertion — e.g. calls: / references: on an anchor; lint/check
    fails if the named symbol isn't imported or called within the span's resolved scope.
  2. Repo-wide duplicate-implementation lint — surface already computes subtree hashes; flag when two
    function/method subtrees in different locations share an identical hash (or, given a similarity
    measure, have near-identical bodies), so copy-paste forks surface as a warning.
  3. Resolution-aware span hashing — when hashing a span, resolve called identifiers to their
    definition sites and fold the definition's hash (or its module path) in, so import-vs-local-copy
    yields different hashes.
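
To make directions 2 and 3 concrete, here is a rough sketch of both. Everything in it is an assumption for illustration: the `FunctionDef` shape, the `findDuplicates` and `resolutionAwareHash` names, the whitespace normalisation, and the toy FNV-1a hash are hypothetical and do not describe surf's internals or any existing API. Symbol resolution itself is assumed to be done elsewhere and passed in as a map from identifier to definition site.

```typescript
// Toy hash (32-bit FNV-1a); a stand-in for whatever subtree hash the tool computes.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hypothetical record for one function or method found in the repo.
interface FunctionDef {
  file: string;
  name: string;
  body: string;
}

// Direction 2: group bodies by hash and report every hash seen in more than one location.
function findDuplicates(defs: FunctionDef[]): string[][] {
  const byHash = new Map<string, string[]>();
  for (const d of defs) {
    // Collapse whitespace so formatting differences do not hide a copy.
    const key = fnv1a(d.body.replace(/\s+/g, " ").trim());
    const locations = byHash.get(key) ?? [];
    locations.push(`${d.file} > ${d.name}`);
    byHash.set(key, locations);
  }
  return Array.from(byHash.values()).filter((locs) => locs.length > 1);
}

// Direction 3: fold each called identifier's definition site into the span hash.
// `resolved` maps identifier -> module path (or file) where it is defined.
function resolutionAwareHash(spanText: string, resolved: Record<string, string>): string {
  const sites = Object.keys(resolved)
    .sort()
    .map((id) => `${id}@${resolved[id]}`);
  return fnv1a([spanText, ...sites].join("\n"));
}

// Placeholder bodies; the real findCurrentEvent body is not shown in this issue.
const utilBody = "{ return events.length; }";
const dupes = findDuplicates([
  { file: "@/lib/fpl/utils", name: "findCurrentEvent", body: utilBody },
  { file: "src/lib/invite/actions.ts", name: "findCurrentEvent", body: utilBody },
  {
    file: "src/lib/invite/actions.ts",
    name: "claimEntry",
    body: "{ return findCurrentEvent(bootstrap.events); }",
  },
]);
console.log("duplicate groups:", dupes);

const span = "function claimEntry(bootstrap) { return findCurrentEvent(bootstrap.events); }";
const viaImport = resolutionAwareHash(span, { findCurrentEvent: "@/lib/fpl/utils" });
const viaLocal = resolutionAwareHash(span, { findCurrentEvent: "src/lib/invite/actions.ts" });
console.log("import vs local copy, same hash:", viaImport === viaLocal);
```

The trade-off between the two is visible even in the sketch: the duplicate lint needs no resolver but only warns after a copy exists, while resolution-aware hashing makes the caller's anchor diverge at the moment the import is swapped, at the cost of requiring a resolver for the target language.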


Found while dogfooding surface on a real TypeScript codebase (OffsideFPL) — specifically while
refactoring a duplicated private function onto a shared util and discovering the gate stayed green
through the whole refactor.

Metadata

    Labels

    dx: Developer experience / CLI ergonomics
    enhancement: New feature or request
    research: Open question / needs a design spike (proposal §11)
