fix(extract): route extensionless shebang scripts to their AST extractor #1680
Open
Stashub wants to merge 983 commits into
perf: parallelize save_manifest file hashing with ThreadPoolExecutor
…ersion probe extract.py: clamp ProcessPoolExecutor max_workers to 61 on Windows (issue Graphify-Labs#1298). Python's ProcessPoolExecutor hard-caps at 61 on Windows via WaitForMultipleObjects; >61-core machines crashed on AST extraction. Clamp applied after all input paths (auto-compute, GRAPHIFY_MAX_WORKERS, --max-workers) to cover all three. build.py: skip ghost-merge when two AST nodes share (basename, label) key (issue Graphify-Labs#1257). When same-named symbols appear in same-named files across directories (e.g. two render() in two index.ts), last-writer-wins produced an arbitrary canonical node and mis-pointed all edges. Now tracked in _loc_collisions; ambiguous keys are skipped in Pass 2, leaving the ghost intact rather than merging into the wrong node. __main__.py: ignore OSError on unreadable .graphify_version probes (issue Graphify-Labs#1299). On restricted-permission installs or network mounts, .exists()/.read_text() raised PermissionError and crashed every graphify query/explain/path call at startup. All three FS probes now wrapped in try/except OSError: return. prs.py: resolve claude.cmd on Windows in prs.py claude-cli backend (issue Graphify-Labs#1288). The _call_llm and _call_claude_cli paths were already fixed; prs.py had the same bare ["claude", ...] call that fails on Windows npm installs with WinError 2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
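The worker clamp described for extract.py can be sketched as follows. This is a minimal illustration, not the project's code: the function name `clamp_max_workers` and the constant name are hypothetical, and the only facts taken from the message are the 61-worker limit on Windows and that the clamp is applied after the worker count has been resolved.

```python
import sys

# Windows limit cited in the commit message; the constant name is hypothetical.
WINDOWS_MAX_WORKERS = 61


def clamp_max_workers(requested: int, platform: str = sys.platform) -> int:
    """Return a worker count that is safe to hand to ProcessPoolExecutor."""
    workers = max(1, requested)  # never go below one worker
    if platform == "win32":
        # Applied last, so it covers auto-computed, env-var and CLI values alike.
        workers = min(workers, WINDOWS_MAX_WORKERS)
    return workers
```

Passing the platform as a parameter keeps the sketch testable on any OS.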
…-scanners Wire bandit and pip-audit into CI
…-summaries-rfc docs: add RFC for file-level node summaries
…1166 node-summaries RFC
…or graphify-mcp export.py: to_json now accepts community_labels and writes community_name onto each node. Previously cluster-only wrote labels only to GRAPH_REPORT.md, graph.html, and .graphify_labels.json — graph.json stored only the numeric cid, so query/MCP showed blank or numeric community values (Graphify-Labs#1305). __main__.py: pass community_labels=labels to to_json in cluster-only path. explain command now prefers community_name over raw numeric community field. serve.py: query and get_node read paths prefer community_name over community, with fallback so old graphs without the field still work. Adds --graph flag as an alias for the positional argument in graphify-mcp/_main(), fixing "unrecognized arguments: --graph" for users following the documented pattern shared by every other graphify subcommand (Graphify-Labs#1304). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_BASE_URL/ANTHROPIC_MODEL (Graphify-Labs#1273) Custom OPENAI/ANTHROPIC base-url + model env vars for self-hosted and proxy endpoints. CI green (3.10/3.12).
…s, function expressions (Graphify-Labs#1323) Extract JS/TS this.X=, exports.X=, prototype, class arrow fields, and function expressions (closes Graphify-Labs#1322). Validated locally against v8: full suite 2069 passed.
- Graphify-Labs#1315: add .psm1 to CODE_EXTENSIONS + _DISPATCH so PowerShell modules are indexed
- Graphify-Labs#1327: synthesize a module node for Swift import targets (new LanguageConfig flag synthesize_import_module_nodes) so imports edges survive build.py pruning; strengthen the Swift dangling-edge test to also assert edge targets
- Graphify-Labs#1317: dedupe parallel edges by (source,target,relation) in the --no-cluster and incremental update write paths so edge counts are deterministic and `update` is idempotent

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
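The (source,target,relation) edge dedup from Graphify-Labs#1317 can be illustrated with a small sketch. It assumes edges are plain dicts with `source`, `target` and `relation` keys; the function name `dedupe_edges` is hypothetical, and keeping the first occurrence is an assumption.

```python
def dedupe_edges(edges):
    """Keep the first edge for each (source, target, relation) key, in order."""
    seen = set()
    unique = []
    for edge in edges:
        key = (edge.get("source"), edge.get("target"), edge.get("relation"))
        if key in seen:
            continue  # parallel duplicate: drop it
        seen.add(key)
        unique.append(edge)
    return unique
```

Because the result of a second pass equals the first, re-running the write path does not change edge counts.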
Bump version to 0.8.40; document this release in CHANGELOG (PowerShell .psm1 indexing, Swift import survival, no-cluster edge dedup, custom OpenAI/Anthropic endpoints, JS/TS assignment-form extraction, community-name + --graph fixes, four production bug fixes, perf, security CI); add .psm1 to the README supported-extensions table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adopts the approach from Graphify-Labs#1330 (thanks @duncan-daydream) on top of the v0.8.40 Swift import fix: _import_swift returns (id,label) module pairs, the extractor materializes a type=module anchor node per import, and _disambiguate_colliding_node_ids exempts type=module nodes so the same module imported from N files collapses to one shared node (enables reverse traversal "what imports CoreKit"). The --no-cluster writer now dedupes nodes by id and edges to match the clustered build_from_json path. Replaces the interim _import_label/synthesize_import_module_nodes mechanism. Adds tests/test_swift_import_resolution.py (cross-file collapse, build survival) and dedupe_nodes coverage. Refs Graphify-Labs#1327, Graphify-Labs#1330. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same-named types in different packages left implements/inherits edges stuck on bare shadow stubs, isolating the real interface (_rewire_unique_stub_nodes only fixes the globally-unique case). New _resolve_java_type_references pass uses each referencing file's import statements (+ package decl) to build an FQN->def index, re-points dangling implements/inherits/imports edges to the exact definition, and drops the orphaned stub. External/stdlib imports stay unresolved (correct). Runs after id-disambiguation so target ids are final. Java-scoped; other _extract_generic languages share the same bare-name fallback and remain a follow-up. Adds tests/test_java_type_resolution.py (simple, ambiguous-by-import, build-survival). Refs Graphify-Labs#1318. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The query skill was split across two fragments so no platform got both capabilities: Claude had the vocab/IDF query-expansion step but no fallback if the CLI was unavailable; every other platform had the inline NetworkX fallback but the weaker raw-question matcher. Merge into one unified query reference + stub (Step 0 expansion -> CLI traversal -> inline NetworkX fallback, plus path/explain inline) shipped to all hosts. Remove the query_variant enum, its toml field, and the _CLI_ONLY_QUERY_HEADINGS coverage-audit exemption. Re-render all skill artifacts and re-bless expected/. skillgen check/audit-coverage/monolith-roundtrip/schema-singleton all pass. Refs Graphify-Labs#1325.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes Graphify-Labs#1338 (Unicode NFD/NFC): serve._find_node now matches tokenized labels; affected.resolve_seed NFC-normalizes + casefolds. Reviewed: full suite 2087 passed, CLI smoke clean, no regressions. Thanks @balloon72.
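The NFC-normalize-then-casefold step can be sketched with the standard library alone. The helper name `normalize_label` is hypothetical; only the order of operations (NFC, then casefold) comes from the message above.

```python
import unicodedata


def normalize_label(text: str) -> str:
    """Compose to NFC, then casefold, so NFD and NFC spellings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()
```

With this, a label typed as `e` plus a combining acute accent matches the precomposed `é` form stored in the graph.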
Fixes Graphify-Labs#1334: detect npm/yarn package.json workspaces (array + yarn object form), pnpm precedence preserved. Reviewed: full suite 2087 passed, no regressions. Thanks @balloon72.
Harden incremental no-cluster updates: fixes empty-write graph wipeout on no-op update --no-cluster (Graphify-Labs#1347) and git-hook subdir path resolution (Graphify-Labs#1348). Complementary to Graphify-Labs#1317. Validated: full suite 2107 passed, no-op re-run no longer wipes graph. Thanks @pkudinov.
…edge emission (Graphify-Labs#1331) (Graphify-Labs#1341) Index PowerShell .psd1 manifests + emit Import-Module/dot-source edges (closes Graphify-Labs#1331). Builds on the shipped .psm1 support. Validated: full suite 2107 passed, 18 new tests. Thanks @geektan123.
…stale edges (Graphify-Labs#1344) build_merge: prune a re-extracted file's stale nodes/edges before merge instead of accumulating (fixes Graphify-Labs#1283, Graphify-Labs#1285). Validated: full suite 2107 passed. Thanks @RelywOo.
…t fixes (Graphify-Labs#1357) Harden HTML export against U+2028/U+2029 script-breakout XSS + two crash-on-adversarial-input fixes (non-dict LLM JSON, _extract_parallel IndexError). Validated: full suite 2107 passed, HTML export smoke clean. Thanks @mistic96.
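A minimal sketch of escaping U+2028/U+2029 before embedding JSON in an inline script follows. The function name `json_for_script` is hypothetical, and the extra `</` escaping is an assumption added for illustration rather than something the commit message states.

```python
import json


def json_for_script(data) -> str:
    """Serialize data so it is safe to embed inside an inline <script> block."""
    text = json.dumps(data, ensure_ascii=False)
    return (
        text.replace("</", "<\\/")           # assumption: also block "</script>" breakout
            .replace("\u2028", "\\u2028")    # LINE SEPARATOR
            .replace("\u2029", "\\u2029")    # PARAGRAPH SEPARATOR
    )
```

The escaped text still parses back to the original value, so the graph data is unchanged for the page's JavaScript.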
…nstructors (Graphify-Labs#1356) Capture property/field initializer constructor calls, build a per-file Swift type table from property/parameter declarations, and add a member-call resolution pass that types the receiver and emits an edge only when the type name resolves to exactly one definition. Additive and INFERRED-only; the is_member_call drop and the Graphify-Labs#543/Graphify-Labs#1219 god-node guards stay intact. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ned (Graphify-Labs#1361) The /graphify --update runbook called build_merge with absolute prune_sources but no root=, so _norm_source_file never relativized them to match the graph's relative source_file values. Nothing was pruned and changed/deleted files left ghost nodes that compounded on every incremental update. Fix the shared skillgen fragment to pass root='INPUT_PATH' and re-render all platform artifacts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Labs#1324) to_canvas built cards solely by iterating communities, so a graph with no community data (--no-cluster builds, or a missing analysis sidecar) wrote the empty 32-byte {"nodes":[],"edges":[]} shell while notes rendered fine. Fall back to one synthetic community covering every node so the canvas reflects the graph. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Semantic/LLM edges occasionally omit source_file, which build only normalized when already present, so the field reached graph.json empty and downstream validation flagged it. Backfill from the source (then target) node in build_from_json and in the --no-cluster raw-write path, which bypasses it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
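The backfill order (source node first, then target node) can be sketched as below. The function name and the dict shapes are assumptions; the real logic lives in build_from_json and the --no-cluster write path.

```python
def backfill_edge_source_file(edges, nodes_by_id):
    """Fill a missing edge source_file from its source node, else its target node."""
    for edge in edges:
        if edge.get("source_file"):
            continue  # already set: leave untouched
        for endpoint in ("source", "target"):
            node = nodes_by_id.get(edge.get(endpoint), {})
            if node.get("source_file"):
                edge["source_file"] = node["source_file"]
                break
    return edges
```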
…eleted-only Completes the source_file convention fix begun in Graphify-Labs#1344 (build_merge replace-on-re-extract) and Graphify-Labs#1361 (pass root= to build_merge in the --update runbook). Two gaps still let the full build and incremental --update emit different source_file bases for the same file, so the source_file-keyed replace missed and duplicates accumulated: 1. extraction-spec(.md/-compact.md): the subagent's source_file slot was an unpinned "relative/path", so it invented a base per run (and the node id, derived from the same path, drifted too). Pin it to the verbatim FILE_LIST path so _norm_source_file(root) canonicalizes every run identically. 2. core.md: the full build called build_from_json WITHOUT root=, so Graphify-Labs#1361's update-side root= had no matching base on the full-build side. Pass root='INPUT_PATH' at both sites (Step 4 export, Step 5 report) so the full build and --update relativize to the same base. update.md prune_sources = deleted only. Changed files are replaced by build_merge (Graphify-Labs#1344); once root= aligns the bases, leaving `changed` in prune_sources would delete the freshly re-extracted nodes. Engine (build.py) unchanged. Regenerated all skill artifacts via tools/skillgen/gen.py. Adds test_build_merge_root_collapses_convention_drift. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…te, and prefix-divergent labels (Graphify-Labs#1284, Graphify-Labs#1243) Three pass-2 guards (mirrored in the --dedup-llm pair collection): block merges when labels' embedded numbers differ as zero-padding-insensitive multisets; block cross-file merges of file-anchored rationale/document nodes (same-file still merges); and score cross-file long labels on plain Jaro instead of Jaro-Winkler so the prefix bonus can't fabricate merges of shared-prefix but token-divergent entities (jest-native vs react-native), while genuine cross-file duplicates still clear Jaro and same-file near-duplicates keep Jaro-Winkler. Co-Authored-By: van4oza <van4oza@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
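The first guard, comparing embedded numbers as zero-padding-insensitive multisets, can be sketched as follows. The helper names are hypothetical, and parsing each digit run with `int` is one simple way to ignore zero padding.

```python
import re
from collections import Counter


def number_multiset(label: str) -> Counter:
    """Collect the integers embedded in a label; int() drops zero padding."""
    return Counter(int(digits) for digits in re.findall(r"\d+", label))


def numbers_block_merge(label_a: str, label_b: str) -> bool:
    """True when the two labels carry different numbers and must not merge."""
    return number_multiset(label_a) != number_multiset(label_b)
```

Using a multiset rather than a set means `v1 v1` and `v1` are still treated as different.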
…aphify-Labs#1365) ollama/openai/deepseek/kimi set max_tokens in their backend config, but the openai-compat dispatch read only max_completion_tokens (which only gemini defines), so their output silently capped at the 8192 fallback and truncated deep-mode JSON. Read either key and give the openai config an explicit cap; GRAPHIFY_MAX_OUTPUT_TOKENS still overrides. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oubleshooting Add changelog entries for the Swift cross-file (Graphify-Labs#1356), update prune (Graphify-Labs#1361), obsidian canvas (Graphify-Labs#1324), and edge source_file backfill (Graphify-Labs#1279) fixes that shipped without changelog notes, and a README troubleshooting entry for the LLM JSON-truncation warnings and how to reduce them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s#1609) C# had no member-call resolver (unlike Swift/Python/Ruby/TS/C++/ObjC), so `recv.Method()` fell back to a bare method-name match against label_to_nid — which, under ambiguity, silently mis-bound `_server.Save()` to an unrelated `Cache.Save()`. That's a WRONG edge, not just a missing one, and it left delegation-heavy C# call graphs (wrappers, service layers) blind across typed member/param boundaries. Mirrors the C++ Graphify-Labs#1547 pattern: - capture the member_access_expression receiver (simple identifier or `this`) into member_receiver and set is_member_call in the C# invocation branch; - defer ALL C# member calls with a receiver to the resolver (tgt_nid = None) so the bare in-file match can't fire, and emit a raw_call tagged lang="csharp"; - _csharp_member_type_table: file-wide name -> Type from fields, properties, parameters, and locals (incl. `var v = new T()`), first-binding-wins; - _resolve_csharp_member_calls: `this` -> enclosing class (EXTRACTED), capitalized -> the named type (EXTRACTED), else the receiver's table type (INFERRED), each gated by the single-definition guard; no method on the type -> no edge. Registered for .cs. Verified: the ambiguous `_server.Save()` now resolves to Server.Save and NOT Cache.Save; field/param/local/this/Type.static/cross-file all resolve; dynamic receiver and absent-method emit nothing; unqualified calls unregressed. 8 new tests, full suite 2841. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…raphify-Labs#1618) A node whose source_file equals the absolute scan root (e.g. a project-level semantic concept the LLM attributed to the whole repo) relativized to Path('.'), and _semantic_id_remap fed that into _file_stem, whose path.with_suffix("") raises `ValueError: '.' has an empty name`. The crash landed in final graph assembly — AFTER all LLM extraction cost was spent — writing no graph.json at all, and leaving `cluster-only` to then report "no graph found". Two guards: _file_stem returns "" for a name-less path (protects every caller, not just this one), and both _semantic_id_remap passes skip a root-equal source_file explicitly (it has no per-file identity to remap — id left untouched). Reported with a minimal LLM-free repro by @sub4biz. Not a 0.9.5 regression: _semantic_id_remap/_file_stem are byte-identical to 0.9.4; the latent path was only hit when dedup produced a root-source_file node. 4 regression tests (dot-path stem, remap no-crash, build_from_json with a root-level concept node, normal remap unaffected). Full suite 2849. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
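The crash and the first guard are easy to reproduce with pathlib alone. The sketch below uses a hypothetical `safe_file_stem`; what the real _file_stem returns for ordinary paths is an assumption here.

```python
from pathlib import Path


def safe_file_stem(path: Path) -> str:
    """Return the file name without its suffix, or "" for a name-less path."""
    if not path.name:
        # Path(".").with_suffix("") would raise ValueError (empty name).
        return ""
    return path.with_suffix("").name
```

Guarding inside the helper protects every caller, which matches the reasoning given in the message.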
…abs#1445) _pick_seeds' gap_ratio cutoff discards any candidate scoring below 20% of the top score. On a multi-term natural-language query, one term's incidental EXACT label match on a node that is otherwise unrelated to the query's intent (e.g. a common word also used as a field name or identifier elsewhere in the corpus) scores ~1000x higher than any SUBSTRING match on the query's other, actually-relevant terms (_EXACT_MATCH_BONUS vs _SUBSTRING_MATCH_BONUS). The cutoff then silently discards every one of those substring-tier candidates as BFS seeds, so the traversal only ever explores the neighborhood of the one unrelated exact match, and `query` returns confidently-wrong results with no signal that anything went wrong. This matches Graphify-Labs#1445's reproduction exactly: a vague query that doesn't name a target symbol seeds from unrelated "concept-dense" nodes instead, even though the target node is present in the graph. _pick_seeds now optionally accepts the graph and the tokenized query terms; when supplied, it guarantees at least one seed per distinct term that has any match at all, so one term's collision cannot starve out the others. Ties within a term are broken by node degree, so an isolated incidental match doesn't out-rank a real, well-connected hub for that term. The parameters default to None and existing callers that don't pass them see byte-identical behavior (see test_pick_seeds_without_diversity_args_is_unchanged). Adds a regression test reproducing the exact failure shape from Graphify-Labs#1445 and confirms the previously-starved target node is recovered as a seed once G/terms are supplied. Full test suite (74 tests) and ruff both pass.
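The per-term guarantee can be illustrated with a simplified sketch. It is not the project's _pick_seeds: the signature, the precomputed `term_matches` and `degree` mappings, and the score values are all assumptions chosen to show the starvation case and the degree tie-break.

```python
def pick_seeds(scores, term_matches, degree, gap_ratio=0.2):
    """Gap-ratio cutoff, then ensure every matched query term keeps one seed."""
    if not scores:
        return []
    top = max(scores.values())
    ranked = sorted(scores, key=lambda n: scores[n], reverse=True)
    seeds = [n for n in ranked if scores[n] >= gap_ratio * top]
    chosen = set(seeds)
    for term, nodes in term_matches.items():
        if not nodes or chosen & set(nodes):
            continue  # term has no match, or is already represented
        # Best score wins; ties are broken by node degree.
        best = max(nodes, key=lambda n: (scores.get(n, 0.0), degree.get(n, 0)))
        seeds.append(best)
        chosen.add(best)
    return seeds
```

In the failure shape described above, one exact match dominates the cutoff; the second loop restores a seed for the starved term and prefers the well-connected node.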
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-Labs#1623) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1619 A2) The query reference doc's inline vocab-harvest / fallback-search snippets used bare Path(...).read_text()/write_text(), which on Windows (default cp1252) crash with UnicodeEncodeError on the cross-language corpora the doc itself demonstrates (Cyrillic labels like обработчик). Add encoding="utf-8" to all five sites in the skillgen source fragment and regenerate; blessed expected/, skillgen --check + --monolith-roundtrip green. Scoped to the concrete reproduced crash; the larger Graphify-Labs#1619 findings (the Windows .exe interpreter-guard rewrite, INPUT_PATH backslash guidance, BOM handling) are a separate skill-template pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…receivers (Graphify-Labs#1630) The Graphify-Labs#1316 resolver handled `this.injectedField.method()`, but a receiver whose type comes from a local `const x = new Foo()` binding (Pattern A) or a type-annotated parameter — including inside a returned closure (Pattern B) — produced no calls edge, so `affected <method>` silently under-reported. - _ts_receiver_type_table: augment the per-file type table with local `new` bindings (name -> constructor type) and bare-typed parameters (`(svc: Svc)` -> svc: Svc), merged after the constructor-injection entries (which win on a name clash). Only a bare type_identifier is recorded — an array/union/generic/qualified/predefined type is skipped (precision). - walk_calls now descends into an inline/returned JS/TS closure that is not separately tracked in function_bodies (e.g. `return () => svc.doThing()`), attributing its calls to the enclosing function, instead of stopping at the arrow boundary. A tracked-body-id set prevents double-walking const-assigned arrows. The existing _resolve_typescript_member_calls then resolves both via the receiver type with its single-definition guard. Verified on the real-CLI shape (absolute paths + graphify-out cache): both patterns resolve, ambiguity binds to the right class (Svc not Cache), untyped/array-typed receivers emit nothing. 5 tests, full suite 2871. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hify-Labs#1236 follow-up) The Graphify-Labs#1236 fix guarded to_obsidian's member loop but not to_canvas, so `graphify export obsidian` (which also writes graph.canvas) still crashed with KeyError on a community member id absent from G — after the notes exported, leaving a partial mirror. Reported on 0.9.5 by @swells808. Apply the same `m in G and m in node_filenames` filter in both to_canvas loops: the box-sizing loop (so the group box matches the cards actually laid out) and the card-layout loop (so the sort/label deref and the node_filenames fallback never touch a dangling id). Regression test added alongside the to_obsidian one. Full suite 2872. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fy-Labs#1631, Graphify-Labs#1638, Graphify-Labs#1632) Graphify-Labs#1631: a malformed LLM chunk (a stray non-dict entry in edges/nodes/hyperedges) crashed the AST+semantic merge and the semantic-cache write with `AttributeError: 'list' object has no attribute 'get'`, discarding every successful chunk and writing no graph.json. `_parse_llm_json` now sanitizes each fragment at the single parse chokepoint (dict entries only; non-list values coerced to []), protecting the cache writer, the adaptive-retry merge, and the CLI merge in one place. Graphify-Labs#1638: an unresolved bare npm import (`import colors from "tailwindcss/colors"`) emitted an imports_from edge to the bare id `colors`, which build.py's pre-migration alias index then remapped onto an unrelated local file of that stem (backend/utils/colors.py) - a confident EXTRACTED cross-language phantom edge, one per importing file. The external-import fallback now namespaces its target with the `ref` prefix (the J-4 convention), so it can never collapse to a local node id; the ref target has no node, so build drops it as an external reference. Graphify-Labs#1632: with a parallel LLM backend, extract_corpus_parallel merged chunk results in completion order, so which network call returned first reordered nodes/edges run-to-run even when the model returned identical content - churning graph.json. Chunks are now merged in deterministic submission order after the pool drains (matching the serial path); the progress callback still fires in completion order. The model's own content variance is unchanged (irreducible). Full suite: 2882 passed, 3 skipped. Validated end-to-end via a local wheel build on a mixed TS+Python corpus: `explain colors.py` shows only the real importer, and graph.json is byte-identical across repeated runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
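The ordering fix for Graphify-Labs#1632 can be sketched with `concurrent.futures`. The function name, the `extract_chunk` callable and the result shape are hypothetical; the point is that progress is reported in completion order while results are merged in submission order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def merge_in_submission_order(chunks, extract_chunk, on_progress=None, max_workers=4):
    """Run chunks in parallel, but merge their results in the order submitted."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(extract_chunk, chunk) for chunk in chunks]
        for done in as_completed(futures):
            if on_progress is not None:
                on_progress(futures.index(done))  # completion order is fine here
        results = [future.result() for future in futures]  # submission order
    merged = {"nodes": [], "edges": []}
    for result in results:
        merged["nodes"].extend(result.get("nodes", []))
        merged["edges"].extend(result.get("edges", []))
    return merged
```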
interface X extends A, B captured the parent list in iface_re group 2 but the handler only read group 1, so no inheritance edge was emitted. Split the parent list and emit one extends edge per parent (mirroring the class branch).
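The split-and-emit step can be sketched as below. The regex is a hypothetical stand-in for the project's iface_re (its real pattern is not shown in the message); only the idea of reading group 2 and emitting one extends edge per parent is taken from the text.

```python
import re

# Hypothetical pattern: group 1 is the interface name, group 2 the parent list.
IFACE_RE = re.compile(r"\binterface\s+(\w+)(?:\s+extends\s+([\w.,\s]+?))?\s*\{")


def interface_extends_edges(source: str):
    """Return (child, "extends", parent) triples for every declared parent."""
    edges = []
    for match in IFACE_RE.finditer(source):
        name, parents = match.group(1), match.group(2)
        if not parents:
            continue  # interface without an extends clause
        for parent in parents.split(","):
            parent = parent.strip()
            if parent:
                edges.append((name, "extends", parent))
    return edges
```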
class Foo : Bar by baz produced no edge because the delegation_specifier loop only handled constructor_invocation and bare user_type children; the by form wraps user_type in an explicit_delegation node. Add that branch so the implements edge (and generic-arg recovery) fires.
…hify-Labs#1644 (kotlin by delegation) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stant-receiver calls (Graphify-Labs#1640, Graphify-Labs#1634) Graphify-Labs#1640 (node extraction): the extractor only created nodes for `class Foo`, so plain `module Foo`, `Foo = Struct.new(...) do ... end`, `Foo = Class.new(Super)` and `Result = Data.define(...)` produced no container node — their methods hung off the file via `contains` with dot-less labels and no edge could target them. `module` is now a container type (methods attach via `method`, nested modules included), and a constant assignment whose RHS is Struct.new/Class.new/Data.define synthesizes a class node named after the constant, attaches block-defined methods to it, and emits an `inherits` edge for `Class.new(Super)`. Plain constant assignments (MAX = 100, X = Foo.new) are untouched. Graphify-Labs#1634 (resolution): constant-receiver singleton calls (`Service.call`, `Model.where`, `SomeJob.perform_async`) emitted no edge, so a Zeitwerk-autoloaded Rails app (no requires) had near-zero cross-file edges. resolve_ruby_member_calls now handles a capitalized receiver with any callee: bind to the class's owned singleton/instance method (`def self.call`) when present, else to the class node itself so inherited/dynamic class methods (ActiveRecord where/find_by) still give blast-radius. Namespaced receivers resolve by bare class name. The single-owning-class god-node guard is kept — ambiguous receivers resolve to nothing, never a wrong edge. The two compound: PaymentProcessor#process -> TaxCalculator.rate_for needs the module node (Graphify-Labs#1640) AND the resolver (Graphify-Labs#1634); both now land. Full suite: 2893 passed, 3 skipped. Adversarial smoke confirms no false class nodes from plain/multiple assignments and no self-loops on self-class calls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
19 fixes/features since 0.9.5. Highlights:
- Ruby: module/Struct.new/Class.new/Data.define container nodes (Graphify-Labs#1640) and constant-receiver singleton-call resolution (Graphify-Labs#1634); Rails/Zeitwerk graphs now get real cross-file edges.
- Kill cross-language phantom imports_from edges from unresolved bare npm imports (Graphify-Labs#1638); harden semantic extraction against malformed LLM chunks (Graphify-Labs#1631); deterministic graph.json node/edge ordering for parallel semantic backends (Graphify-Labs#1632).
- Contributor extractor fixes: Apex interface multiple inheritance (Graphify-Labs#1645), Kotlin `by` delegation (Graphify-Labs#1644).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "What files it handles" code row omitted several extensions that reuse existing tree-sitter grammars (so the grammar count is unchanged): `.mts`/`.cts` (TypeScript, Graphify-Labs#1607, new in 0.9.6), `.cc`/`.cxx` (C++), `.kts` (Kotlin), `.psd1` (PowerShell), `.toc` (Lua). Apex (`.cls`/`.trigger`) and Terraform already have their own rows. `.r`/`.ejs`/`.ets` are intentionally left out — they are in CODE_EXTENSIONS but have no registered extractor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…phantom cross-package edges (Graphify-Labs#1659) When a callee had exactly one same-named definition repo-wide, the cross-file resolver emitted a `calls` edge at INFERRED/0.8 even with no import path between caller and callee. On a monorepo this fabricated dependencies: a 14-package repo showed `platform`/`sidecar` depending on `registry-protocol` purely because it exported generically-named symbols that unresolved calls collapsed onto. JS/TS modules have no implicit cross-module scope, so a cross-file call is real only if the caller imported it. Direct JS/TS cross-file `calls` attribution is now gated on import evidence and left unresolved otherwise. Scoped to direct calls: other languages keep the Graphify-Labs#1553 single-candidate resolution (C/C++ headers, Ruby autoload, same-package implicit scope), and the indirect_call path (already INFERRED + callable-gated) is untouched. Also hardens caller/candidate -> file mapping to resolve via the node's `source_file` string (identifying the file node by its basename label) instead of `relative_to(root.resolve())`, which threw on a path-resolution/symlink mismatch and fell back to a non-matching absolute id — spuriously failing import evidence. This both makes the new gate safe and fixes legitimate cross-file calls being mislabeled INFERRED instead of EXTRACTED. Full suite: 2898 passed, 3 skipped. Verified via CLI on the reporter's repro (phantom dropped) and a control (imported call resolves EXTRACTED). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cache word counts (Graphify-Labs#1649, Graphify-Labs#1655, Graphify-Labs#1656)

Graphify-Labs#1649: detect_incremental tracks the converted markdown sidecar, and convert_office_file early-returned whenever the sidecar existed, so a .docx/.xlsx edited after its first conversion never updated its sidecar and was reported "unchanged" forever, freezing the graph. It now re-converts when the source is newer than the sidecar (bumping the sidecar so the hash check catches it); an unchanged source still skips the rewrite (Graphify-Labs#1226).

Graphify-Labs#1655: _md5_file/save_manifest/count_words used plain open()/stat(), which the Windows file APIs reject for absolute paths over 260 chars unless prefixed with `\\?\`. Deeply-nested files never hashed, their manifest entry never stabilized, and detect_incremental re-flagged them as changed every run. A new _os_path adds the extended-length prefix on win32 for change-detection I/O (mirror of cache._normalize_path, which strips it for keys). No-op elsewhere.

Graphify-Labs#1656: detect() re-parsed every PDF/docx/text file to size the corpus on each run. Word counts are now memoized in the existing content-hash stat index (keyed by size + mtime_ns), so an unchanged file is parsed once. file_hash's fastpath is guarded so a word-count-only entry (no hash) can't KeyError, and both writers augment a co-located entry in place instead of clobbering the other's field.

Full suite: 2906 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
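The extended-length prefix from Graphify-Labs#1655 can be sketched as follows. This is not the project's _os_path: the function name is hypothetical, the UNC branch is an assumption based on the documented Windows `\\?\UNC\` form, and the platform is a parameter only so the sketch runs on any OS.

```python
import ntpath
import sys

EXTENDED_PREFIX = "\\\\?\\"  # the literal characters \\?\


def extended_length_path(path: str, platform: str = sys.platform) -> str:
    """Prefix an absolute Windows path so file APIs accept more than 260 chars."""
    if platform != "win32" or path.startswith(EXTENDED_PREFIX):
        return path  # no-op elsewhere, and idempotent
    if path.startswith("\\\\"):
        # Assumption: UNC shares use the \\?\UNC\server\share form.
        return EXTENDED_PREFIX + "UNC\\" + path[2:]
    if ntpath.isabs(path):
        return EXTENDED_PREFIX + path
    return path  # relative paths cannot take the prefix
```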
… noise (Graphify-Labs#1635, Graphify-Labs#1646, Graphify-Labs#1657)

Graphify-Labs#1635: the windows skill variant declared `name: graphify-windows`, but `graphify install --platform windows` writes it to ~/.claude/skills/graphify/SKILL.md and Claude Code requires the folder name to equal the frontmatter `name`, so the suffix broke discovery. platforms.toml now sets name = "graphify" (regenerated + re-blessed).

Graphify-Labs#1646: the OpenCode (and Kilo) plugin prepended its reminder with `&&`, which Windows PowerShell 5.1 rejects as a statement separator, breaking the first bash command of every session. Switched to `;` (valid in PowerShell 5.1, Bash, POSIX).

Graphify-Labs#1657: the GRAPH_REPORT.md "Import Cycles" section printed "None detected" on documents-only corpora where imports don't exist; it is now gated on code nodes / import edges being present. The other two items in that issue (mojibake in manifest/report, stdout encoding) are already handled on current v8: both files are written UTF-8 and main() reconfigures stdout/stderr to UTF-8.

Full suite: 2909 passed, 3 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a benchmark writeup covering graphify as long-term memory (LOCOMO, LongMemEval-S vs mem0/supermemory/bm25/dense/hybrid) and as a code-intelligence layer (ERPNext), run on graphify's own harness with competitors as adapters: one shared model (Kimi K2.6), identical budgets, shared BGE-m3 embedder where allowed, and a judge blind-validated against a second judge (90.6% agreement, kappa 0.81). Numbers are wins-forward but every retained figure is exact; the supermemory recall comparison is labeled embedder-confounded. README gets a short Benchmarks section linking to it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es (Graphify-Labs#1666) krishnateja7 reported that on a full-repo run a stable subset of Ruby files yields zero nodes (not even a file node), each fine in isolation, drop set byte-stable across runs. Root cause is a transient batch/parallel extraction that produces an empty result, which then gets cached and persists. Every extractable file yields at least a file node, so a zero-node result is anomalous. Both extraction paths (parallel worker and sequential fallback) now skip the cache write when a non-error result has no nodes, so a rerun re-extracts and self-heals instead of loading the stale empty. extract() also warns, listing the files that landed in the graph with zero nodes, so the previously-silent blindness in affected/explain is visible. This addresses the persistence and the silent blindness. The underlying trigger (why a valid file occasionally extracts empty when co-processed with certain others) was not reproducible with synthetic corpora; the warning now surfaces it for a concrete report if it recurs. Full suite: 2912 passed, 3 skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y-Labs#1241) `_dynamic_import_js` emitted a deferred `import('./x')` as a plain `imports_from` edge, so `find_import_cycles` counted it as a static import. A file that statically imports another which dynamically imports it back was reported as a phantom circular dependency. Keep the edge as `imports_from` (the dependency stays visible in the graph) but mark it `deferred`, and skip deferred edges in `find_import_cycles`. Closes Graphify-Labs#1241
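A minimal sketch of skipping deferred edges during cycle detection follows. It uses a plain depth-first search over dict-shaped edges instead of the project's find_import_cycles, whose internals are not shown; the function names and edge keys are assumptions.

```python
def static_import_graph(edges):
    """Adjacency map built from imports_from edges, ignoring deferred ones."""
    graph = {}
    for edge in edges:
        if edge.get("relation") != "imports_from" or edge.get("deferred"):
            continue
        graph.setdefault(edge["source"], set()).add(edge["target"])
    return graph


def has_import_cycle(edges) -> bool:
    """True when the static (non-deferred) import graph contains a cycle."""
    graph = static_import_graph(edges)
    done, on_stack = set(), set()

    def visit(node):
        if node in on_stack:
            return True  # back edge: cycle found
        if node in done:
            return False
        on_stack.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        on_stack.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))
```

The deferred edge stays in the edge list, so the dependency remains visible to other queries.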
…d/mixed-case extensions (Graphify-Labs#1671)
…aphify-Labs#1241 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…od-bound callers in affected (Graphify-Labs#1668, Graphify-Labs#1669) Graphify-Labs#1668: Ruby `include`/`extend`/`prepend <Const>` in a class/module body now emits a `mixes_in` edge to the module. The mixin is captured during the node walk and resolved cross-file by resolve_ruby_member_calls (single-owner guard, reusing the Graphify-Labs#1640 module nodes as targets). The shared call pass skips these markers so they are not mislabeled as `calls`. `extend self` and non-constant args are skipped; ambiguous/undefined modules produce no edge. Rails concern composition is now visible to affected/explain. Graphify-Labs#1669: affected <Class> seeds the reverse walk with the root's own member nodes (one method/contains hop) so callers that bind at method granularity (e.g. Service.call -> the def self.call node, Graphify-Labs#1634) are reachable from the class. method/contains stay out of the general relation-filtered walk (no forward noise), and the seeded member nodes are not reported as hits. Full suite: 2924 passed, 3 skipped. Verified end-to-end (Rails-shaped repros) plus edge cases: extend self / undefined / ambiguous mixins emit nothing, mixins are not emitted as calls, member methods aren't reported, class-level callers still resolve, and one-hop seeding does not pull in downstream classes' methods. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the old v4-hosted SVG wordmark with the new brand logo (graph-cube icon + "Graphify" on the green brand gradient), tightly cropped from the source export (1384x645, ~2.15:1, even ~90px padding). Served from docs/logo.png on v8. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
detect.classify_file already labels extensionless files with a bash/python/node/... shebang as CODE via _shebang_interpreter, but _get_extractor dispatched purely on path.suffix, so a CLI entry point like `devctl` or `manage` was detected as code and then silently contributed zero nodes to the graph (its doc-referenced symbols stayed dangling stubs). Resolve extensionless files through the same _shebang_interpreter and a new _SHEBANG_DISPATCH map. Only interpreters with a real extractor are mapped (python/bash-family/node/ruby/lua/php/julia); detect's wider set (perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being mis-parsed by a wrong grammar. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
3ba6ab5 to 94239d6
Problem

`detect.classify_file` already labels extensionless files with a bash/python/node/... shebang as CODE (via `_shebang_interpreter`), but `extract._get_extractor` dispatches purely on `path.suffix`. An extensionless CLI entry point (`devctl`, `manage`, `gradlew`-style wrappers) is therefore detected as code and then silently contributes zero nodes to the graph, and its doc-referenced symbols stay dangling stub IDs.

Found in the wild: a bash CLI `devctl` (`#!/usr/bin/env bash`, no extension), the main executable of the corpus, was missing from the graph entirely while the 5 other code files extracted fine; the health diagnostic showed dangling-endpoint edges pointing at its never-created node IDs.

Fix

In `_get_extractor`, resolve extensionless files through the same `detect._shebang_interpreter` and a new `_SHEBANG_DISPATCH` map, so extract honors the same signal detect already trusts. Only interpreters with a real extractor are mapped (python/bash-family/node/ruby/lua/php/julia); detect's wider set (perl, fish, tcsh, Rscript) stays unmapped and skipped rather than being mis-parsed by a wrong grammar.

Tests

- `test_extensionless_shebang_via_dispatch`: bash & python3 shebangs, incl. the `env -S` split-args form
- `test_extensionless_without_usable_shebang_stays_unsupported`: plain text and perl stay `None`
- `test_extract_extensionless_bash_cli_end_to_end`: node IDs follow the path-stem scheme, so doc-created stub IDs merge with the real code nodes

`pytest tests/test_extract.py tests/test_detect.py`: 235 passed.

🤖 Generated with Claude Code
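
For readers who want to see the shape of the shebang routing described under Fix, here is a minimal sketch. It is not the PR's code: `SHEBANG_DISPATCH`, `shebang_interpreter`, `extractor_key_for`, and the extractor key strings are hypothetical stand-ins, and the exact set of bash-family interpreters is an assumption.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the PR's _SHEBANG_DISPATCH map: interpreter name
# -> extractor key. The real keys and values in extract.py may differ.
SHEBANG_DISPATCH = {
    "python": "python",
    "bash": "bash",
    "sh": "bash",
    "node": "javascript",
    "ruby": "ruby",
    "lua": "lua",
    "php": "php",
    "julia": "julia",
}


def shebang_interpreter(first_line: str) -> Optional[str]:
    """Return the interpreter base name from a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = Path(parts[0]).name
    if name == "env":
        # Skip env flags (such as -S) and VAR=value assignments.
        rest = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        if not rest:
            return None
        name = Path(rest[0]).name
    # Drop a trailing version suffix: python3 -> python, lua5.4 -> lua.
    name = re.sub(r"[\d.]+$", "", name)
    return name or None


def extractor_key_for(path: Path) -> Optional[str]:
    """Resolve an extensionless file to an extractor key via its shebang."""
    if path.suffix:
        return None  # files with a suffix go through the usual suffix dispatch
    try:
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            first_line = fh.readline()
    except OSError:
        return None
    interpreter = shebang_interpreter(first_line)
    # Unmapped interpreters (perl, fish, ...) fall through to None and are
    # skipped rather than parsed with the wrong grammar.
    return SHEBANG_DISPATCH.get(interpreter) if interpreter else None
```

Keeping the map narrower than the detector's interpreter list mirrors the design choice above: a file can be classified as code without being routed to an extractor that does not understand it.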