resolver: implement bare-namespace cross-source search (Python -> 10/10 conformance)#10

kurtseifried merged 1 commit into main from feat/cross-source-search on May 28, 2026


@kurtseifried

Contributor

Summary

Phase 2.5(d). Closes the last conformance failure — Python Server-API now passes 10/10 fixtures, matching the canonical SecID-Service Worker. Python is now ready to serve as the reference for the TypeScript and Go Server-API ports.

What's added

A new resolution mode: cross-source search. When a query like `secid:advisory/CVE-2021-44228` has no DNS-rooted namespace match (the input is an ID, not a domain), the resolver walks all namespaces of the type, checks each for child-pattern matches against the input, and aggregates results sorted by weight.
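The walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in `resolver.py`: the in-memory `REGISTRY`, its field names (`pattern`, `weight`, `url`), the `{id}` placeholder, the `example.*` namespaces, and the shape of the result SecID are all assumptions made for the example.

```python
import re
from typing import Any

# Hypothetical registry: namespace -> one level of child entries.
# Field names and URL templates are illustrative assumptions.
REGISTRY: dict[str, list[dict[str, Any]]] = {
    "example.org": [{"pattern": r"(?i)^CVE-\d{4}-\d{4,}$", "weight": 100,
                     "url": "https://example.org/cve/{id}"}],
    "example.net": [{"pattern": r"(?i)^CVE-\d{4}-\d{4,}$", "weight": 90,
                     "url": "https://example.net/vuln/{id}"}],
    "example.com": [{"pattern": r"(?i)^GHSA(-[0-9a-z]{4}){3}$", "weight": 80,
                     "url": "https://example.com/advisories/{id}"}],
}


def cross_source_search(registry: dict[str, list[dict[str, Any]]],
                        secid_type: str,
                        search_term: str) -> list[dict[str, Any]]:
    """Check every namespace's children against the term, highest weight first."""
    results = []
    for namespace, children in registry.items():
        for child in children:  # one level only, no grandchildren
            if re.search(child["pattern"], search_term):
                results.append({
                    # Placeholder SecID shape; the real format may differ.
                    "secid": f"secid:{secid_type}/{namespace}/{search_term}",
                    "url": child["url"].replace("{id}", search_term),
                    "weight": child["weight"],
                })
    # Aggregate across namespaces, sorted by weight descending.
    results.sort(key=lambda r: r["weight"], reverse=True)
    return results


hits = cross_source_search(REGISTRY, "advisory", "CVE-2021-44228")
print([h["secid"] for h in hits])
```

With this toy registry, the two CVE-pattern namespaces match and the GHSA one does not, so the query returns two results ordered by weight.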

Three new helpers in `resolver.py`

  • `_slug_from_pattern(pat)` — extracts canonical source slug from a regex like `(?i)^cve$` → `'cve'`. Returns `None` for complex patterns where a clean slug isn't extractable.
  • `_all_namespaces_of_type(store, secid_type, registry_dirs)` — enumerates namespaces from both the store and filesystem; eagerly loads discovered files into the store. Reads the `namespace` field from inside each JSON to avoid the reverse-DNS path ambiguity.
  • `_cross_source_search(store, secid_type, search_term, registry_dirs)` — walks one level of each namespace's children for matches, builds fully-qualified result SecIDs, substitutes URL templates (reusing Phase 2.5a's helper), and sorts by weight descending.
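As an illustration of the slug-extraction idea, here is a minimal sketch. It is a hypothetical stand-in for `_slug_from_pattern`, not the merged implementation: the definition of a "simple" pattern (an optional `(?i)` flag, then `^`, a literal token, and `$`) and the lowercasing of the result are assumptions.

```python
import re
from typing import Optional

# Assumed definition of a "simple" source pattern:
# optional (?i), then ^, a literal slug, then $.
_SIMPLE_SLUG = re.compile(r"^(?:\(\?i\))?\^([A-Za-z0-9_-]+)\$$")


def slug_from_pattern(pat: str) -> Optional[str]:
    """Return the literal slug for a simple anchored pattern, else None."""
    m = _SIMPLE_SLUG.match(pat)
    # Lowercasing is an assumption about what "canonical" means here.
    return m.group(1).lower() if m else None


print(slug_from_pattern("(?i)^cve$"))        # cve
print(slug_from_pattern(r"(?i)^cve-\d+$"))   # None
```

Returning `None` rather than guessing keeps complex patterns (alternations, character classes, quantifiers) from producing a misleading slug.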

Hook in `resolve()`

When `_match_namespace()` returns no match, `resolve()` falls back to cross-source search. If that search finds results, the response has status=found; otherwise it returns not_found as before.
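The control flow of the hook might look like the sketch below. The two callables are injected stand-ins for `_match_namespace()` and `_cross_source_search()`, and the response dictionary shape (`status`, `results`) is an assumption for illustration, not the Server-API's actual schema.

```python
from typing import Any, Callable, Optional

Result = dict[str, Any]


def resolve(query: str,
            match_namespace: Callable[[str], Optional[Result]],
            cross_source_search: Callable[[str], list[Result]]) -> dict[str, Any]:
    """Try the direct namespace match first, then fall back to cross-source."""
    direct = match_namespace(query)
    if direct is not None:
        return {"status": "found", "results": [direct]}

    # No DNS-rooted namespace matched: treat the input as a bare ID.
    results = cross_source_search(query)
    if results:
        return {"status": "found", "results": results}

    # Unchanged behaviour when neither path finds anything.
    return {"status": "not_found", "results": []}


# Usage with trivial stand-ins: no direct match, one cross-source hit.
response = resolve("CVE-2021-44228",
                   match_namespace=lambda q: None,
                   cross_source_search=lambda q: [{"weight": 100}])
print(response["status"])  # found
```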

Result counts

| Implementation | Results for `secid:advisory/CVE-2021-44228` |
| --- | --- |
| Worker (canonical) | 21 |
| Python (this PR) | 18 |

The 3-result gap likely has two causes: (a) Python walks only one level of children (no grandchildren), and (b) the Worker indexes some namespace patterns differently. The fixture's `min_results >= 5` intentionally allows this tolerance: the behavioral contract is cross-source agreement that "many namespaces match", not exact-count parity.
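To make the tolerance concrete, a threshold-style check could look like the sketch below. The fixture layout (`query`, `expect.status`, `expect.min_results`) and the response shape are assumptions for illustration; only the `min_results >= 5` threshold and the 21 and 18 counts come from this PR.

```python
from typing import Any

# Hypothetical fixture layout; only the min_results threshold is from the PR.
FIXTURE: dict[str, Any] = {
    "query": "secid:advisory/CVE-2021-44228",
    "expect": {"status": "found", "min_results": 5},
}


def passes_fixture(fixture: dict[str, Any], response: dict[str, Any]) -> bool:
    """Pass when the status matches and the result count meets the floor."""
    expect = fixture["expect"]
    return (response["status"] == expect["status"]
            and len(response["results"]) >= expect["min_results"])


# Both implementations clear the floor despite differing counts.
worker = {"status": "found", "results": [{}] * 21}
python_impl = {"status": "found", "results": [{}] * 18}
print(passes_fixture(FIXTURE, worker), passes_fixture(FIXTURE, python_impl))
```

A floor rather than an exact count lets the two implementations diverge on edge namespaces without failing conformance.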

Conformance progression

| Phase | Pass / Total |
| --- | --- |
| Initial (before this initiative) | 3 / 10 |
| After Phase 2.5(a) — URL templates + data block | 7 / 10 |
| After Phase 2.5(c) — discovery endpoints | 9 / 10 |
| After this PR — cross-source search | 10 / 10 |

Tests

5 new unit tests for slug extraction and cross-source edge cases. 28/28 pass locally.

What this unblocks

Python Server-API is now at conformance parity with SecID-Service. This unblocks:

  • Phase 3 — TypeScript Server-API (can be built against Python as reference + conformance suite as gate)
  • Phase 4 — Go Server-API (same)
  • Phase 6 — End-to-end nightly compatibility tests across all SDK languages and the Worker

🤖 Generated with Claude Code

Phase 2.5(d). Closes the last conformance failure — Python Server-API
now passes 10/10 fixtures, matching the canonical SecID-Service Worker.

When a query like `secid:advisory/CVE-2021-44228` has no DNS-rooted
namespace match (the input is an ID, not a domain), the resolver now
walks all namespaces of the type looking for child patterns that match
the input. Aggregates results sorted by weight descending.

Implementation in resolver.py:

- _slug_from_pattern(pat) — extracts canonical source slug from a
  source-level regex like '(?i)^cve$' -> 'cve'. Returns None for
  complex patterns where a clean slug isn't extractable.

- _all_namespaces_of_type(store, secid_type, registry_dirs) —
  enumerates known namespaces from both the store and filesystem.
  Filesystem-discovered namespaces are eagerly loaded into the store
  so subsequent queries don't re-read them. Uses the 'namespace' field
  inside each JSON file as canonical, avoiding the reverse-DNS
  path-to-namespace ambiguity (e.g., 'uk/gov/legislation.json' is
  unambiguously 'legislation.gov.uk' once we read the file).

- _cross_source_search(store, secid_type, search_term, registry_dirs) —
  for each namespace, walks one level of children looking for
  pattern matches. Builds a full-namespace SecID for each match,
  substitutes URL templates (reusing the helper from Phase 2.5a),
  sorts by weight descending.

Hook in resolve(): when `_match_namespace()` returns no match, fall
back to cross-source. If cross-source finds results, return them with
status=found. Otherwise return not_found as before.

Behavior:

  Worker (canonical):   21 results for secid:advisory/CVE-2021-44228
  Python (this PR):     18 results

  The 3-result gap likely reflects (a) Python walking one level of
  children only (no grandchildren), (b) some namespaces having patterns
  the Worker indexes differently. The fixture's `min_results >= 5`
  intentionally allows this tolerance — cross-source agreement on
  "many namespaces match" is the contract, not exact result count.

Tests:
- 5 new unit tests for slug extraction and cross-source edge cases
- 28/28 tests pass locally

Conformance progression:
  Initial:               3 / 10 against Python
  After Phase 2.5(a):    7 / 10
  After Phase 2.5(c):    9 / 10
  After this PR:        10 / 10

Python now serves as the reference for TS/Go ports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kurtseifried kurtseifried merged commit 6f003ef into main May 28, 2026
@kurtseifried kurtseifried deleted the feat/cross-source-search branch May 28, 2026 23:21