
fix(rank): keep the official site on page one for known-item searches #92

Merged
ErikChevalier merged 1 commit into main from fix/navigational-result-ranking
Jun 19, 2026


@ErikChevalier

Contributor

Problem

Searching for a brand or product name returned the official site buried below encyclopedia and forum pages, or pushed off the first screen entirely, even though the metasearch aggregator had ranked it first. Running real queries through the live pipeline showed this to be a precision@1 placement problem in the re-rank layers, not a recall problem: the official site was retrieved and ranked #1, and two later layers then demoted it.

Two root causes, both fixed

1. The freshness sort flattened relevance into list position. The default fresh+relevant sort discarded each result's real score (which carries the lexical blend and the navigational boost) and re-derived a flat 1/(60+index) from rank position. That compressed even a large lead to a hair (1/60 versus 1/61 for the top two results, a gap under 2%), so a single dated result could leapfrog an undated official homepage. The aggregator now carries its final score on each result, and the freshness blend multiplies that score, so recency only reorders results of comparable relevance and can never overtake a strong match. (The module docstring claimed it scaled "the relevance score"; it was actually scaling position.)

2. The AI-slop blocklist had high-value false positives, and it was on by default. The bundled blocklist is bulk-merged from community browser "hide AI" lists whose purpose is to keep AI tools out of search results. Those lists include the official sites of AI companies and major developer hubs: github.com, huggingface.co, openai.com, claude.ai, and at least 13 more confirmed. With the filter defaulting to downrank, those sites were pushed off page one. A curated allowlist (resources/blocklist/allowlist.txt) is now subtracted from the effective blocklist at both load and build time (subdomains too), so the filter can only sink genuine low-quality domains.
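To make the first root cause concrete, here is a minimal sketch contrasting the two sort keys. It is not the project's code: the `Result` fields, the `freshness_factor` shape (a decaying boost of at most 25%), and the function names are all assumptions chosen for illustration. Only the 1/(60+index) position key and the "multiply the real score" fix come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    url: str
    score: float               # aggregator's final relevance score (hypothetical field name)
    age_days: Optional[float]  # None when the result carries no date


def freshness_factor(age_days: Optional[float],
                     max_boost: float = 0.25,
                     half_life_days: float = 30.0) -> float:
    """Assumed multiplier in [1, 1 + max_boost]; undated results get no boost."""
    if age_days is None:
        return 1.0
    return 1.0 + max_boost * 0.5 ** (age_days / half_life_days)


def rank_by_position(results: list[Result]) -> list[Result]:
    """Old behaviour: relevance is re-derived from list position as 1/(60+index)."""
    keyed = [(freshness_factor(r.age_days) / (60 + i), r)
             for i, r in enumerate(results)]
    return [r for _, r in sorted(keyed, key=lambda kr: kr[0], reverse=True)]


def rank_by_score(results: list[Result]) -> list[Result]:
    """Fixed behaviour: freshness multiplies the real relevance score."""
    return sorted(results,
                  key=lambda r: r.score * freshness_factor(r.age_days),
                  reverse=True)


# Aggregator order: undated official homepage first, with a 3x relevance lead.
results = [
    Result("https://example.com/", score=3.0, age_days=None),
    Result("https://forum.example.org/thread", score=1.0, age_days=1.0),
]

old_top = rank_by_position(results)[0].url  # the forum thread leapfrogs
new_top = rank_by_score(results)[0].url     # the official homepage stays first
```

With the position key, the homepage's 3x lead shrinks to 61/60, so any freshness boost above roughly 1.7% flips the order. With the real score, the dated result would need a boost of more than 3x, which this assumed factor cannot produce.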
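The allowlist subtraction in the second root cause could look roughly like the sketch below. The function names and the in-memory sets are hypothetical; the real loader reads resources/blocklist/allowlist.txt, and its parsing and data structures are not shown in this PR description. The sketch only illustrates "subtract the allowlist, subdomains included".

```python
def is_covered(domain: str, entries: frozenset[str]) -> bool:
    """True if the domain itself, or any parent domain, appears in entries."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in entries for i in range(len(labels)))


def effective_blocklist(blocklist: set[str], allowlist: set[str]) -> set[str]:
    """Drop every blocklist entry that is an allowlisted domain or a subdomain of one."""
    allowed = frozenset(d.lower() for d in allowlist)
    return {d for d in blocklist if not is_covered(d, allowed)}


def is_blocked(host: str, blocklist: set[str], allowlist: set[str]) -> bool:
    """A host is blocked only if the post-subtraction blocklist still covers it."""
    return is_covered(host, frozenset(effective_blocklist(blocklist, allowlist)))


# Hypothetical merged community list containing a false positive and a real target.
blocklist = {"github.com", "gist.github.com", "spam-farm.example"}
allowlist = {"github.com"}

effective = effective_blocklist(blocklist, allowlist)
```

Here `effective` contains only `spam-farm.example`: both `github.com` and `gist.github.com` are removed, while `www.spam-farm.example` is still matched through its parent domain.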

Verification

Notes

  • Internal relevance score is kept out of the public JSON contract.
  • Research-backed follow-ups (PSL + urlmr graded navigational boost, BM25F title>snippet field weighting, registrable-domain dedup to stop one host crowding page one, trust-weighted RRF) are queued separately.

🤖 Generated with Claude Code

@ErikChevalier force-pushed the fix/navigational-result-ranking branch 2 times, most recently from 438363f to 0705225 on June 19, 2026 06:13
Two re-rank layers could bury a navigational result (the official site of the
searched name) below encyclopedia/forum pages even though the aggregator ranked
it first. Both are fixed on-device with no new requests.

1. Freshness sort flattened relevance into list position. The default
   fresh+relevant sort discarded each result's real score (which carries the
   lexical blend and the navigational boost) and re-derived a flat 1/(60+index)
   from rank position, so a single dated result could leapfrog an undated
   official homepage. The aggregator now carries its final score on each result
   and the freshness blend multiplies THAT, so recency only reorders results of
   comparable relevance and never overtakes a strong match.

2. AI-slop blocklist false positives. The bundled blocklist is merged from
   community "hide AI" browser lists that include the official sites of AI
   companies and major developer hubs (github.com, huggingface.co, openai.com,
   and more). With the filter on by default those sites were downranked off the
   first screen. A curated allowlist (resources/blocklist/allowlist.txt) is now
   subtracted from the effective blocklist at load and build time, including
   subdomains, so the filter can only sink genuine low-quality domains.

Regression tests added for both. Full suite, ruff, and mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ErikChevalier force-pushed the fix/navigational-result-ranking branch from 0705225 to f9bbb2e on June 19, 2026 07:00
@ErikChevalier merged commit 2a6c162 into main on Jun 19, 2026
1 check passed