feat(graphile-search): replace sigmoid weighted-average with Reciprocal Rank Fusion (RRF) #1287

Open
pyramation wants to merge 7 commits into main from feat/rrf-scoring

pyramation commented Jun 11, 2026

Summary

Replaces the sigmoid weighted-average in searchScore with Reciprocal Rank Fusion (RRF), a rank-based fusion method that handles BM25's unbounded scores correctly. Sigmoid crushes the BM25 signal: scores of -20 and -10 both map to near-zero.
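To make the saturation claim concrete, here is a minimal sketch that evaluates a plain logistic sigmoid at the two scores mentioned above. The `sigmoid` helper is written for illustration and is not taken from the graphile-search source; the old normalization may have differed in detail.

```typescript
// Logistic sigmoid, used here as a stand-in for the old squashing step (assumption).
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

const a = sigmoid(-20); // roughly 2.06e-9
const b = sigmoid(-10); // roughly 4.54e-5

// The raw scores differ by a factor of two, yet both squashed values sit
// far below 0.001, so a weighted average can barely tell them apart.
console.log(a.toExponential(2), b.toExponential(2));
```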

- searchScore = Σ(sigmoid(score_i) × weight_i) / Σ(weight_i)
+ searchScore = Σ(weight_i / (rrfK + rank_i)) / Σ(weight_i / (rrfK + 1))

Ranks are injected per adapter via ROW_NUMBER() OVER (ORDER BY score ...). The weighted sum is normalized by the maximum possible RRF over active adapters only, so the result is bounded to 0..1. Adds a new rrfK preset option (default 60).
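The formula above can be sketched in a few lines of TypeScript. This is an illustration of the stated math only, not the plugin's actual code: `AdapterRank` and `rrfScore` are hypothetical names, and returning 0 when no adapter is active is an assumption.

```typescript
// One entry per adapter that produced a rank for this row (hypothetical shape).
interface AdapterRank {
  weight: number; // weight from @searchConfig, assumed positive
  rank: number;   // 1-based rank, e.g. from ROW_NUMBER()
}

// Weighted RRF, normalized by the best score the active adapters could give.
function rrfScore(active: AdapterRank[], rrfK = 60): number {
  if (active.length === 0) return 0; // assumption: no active adapter scores 0
  let sum = 0; // Σ weight_i / (rrfK + rank_i)
  let max = 0; // Σ weight_i / (rrfK + 1), i.e. every adapter at rank 1
  for (const { weight, rank } of active) {
    sum += weight / (rrfK + rank);
    max += weight / (rrfK + 1);
  }
  return sum / max;
}

console.log(rrfScore([{ weight: 1, rank: 1 }]));    // rank 1, single adapter
console.log(rrfScore([{ weight: 1, rank: 2 }]));    // rank 2, default rrfK
console.log(rrfScore([{ weight: 1, rank: 2 }], 1)); // rank 2, small rrfK
```

With a single active adapter the score reduces to (rrfK + 1) / (rrfK + rank), so rank 2 gives 61/62 at the default rrfK of 60 and 2/3 at rrfK = 1. A smaller rrfK therefore widens the gap between adjacent ranks, and a row ranked first by every active adapter scores exactly 1.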

Removes SearchConfig.normalization field entirely (pre-launch, no deprecation). Deletes sigmoid test suites + dead code. 21 new RRF tests cover single/multi adapter combos, chunk tables, custom weights, recency boost, custom rrfK.

Also adds VACUUM ANALYZE to mega-seed.sql after data insertion. This forces ParadeDB's BM25 index to be fully built before tests run, fixing a timing race with ROW_NUMBER() window functions.

Skills updated in both repos. Related: #1047, #1050, constructive-skills#173

Link to Devin session: https://app.devin.ai/sessions/4a9f098c74fb4cb6a9b6868fcff321db
Requested by: @pyramation

…al Rank Fusion (RRF)

- Replace sigmoid normalization + weighted-average scoring with rank-based RRF
- Inject ROW_NUMBER() window functions per adapter for rank computation
- Normalize RRF scores to [0,1] using only active adapters in denominator
- Add rrfK preset option (default 60, configurable smoothing constant)
- Preserve @searchConfig weights (maps to weighted RRF contributions)
- Preserve recency boost (applied as post-RRF multiplier)
- Deprecate normalization field (now no-op, RRF doesn't need it)
- Add comprehensive rrf-scoring.test.ts (21 tests covering single/multi
  adapter, chunks, weights, recency, custom rrfK)
- Update search-config-integration test for deprecated normalization

Closes constructive-io/constructive-planning#1047
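For readers less familiar with ROW_NUMBER(), the per-adapter rank step in the commit message above can be mimicked in memory as follows. This is only an analogy to the SQL window function: `Scored` and `assignRanks` are hypothetical names, and the sort direction is left as a parameter because the PR text elides the ORDER BY direction.

```typescript
// A row with one adapter's raw score (hypothetical shape).
interface Scored {
  id: string;
  score: number;
}

// Like ROW_NUMBER(): every row gets a distinct 1-based position, even on ties.
// Ties keep input order here because Array.prototype.sort is stable.
function assignRanks(rows: Scored[], direction: "asc" | "desc"): Map<string, number> {
  const sorted = [...rows].sort((x, y) =>
    direction === "desc" ? y.score - x.score : x.score - y.score
  );
  const ranks = new Map<string, number>();
  sorted.forEach((row, i) => ranks.set(row.id, i + 1));
  return ranks;
}

const ranks = assignRanks(
  [
    { id: "a", score: 3 },
    { id: "b", score: 7 },
    { id: "c", score: 5 },
  ],
  "desc"
);
console.log([...ranks.entries()]);
```

Because only the position survives, the magnitude of the raw score no longer matters, which is why unbounded BM25 scores and bounded scores from other adapters can be fused on equal footing.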
blacksmith-sh Bot commented Jun 11, 2026

Found 4 test failures on Blacksmith runners:

Failures

| Suite | Test |
| --- | --- |
| RRF scoring — custom @searchConfig weights | weighted RRF still produces score 1.0 for rank-1 document (single adapter active) |
| RRF scoring — custom rrfK parameter | with small rrfK, score difference between rank 1 and rank 2 is larger |
| RRF scoring — single adapter scenarios | BM25 only — searchScore is 0..1 and best match gets score 1.0 |
| RRF scoring — single adapter scenarios | tsvector only — searchScore is 0..1 and correctly ranked |

- Remove normalization from SearchConfig interface (not deprecated — deleted)
- Remove sigmoid strategy parameter from normalizeScore (only used as RRF fallback)
- Remove deprecated normalization warning code
- Delete sigmoid normalization test suite
- Delete deprecated normalization no-op test from rrf-scoring
- Remove normalization field from node-type-registry schema
- Update graphile-search skill docs for RRF

RRF scoring can produce exactly 1.0 for rank-1 documents when all
adapters agree. Changed toBeLessThan(1) to toBeLessThanOrEqual(1).

…er path

ROW_NUMBER() window functions were being injected into the SELECT list
for every per-adapter filter query (e.g. bm25Body, tsvTsv), but ranks
are only needed for the unifiedSearch RRF composite path. The window
function forced PostgreSQL to evaluate the BM25 score expression in the
ORDER BY of the window, which could interact poorly with ParadeDB's
BM25 index build timing. The searchScore lambda already has a fallback
that estimates rank from normalized score for individual filters.

…seed

The BM25 test failures were caused by a ParadeDB index race condition:
after INSERTing data, the BM25 index isn't immediately queryable when
a ROW_NUMBER() window function references the BM25 operator in its
ORDER BY (adding computational pressure during index build). Adding
VACUUM ANALYZE after seeding forces the BM25 index to be fully built
before tests run.

Restores ROW_NUMBER() injection in per-adapter filter path (needed for
correct RRF scoring when individual filters are combined with
searchScore).