feat(search): trigram fuzzy + concept recall (typo+semantic, zero regression) by fedster99 · Pull Request #31 · fedster99/supamail

fedster99 · 2026-06-19T16:38:07Z

What

Closes the two zero-scoring eval categories — typo and semantic — by adding two pure-Postgres, recall-only branches to the free-text search path, fused strictly below exact lexical matches. Stacks on #26 (the search-layer + eval + email-intel stack).

Why this shape

Reading the eval corpus showed the two gaps are different problems:

typo (invioce, secuirty, candiate, metrcs) = misspellings of indexed terms → trigram fuzzy.
semantic (vacation, hiring, breach, expense) = concept words with no literal overlap → query concept-expansion (not person-resolution).

So this is the roadmap's Phase 1 / Tier 1.5 (docs/search-eval.md), done as the smallest eval-justified slice — no embeddings, no new deps.
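For readers who want the concept-expansion idea in concrete form, here is a minimal sketch. It is not the code in expand.ts: the thesaurus entries are made up (only vacation -> travel appears in this PR), `widenedTsQuery` is a hypothetical helper, and the `expandConcepts` signature is an assumption. Terms are assumed to be already sanitized to plain lowercase-able words.

```typescript
// Hypothetical thesaurus. Only "vacation" -> "travel" is taken from this PR;
// every other entry is invented for illustration.
const THESAURUS = new Map<string, readonly string[]>([
  ["vacation", ["travel", "holiday", "pto"]],
  ["hiring", ["candidate", "interview", "recruiting"]],
]);

// Collect synonyms for every query term that has a thesaurus entry,
// de-duplicated and excluding words the user already typed.
function expandConcepts(terms: readonly string[]): string[] {
  const typed = new Set(terms.map((t) => t.toLowerCase()));
  const out = new Set<string>();
  typed.forEach((term) => {
    for (const syn of THESAURUS.get(term) ?? []) {
      if (!typed.has(syn)) out.add(syn);
    }
  });
  return Array.from(out);
}

// Build a to_tsquery-style string: the primary terms ANDed together,
// ORed with each synonym. Assumes terms contain no tsquery operators.
function widenedTsQuery(terms: readonly string[]): string {
  const primary = terms.map((t) => t.toLowerCase()).join(" & ");
  const synonyms = expandConcepts(terms);
  return synonyms.length === 0
    ? primary
    : `(${primary}) | ${synonyms.join(" | ")}`;
}

console.log(widenedTsQuery(["vacation"]));
```

A query with no thesaurus entry passes through unchanged, so the expansion only ever widens what the primary query would match.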

How

apps/api/src/search/expand.ts (new): significantTerms (fuzzy tokens) + a curated, general email/business CONCEPT_THESAURUS (expandConcepts).
Fuzzy: word_similarity over the trigram-indexed subject/sender/recipient columns from 0007, via OPERATOR(extensions.<%) so the gin_trgm_ops GINs drive it; pg_trgm.word_similarity_threshold = 0.4 set per-statement with SET LOCAL in the read-only txn.
Concept: widens the tsquery (primary || synonyms) so an intent word retrieves mail that never says it literally.
Zero-regression by construction: an is_primary flag (any exact lexical hit) is the leading ORDER BY key, so exact matches always rank above fuzzy/concept-only matches. The recall branches add results that exact matching misses; they can never reorder existing hits. The literal deleted_in_provider = false predicate is preserved, so the partial index still applies.
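The ordering argument can be illustrated outside SQL. The sketch below is an in-memory analogue, not the query compiled by compile.ts: the `Hit` shape and `rankHits` are hypothetical, and it assumes the scores of exact hits are computed the same way with and without the recall branches.

```typescript
// Hypothetical result shape, for illustration only.
interface Hit {
  id: string;
  isPrimary: boolean; // true if any exact lexical branch matched
  score: number; // relevance within its tier (higher is better)
}

// Sort with is_primary as the leading key, then by score descending.
function rankHits(hits: readonly Hit[]): Hit[] {
  return [...hits].sort(
    (a, b) => Number(b.isPrimary) - Number(a.isPrimary) || b.score - a.score,
  );
}

const lexicalOnly: Hit[] = [
  { id: "a", isPrimary: true, score: 0.9 },
  { id: "b", isPrimary: true, score: 0.4 },
];

// A recall-only hit may carry a higher raw score than any exact hit,
// but the leading key still places it below all of them.
const withRecall: Hit[] = [
  ...lexicalOnly,
  { id: "c", isPrimary: false, score: 0.99 },
];

const before = rankHits(lexicalOnly).map((h) => h.id);
const after = rankHits(withRecall).map((h) => h.id);
console.log(before, after);
```

Here `before` is `["a", "b"]` and `after` is `["a", "b", "c"]`: the exact hits form an unchanged prefix, so a top-k cut can drop recall-only hits but never displace an exact one.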

Result — `pnpm eval:search`, before → after

Category	nDCG@10 before	nDCG@10 after
Headline	0.731	0.953
typo	0.00	0.65
semantic	0.00	0.98
lexical	0.91	0.995
operator / ranking / phrase / email-intent	1.00	1.00 (held)
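For reference when reading the table, this is a sketch of the textbook nDCG@k definition. The exact gain and discount used by `pnpm eval:search` are not stated in this PR, so linear gain with a log2 position discount is an assumption, and both function names are hypothetical.

```typescript
// DCG over the top k results: each gain is discounted by log2(rank + 1),
// where rank is 1-based (so index i uses log2(i + 2)).
function dcgAtK(gains: readonly number[], k: number): number {
  return gains
    .slice(0, k)
    .reduce((sum, gain, i) => sum + gain / Math.log2(i + 2), 0);
}

// nDCG@k: DCG of the returned order divided by DCG of the ideal order.
// `returned` is the judged gain of each returned doc, in rank order;
// `judged` is the gain of every judged-relevant doc for the query.
function ndcgAtK(
  returned: readonly number[],
  judged: readonly number[],
  k = 10,
): number {
  const ideal = dcgAtK([...judged].sort((a, b) => b - a), k);
  return ideal === 0 ? 0 : dcgAtK(returned, k) / ideal;
}

console.log(ndcgAtK([], [1, 1])); // a query that retrieves nothing
console.log(ndcgAtK([1, 1], [1, 1])); // every judged doc, in ideal order
```

Under this definition a query that returns none of its judged documents scores 0, which is how a category can sit at 0.00 when every one of its queries misses.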

Residual typo misses are deliberate scope edges (a body-only term — body trigram is unindexed for perf; a transposition below the 0.4 threshold whose third judged doc has no literal token), left unchased to avoid overfitting the threshold to a 31-doc synthetic corpus. Open-ended semantics beyond the curated thesaurus remain Phase-2 (opt-in pgvector).
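To see why transpositions are the hardest typo class for trigram matching, here is a simplified sketch of trigram set similarity. It is an approximation, not pg_trgm itself: it handles a single alphanumeric word, and it computes plain set similarity rather than word_similarity, which scores the query against the best-matching extent of the column text. Its numbers are therefore not comparable one-to-one with the 0.4 threshold.

```typescript
// Trigrams of one word, padded pg_trgm-style: two spaces in front, one behind.
function trigrams(word: string): Set<string> {
  const padded = `  ${word.toLowerCase()} `;
  const out = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    out.add(padded.slice(i, i + 3));
  }
  return out;
}

// Set similarity: shared trigrams divided by distinct trigrams in either word.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

console.log(similarity("invioce", "invoice")); // adjacent transposition
console.log(similarity("invoic", "invoice")); // dropped final letter
```

Swapping two adjacent letters in a seven-letter word replaces four of its eight trigrams, so under this measure "invioce" shares 4 of 12 distinct trigrams with "invoice" (about 0.33), while dropping the last letter still shares 6 of 9 (about 0.67).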

Verification

pnpm typecheck ✓ · pnpm build ✓
pnpm test → 182 passed (incl. 12 new search-expand unit tests) ✓
pnpm test:db:live → 118/118, incl. search-quality.live-db.test.ts (the quality gate) and search.live-db.test.ts against the real engine ✓

Harness Impact

docs/search-eval.md updated: Phase 1 marked shipped with the measured delta and the goal marked met. No AGENTS.md / ADR change needed (additive to the 0015 engine; no schema change).

🤖 Generated with Claude Code

Closes the two zero-scoring eval categories (typo, semantic) without touching the four already at 0.91–1.00.

Two pure-Postgres, recall-only branches added to the free-text path (search/expand.ts + compile.ts):

- Fuzzy (typo): significant query tokens match via pg_trgm word_similarity over the trigram-indexed subject/sender/recipient columns from 0007, driven through OPERATOR(extensions.<%) with pg_trgm.word_similarity_threshold set to 0.4 per-statement (SET LOCAL) inside the read-only transaction.
- Concept (semantic): a curated, general email/business concept thesaurus widens the tsquery (primary || synonyms), so an intent word retrieves mail that never says it literally (vacation -> travel). Deterministic, zero-dependency Tier-1.5; Phase-2 embeddings remain the durable answer.

Safety: an is_primary flag (any exact lexical hit) is the leading ORDER BY key, so every exact match ranks strictly above any fuzzy/concept-only match. The recall branches can add results a keyword search misses but can never reorder the ones it already gets right: zero regression by construction.

Eval (pnpm eval:search), before -> after:
headline nDCG@10 0.731 -> 0.953, recall@10 0.724 -> 0.953
typo 0.00 -> 0.65
semantic 0.00 -> 0.98
lexical 0.91 -> 0.995
operator/ranking/phrase/email-intent 1.00 (held)

Verified: pnpm typecheck, pnpm test (182 passed incl. 12 new expand unit tests), pnpm build, pnpm test:db:live (118/118 incl. the search-quality quality gate against the real engine).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel · 2026-06-19T16:38:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
supamail	Ready	Preview, Comment	Jun 19, 2026 4:38pm

vercel Bot deployed to Preview June 19, 2026 16:38 View deployment

This was referenced Jun 19, 2026

feat(eval): trustworthy search eval — frozen clock, graded relevance, significance A/B, guards #32

Open

Search: pure-Postgres engine + fuzzy/concept recall + trustworthy eval (stack → main) #33

Merged

fedster99 wants to merge 1 commit into fedster99/search-eval-roadmap from fedster99/search-fuzzy-semantic
