feat(search): trigram fuzzy + concept recall (typo+semantic, zero regression)#31

Open
fedster99 wants to merge 1 commit into fedster99/search-eval-roadmap from fedster99/search-fuzzy-semantic
@fedster99

Owner

What

Closes the two zero-scoring eval categories — typo and semantic — by adding two pure-Postgres, recall-only branches to the free-text search path, fused strictly below exact lexical matches. Stacks on #26 (the search-layer + eval + email-intel stack).

Why this shape

Reading the eval corpus showed the two gaps are different problems:

  • typo (invioce, secuirty, candiate, metrcs) = misspellings of indexed terms → trigram fuzzy.
  • semantic (vacation, hiring, breach, expense) = concept words with no literal overlap → query concept-expansion (not person-resolution).
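To see why trigrams suit the typo bucket, here is a small TypeScript approximation of pg_trgm's trigram extraction and its `similarity()` score (the Jaccard overlap of two trigram sets). It assumes ASCII input and default case folding. It is not `word_similarity`, which the fuzzy branch actually uses and which scores the query against the best-matching extent of the column text instead of the whole string.

```typescript
// Approximation of pg_trgm trigram extraction (ASCII-only assumption).
// pg_trgm lower-cases the text, treats runs of alphanumerics as words, and
// pads each word with two leading spaces and one trailing space.
function trigrams(text: string): Set<string> {
  const out = new Set<string>();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      out.add(padded.slice(i, i + 3));
    }
  }
  return out;
}

// pg_trgm-style similarity(): shared trigrams divided by distinct trigrams overall.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

// "invioce" and "invoice" share 4 of their 12 distinct trigrams.
console.log(similarity("invioce", "invoice").toFixed(3)); // 0.333
// An extra word dilutes the whole-string score: 4 shared of 11 distinct.
console.log(similarity("word", "two words").toFixed(4)); // 0.3636
```

A misspelling keeps most of its word's leading and trailing trigrams, so the overlap stays well above zero where exact token matching scores nothing. The dilution in the second case is why a per-extent score like `word_similarity` fits short query tokens matched against longer subject lines.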

So this is the roadmap's Phase 1 / Tier 1.5 (docs/search-eval.md), done as the smallest eval-justified slice — no embeddings, no new deps.

How

  • apps/api/src/search/expand.ts (new): significantTerms (picks the query tokens used for fuzzy matching) and expandConcepts, backed by a curated, general email/business CONCEPT_THESAURUS.
  • Fuzzy: word_similarity over the trigram-indexed subject/sender/recipient columns from 0007, via OPERATOR(extensions.<%) so the gin_trgm_ops GIN indexes drive it; pg_trgm.word_similarity_threshold = 0.4 is set with SET LOCAL, so it applies only for the duration of the read-only transaction.
  • Concept: widens the tsquery (primary || synonyms) so an intent word retrieves mail that never says it literally.
  • Zero-regression by construction: an is_primary flag (any exact lexical hit) is the leading ORDER BY key, so exact matches always rank above fuzzy/concept-only matches. The recall branches add results that keyword search missed; they can never reorder existing hits. The literal deleted_in_provider = false predicate is preserved, so the partial index still applies.
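A minimal sketch of what the two expand.ts helpers could look like. The PR does not show the file, so the stopword list, the length cut-off, the tokenizer, and the thesaurus entries below are hypothetical placeholders; only the names significantTerms, expandConcepts, and CONCEPT_THESAURUS come from the description.

```typescript
// Hypothetical sketch of expand.ts. The stopwords, length cut-off, and
// thesaurus entries are illustrative placeholders, not the PR's real values.

// Placeholder stopwords that would only add noise to fuzzy matching.
const STOPWORDS = new Set(["the", "and", "for", "from", "with", "about", "that"]);

// Placeholder cut-off: short tokens yield too few trigrams to fuzz safely.
const MIN_FUZZY_LENGTH = 4;

// Hypothetical tokenizer: lower-case ASCII alphanumeric runs.
function tokenize(query: string): string[] {
  return query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Query tokens worth sending to the trigram (fuzzy) branch, de-duplicated.
function significantTerms(query: string): string[] {
  const kept = tokenize(query).filter(
    (t) => t.length >= MIN_FUZZY_LENGTH && !STOPWORDS.has(t),
  );
  return [...new Set(kept)];
}

// Placeholder entries: intent word -> literal terms that mail tends to use.
const CONCEPT_THESAURUS: ReadonlyMap<string, readonly string[]> = new Map([
  ["vacation", ["travel", "pto", "holiday", "leave"]],
  ["hiring", ["candidate", "interview", "recruiting", "offer"]],
  ["breach", ["incident", "compromise", "leak", "security"]],
  ["expense", ["receipt", "reimbursement", "invoice"]],
]);

// Synonyms for every intent word in the query, skipping words already present.
function expandConcepts(query: string): string[] {
  const tokens = new Set(tokenize(query));
  const synonyms = new Set<string>();
  for (const token of tokens) {
    for (const syn of CONCEPT_THESAURUS.get(token) ?? []) {
      if (!tokens.has(syn)) synonyms.add(syn);
    }
  }
  return [...synonyms];
}

console.log(significantTerms("the secuirty report from IT")); // [ 'secuirty', 'report' ]
console.log(expandConcepts("vacation plans")); // [ 'travel', 'pto', 'holiday', 'leave' ]
```

Using a Map for the thesaurus, instead of a plain object, avoids accidental lookups of Object prototype keys such as `constructor` when arbitrary query tokens are used as keys.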

Result — pnpm eval:search, before → after

| Category | nDCG@10 before | nDCG@10 after |
| --- | --- | --- |
| Headline | 0.731 | 0.953 |
| typo | 0.00 | 0.65 |
| semantic | 0.00 | 0.98 |
| lexical | 0.91 | 0.995 |
| operator / ranking / phrase / email-intent | 1.00 | 1.00 (held) |
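For reading the table: under the standard definition of nDCG@10 (linear gain, log2 rank discount, ideal ranking built from all judged documents), a score of 0.00 means no judged-relevant document reached the top 10. The sketch below is that textbook formula in TypeScript; the eval harness's actual gain function and judgment scale are assumptions, not taken from the PR.

```typescript
// Discounted cumulative gain over the first k positions.
// i is 0-based, so the rank is i + 1 and the discount is log2(rank + 1).
function dcg(gains: number[], k: number): number {
  return gains
    .slice(0, k)
    .reduce((sum, g, i) => sum + g / Math.log2(i + 2), 0);
}

// retrieved: ranked doc ids; judgments: doc id -> graded relevance (0 = not relevant).
function ndcgAtK(retrieved: string[], judgments: Map<string, number>, k = 10): number {
  const gains = retrieved.map((id) => judgments.get(id) ?? 0);
  // The ideal ranking uses every judged doc, including ones the engine never returned.
  const ideal = [...judgments.values()].sort((a, b) => b - a);
  const idcg = dcg(ideal, k);
  return idcg === 0 ? 0 : dcg(gains, k) / idcg;
}

const judged = new Map([["d1", 1], ["d2", 1]]);
console.log(ndcgAtK(["d1", "d2"], judged)); // 1
console.log(ndcgAtK(["x", "d1"], judged).toFixed(3)); // 0.387
```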

Residual typo misses are deliberate scope edges: one is a term that appears only in message bodies, whose trigrams are unindexed for performance; another is a transposition that scores below the 0.4 threshold, where the third judged doc contains no literal token to fall back on. These are left unchased to avoid overfitting the threshold to a 31-doc synthetic corpus. Open-ended semantics beyond the curated thesaurus remain Phase 2 (opt-in pgvector).

Verification

  • pnpm typecheck ✓ · pnpm build ✓
  • pnpm test: 182 passed (incl. 12 new search-expand unit tests) ✓
  • pnpm test:db:live: 118/118, incl. search-quality.live-db.test.ts (the quality gate) and search.live-db.test.ts against the real engine ✓

Harness Impact

docs/search-eval.md updated: Phase 1 marked shipped with the measured delta and the goal marked met. No AGENTS.md / ADR change needed (additive to the 0015 engine; no schema change).

🤖 Generated with Claude Code

Closes the two zero-scoring eval categories (typo, semantic) without
touching the four already at 0.91–1.00. Two pure-Postgres, recall-only
branches added to the free-text path (search/expand.ts + compile.ts):

- Fuzzy (typo): significant query tokens match via pg_trgm word_similarity
  over the trigram-indexed subject/sender/recipient columns from 0007,
  driven through OPERATOR(extensions.<%), with pg_trgm.word_similarity_threshold
  set to 0.4 via SET LOCAL, scoped to the read-only transaction.
- Concept (semantic): a curated, general email/business concept thesaurus
  widens the tsquery (primary || synonyms), so an intent word retrieves mail
  that never says it literally (vacation -> travel). Deterministic,
  zero-dependency Tier-1.5; Phase-2 embeddings remain the durable answer.
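The shape of the combined query can be sketched as a pure TypeScript builder that emits the SET LOCAL statement plus one parameterised SELECT. The table and column names (messages, search_tsv, subject, sender_text, recipient_text), the use of plainto_tsquery with the english config, and the secondary sort keys are all hypothetical; the real compile.ts and the 0007/0015 schema are not shown here.

```typescript
// Hypothetical builder for the free-text path with both recall branches.
// Table/column names and the tsquery parser are placeholders, not the real
// compile.ts or the 0007/0015 schema.

interface CompiledSearch {
  setup: string; // run first, inside the same read-only transaction
  text: string; // parameterised SELECT
  values: unknown[];
}

// Placeholder names for the trigram-indexed subject/sender/recipient columns.
const FUZZY_COLUMNS = ["subject", "sender_text", "recipient_text"];

function compileFreeText(
  query: string,
  fuzzyTerms: string[],
  synonyms: string[],
  limit = 10,
): CompiledSearch {
  const values: unknown[] = [query];
  // Appends a value and returns its $n placeholder.
  const param = (v: unknown): string => {
    values.push(v);
    return `$${values.length}`;
  };

  const primary = `plainto_tsquery('english', $1)`;
  // Concept branch: tsquery || tsquery ORs each synonym into the primary query.
  const widened = [
    primary,
    ...synonyms.map((s) => `plainto_tsquery('english', ${param(s)})`),
  ].join(" || ");

  // Fuzzy branch: term <% column, schema-qualified so gin_trgm_ops can serve it.
  const fuzzy: string[] = [];
  for (const term of fuzzyTerms) {
    const p = param(term);
    for (const col of FUZZY_COLUMNS) fuzzy.push(`${p} OPERATOR(extensions.<%) ${col}`);
  }

  const recall = [`search_tsv @@ (${widened})`, ...fuzzy].join("\n    OR ");

  return {
    // SET LOCAL lasts only until the current transaction ends.
    setup: "SET LOCAL pg_trgm.word_similarity_threshold = 0.4",
    text: `SELECT id, (search_tsv @@ ${primary}) AS is_primary
FROM messages
WHERE deleted_in_provider = false -- literal, so the partial index still applies
  AND (${recall})
ORDER BY is_primary DESC, -- exact lexical hits always first
  ts_rank(search_tsv, ${primary}) DESC, -- exact hits ordered by the primary query
  ts_rank(search_tsv, ${widened}) DESC, -- mainly orders concept-only rows
  id
LIMIT ${param(limit)}`,
    values,
  };
}

const example = compileFreeText("secuirty vacation", ["secuirty"], ["travel", "pto"]);
console.log(example.values); // [ 'secuirty vacation', 'travel', 'pto', 'secuirty', 10 ]
```

A caller would run `setup` and then `text` with `values` inside one read-only transaction (BEGIN READ ONLY, then COMMIT); because SET LOCAL is transaction-scoped, the threshold reverts when that transaction ends. SET does not accept bind parameters, which is why the threshold is a literal.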

Safety: an is_primary flag (any exact lexical hit) is the leading ORDER BY
key, so every exact match ranks strictly above any fuzzy/concept-only match.
The recall branches can add results a keyword search misses but can never
reorder the ones it already gets right — zero regression by construction.
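The safety argument can be checked with a small in-memory model of the ORDER BY. This is a hypothetical sketch, not the project's code, and it assumes each hit's rank does not change when recall rows are added. Under that assumption, a comparator whose leading key is is_primary orders any two exact hits using only their own fields, so appending fuzzy or concept-only rows cannot change the exact hits' relative order.

```typescript
// Hypothetical model of one result row.
interface Hit {
  id: string;
  isPrimary: boolean; // any exact lexical hit
  rank: number; // relevance score, higher is better
}

// Leading key isPrimary, then rank descending, then id as a deterministic tie-break.
function compareHits(a: Hit, b: Hit): number {
  if (a.isPrimary !== b.isPrimary) return a.isPrimary ? -1 : 1;
  if (a.rank !== b.rank) return b.rank - a.rank;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

function rankAll(hits: Hit[]): string[] {
  return [...hits].sort(compareHits).map((h) => h.id);
}

const exact: Hit[] = [
  { id: "a", isPrimary: true, rank: 0.2 },
  { id: "b", isPrimary: true, rank: 0.5 },
];
// A fuzzy/concept-only row whose raw rank beats both exact hits.
const recallOnly: Hit[] = [{ id: "z", isPrimary: false, rank: 0.9 }];

console.log(rankAll(exact)); // [ 'b', 'a' ]
console.log(rankAll([...exact, ...recallOnly])); // [ 'b', 'a', 'z' ]
```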

Eval (pnpm eval:search), before -> after:
  headline nDCG@10 0.731 -> 0.953, recall@10 0.724 -> 0.953
  typo     0.00 -> 0.65    semantic 0.00 -> 0.98
  lexical  0.91 -> 0.995   operator/ranking/phrase/email-intent 1.00 (held)

Verified: pnpm typecheck, pnpm test (182 passed incl. 12 new expand unit
tests), pnpm build, pnpm test:db:live (118/118 incl. the search-quality
quality gate against the real engine).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@vercel (bot) commented Jun 19, 2026:

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| supamail | Ready | Preview, Comment | Jun 19, 2026 4:38pm |
