
feat(search): abort split warmup when a required term is missing#6511

Open
PSeitz-dd wants to merge 1 commit into quickwit-oss:main from PSeitz:abort_early_no_terms

@PSeitz-dd

Contributor

Abort split warmup when a required term is missing, to avoid fully downloading fast fields.

A leaf search warms up a split by downloading term dictionaries, posting lists, fast fields and field norms before running the query. When the query has a term that must match (a single-term clause reachable through must/filter only) and that term's posting list is empty in the split, the query provably matches nothing there, so the rest of the warmup is wasted work. Moreover, that wasted warmup downloads fast fields and may put unnecessary pressure on the cache.
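A minimal sketch of the required-term idea, under stated assumptions: `Ast`, `required_terms` and `split_can_match` are hypothetical and operate on a simplified query AST, not on quickwit's actual `QueryAst` or `build_tantivy_query_and_required_terms`. It shows why only `must`/`filter` paths count: a term under `should` or `must_not` is never required for a document to match.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified query AST (NOT quickwit's `QueryAst`).
#[derive(Debug, Clone)]
pub enum Ast {
    /// Exact term on an indexed field: (field, term text).
    Term(String, String),
    Bool {
        must: Vec<Ast>,
        filter: Vec<Ast>,
        should: Vec<Ast>,
        must_not: Vec<Ast>,
    },
    /// Anything this sketch does not reason about (phrase, range, wildcard, ...).
    Other,
}

/// Collects terms that every matching document must contain, descending only
/// through `must` and `filter`. `should`/`must_not` children and `Other`
/// leaves contribute nothing, which keeps the result conservative.
pub fn required_terms(ast: &Ast, out: &mut BTreeSet<(String, String)>) {
    match ast {
        Ast::Term(field, text) => {
            out.insert((field.clone(), text.clone()));
        }
        Ast::Bool { must, filter, .. } => {
            for child in must.iter().chain(filter.iter()) {
                required_terms(child, out);
            }
        }
        Ast::Other => {}
    }
}

/// The split can only match if every required term has postings in it.
/// `term_exists` is a hypothetical lookup standing in for the existence
/// signal reported by postings warmup.
pub fn split_can_match<F>(required: &BTreeSet<(String, String)>, term_exists: F) -> bool
where
    F: Fn(&str, &str) -> bool,
{
    required
        .iter()
        .all(|(field, text)| term_exists(field.as_str(), text.as_str()))
}
```

If `split_can_match` returns `false`, every matching document would need a term that has no postings in the split, so the split can be answered with an empty result.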

This extracts the set of required terms from the query AST and, during warmup, cancels the remaining downloads (fast fields, field norms, other postings) as soon as a required term is found missing, returning an empty result for the split. warm_postings already reports term existence, so that existing signal is what triggers the cancellation.

  • quickwit-query: required_terms extraction + build_tantivy_query_and_required_terms.
  • quickwit-doc-mapper: WarmupInfo::required_terms.
  • quickwit-search: race warmup against a CancellationToken fired by warm_up_terms; abort returns a counted empty response that still reports its (aborted) download volume; new pruned_empty_term metric.
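The race between term warmup and the remaining downloads can be sketched with std threads. This is a hypothetical illustration: an `AtomicBool` stands in for tokio-util's `CancellationToken` so the example has no dependencies, and `warm_up_split` / `WarmupOutcome` are not quickwit names (the real code awaits futures instead of joining threads). The aborted path still returns how much was downloaded, mirroring the "counted empty response" above.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical outcome of warming up one split.
#[derive(Debug, PartialEq, Eq)]
pub enum WarmupOutcome {
    /// All downloads finished; run the query normally.
    Completed,
    /// A required term was missing: answer with an empty result.
    PrunedEmptyTerm,
}

/// Hypothetical sketch: `term_present[i]` says whether the i-th required term
/// has postings in the split; `other_chunks` models fast field / field norm
/// downloads. Returns the outcome and the number of chunks actually downloaded.
pub fn warm_up_split(term_present: &[bool], other_chunks: u64) -> (WarmupOutcome, u64) {
    // Plays the role of the shared cancellation token.
    let cancelled = Arc::new(AtomicBool::new(false));
    let downloaded = Arc::new(AtomicU64::new(0));

    // "Other" warmup: stops at the next chunk boundary once cancelled.
    let other = {
        let cancelled = Arc::clone(&cancelled);
        let downloaded = Arc::clone(&downloaded);
        thread::spawn(move || {
            for _ in 0..other_chunks {
                if cancelled.load(Ordering::Acquire) {
                    return;
                }
                downloaded.fetch_add(1, Ordering::Relaxed);
                thread::yield_now();
            }
        })
    };

    // Term warmup: the existence signal fires the token on a missing term.
    if term_present.iter().any(|&present| !present) {
        cancelled.store(true, Ordering::Release);
    }
    other.join().expect("warmup thread panicked");

    let outcome = if cancelled.load(Ordering::Acquire) {
        WarmupOutcome::PrunedEmptyTerm
    } else {
        WarmupOutcome::Completed
    };
    // Report the (possibly partial) download volume in both cases.
    (outcome, downloaded.load(Ordering::Relaxed))
}
```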

Benchmarks

Comparing: main with fast field cache (baseline), main, and abort (this PR).

list_7d_low_hits is a query on a single selective term.

The downloaded data volume is reduced, while the number of GET requests is unchanged (requests still run in parallel). CPU time is reduced because less data is downloaded.

[Benchmark screenshots (3), taken 2026-06-15]

@PSeitz-dd PSeitz-dd requested review from a team as code owners June 15, 2026 09:49

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87be0f56e7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Three comment threads on quickwit/quickwit-search/src/leaf.rs (one outdated).
A leaf search warms up a split by downloading term dictionaries, posting
lists, fast fields and field norms before running the query. When the
query has a term that *must* match (a single-term clause reachable through
`must`/`filter` only) and that term's posting list is empty in the split,
the query provably matches nothing there — so the rest of the warmup is
wasted work.

This extracts the set of required terms from the query AST and, during
warmup, cancels the remaining downloads (fast fields, field norms, other
postings) as soon as a required term is found missing, returning an empty
result for the split. `warm_postings` already reports term existence, so
no extra lookups are added; the existence signal is simply no longer
discarded.

- quickwit-query: `required_terms` extraction + `build_tantivy_query_and_required_terms`.
- quickwit-doc-mapper: `WarmupInfo::required_terms`.
- quickwit-search: race warmup against a `CancellationToken` fired by
  `warm_up_terms`; abort returns a counted empty response that still
  reports its (aborted) download volume; new `pruned_empty_term` metric.

The optimization is gated on single-segment splits (sound per split) and
on exact `TermQuery` leaves on indexed fields; everything else is treated
conservatively, so it never discards a query that could match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
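The gate described in the commit message can be expressed as a small predicate. This is a hypothetical sketch (`RequiredLeaf`, `LeafKind` and `should_prune` are not quickwit names): it takes per-segment term existence and prunes only in the single-segment, exact-term, indexed-field case, simply mirroring the gate as stated. Every other input keeps the split, which is the conservative behavior the commit message describes.

```rust
/// Hypothetical classification of a required query leaf.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeafKind {
    /// Exact `TermQuery`-style leaf.
    ExactTerm,
    /// Phrase, range, wildcard, etc.: never used for pruning here.
    Other,
}

/// Hypothetical description of a required leaf.
#[derive(Debug, Clone, Copy)]
pub struct RequiredLeaf {
    pub kind: LeafKind,
    pub field_indexed: bool,
}

/// `term_present` holds one entry per segment of the split.
/// Prune only for a single-segment split whose exact-term leaf on an
/// indexed field has no postings; otherwise keep the split.
pub fn should_prune(leaf: RequiredLeaf, term_present: &[bool]) -> bool {
    match term_present {
        [present] => leaf.kind == LeafKind::ExactTerm && leaf.field_indexed && !*present,
        _ => false,
    }
}
```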
@PSeitz PSeitz force-pushed the abort_early_no_terms branch from 87be0f5 to e489247 Compare June 15, 2026 10:49
@PSeitz-dd

Contributor Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Keep it up!

Reviewed commit: e4892475be


@PSeitz-dd PSeitz-dd requested a review from trinity-1686a June 15, 2026 10:58
```rust
)
.instrument(debug_span!("warm_up_automatons"));

tokio::try_join!(
```

Contributor


As a follow-up, I think it'd be interesting to record how many splits were cancelled vs. not, and if that ratio goes over some small threshold (10% maybe?), run warm_up_terms_future first and then the rest (if warm_up_terms_future didn't already cancel everything).
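A hedged sketch of the suggested follow-up, assuming a process-wide counter of warmups and cancellations. `WarmupStats`, `WarmupOrder` and `threshold_percent` are hypothetical names for illustration; the ratio check uses integer arithmetic to avoid float comparisons.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical warmup strategy choice.
#[derive(Debug, PartialEq, Eq)]
pub enum WarmupOrder {
    /// Current behavior: race term warmup against everything else.
    Concurrent,
    /// Run term warmup first; start the rest only if nothing was cancelled.
    TermsFirst,
}

/// Hypothetical counters of split warmups and how many were cancelled.
#[derive(Default)]
pub struct WarmupStats {
    total: AtomicU64,
    cancelled: AtomicU64,
}

impl WarmupStats {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record one finished split warmup.
    pub fn record(&self, was_cancelled: bool) {
        self.total.fetch_add(1, Ordering::Relaxed);
        if was_cancelled {
            self.cancelled.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Switch to terms-first once more than `threshold_percent` of splits
    /// were cancelled (cancelled / total > threshold / 100).
    pub fn choose_order(&self, threshold_percent: u64) -> WarmupOrder {
        let total = self.total.load(Ordering::Relaxed);
        let cancelled = self.cancelled.load(Ordering::Relaxed);
        if total > 0 && cancelled * 100 > threshold_percent * total {
            WarmupOrder::TermsFirst
        } else {
            WarmupOrder::Concurrent
        }
    }
}
```

In practice a minimum sample size or a decaying window would likely be wanted, so that a handful of early cancellations does not flip the policy.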

@trinity-1686a

Contributor

How is Object Storage Download measured? If it's self-reported, I think it misrepresents the result: we record S3 downloads at the end of the call if it succeeded. If it died partway, I think we report 0, not whatever was actually downloaded.
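To illustrate the concern, a hypothetical `CountingReader` adds each chunk's size to a shared counter as soon as it is read, instead of reporting a total only when the call succeeds; an aborted download then still reports the bytes it actually fetched. It uses blocking `std::io::Read` for a dependency-free sketch and is not how quickwit's storage metrics are implemented.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical reader wrapper that accounts bytes incrementally.
pub struct CountingReader<R> {
    inner: R,
    downloaded: Arc<AtomicU64>,
}

impl<R: Read> CountingReader<R> {
    pub fn new(inner: R, downloaded: Arc<AtomicU64>) -> Self {
        Self { inner, downloaded }
    }
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Count this chunk now, not when the whole download completes,
        // so an abort partway through still reports what was fetched.
        self.downloaded.fetch_add(n as u64, Ordering::Relaxed);
        Ok(n)
    }
}
```

Reading three 10-byte chunks from a 100-byte source and then dropping the reader leaves the counter at 30, whereas end-of-call reporting would record nothing for that aborted download.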


3 participants