fix(index): defer scalar index parser selection until operator context is known#7509
Open
ztorchan wants to merge 1 commit into
Open
fix(index): defer scalar index parser selection until operator context is known#7509ztorchan wants to merge 1 commit into
ztorchan wants to merge 1 commit into
Conversation
Contributor
Author
|
hi @westonpace, please take a look at this pr |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When a single column carries more than one scalar index (for example a bloom filter and a btree), some queries that should be index-accelerated silently degrade to a full scan.
Concretely, with both a bloom filter and a btree on column
x, the queryx > 7is not accelerated by the btree even though the btree can serve the range. Whichever parser happens to be registered first wins, regardless of whether it can actually handle the query shape.Fixes #7091.
Root cause
Matching a filter leaf to an index involves two distinct decisions:
Expr(incl. JSON path) — but not the operatorMultiQueryParser::select(introduced in #7072 to route JSON queries to the correct path) answered both withis_valid_referencealone, committing to the first parser that accepted the reference. Becauseis_valid_referencecannot see the operator, it could not tell that the bloom filter is unable to serve a range:
visit_comparisoncallsmaybe_indexed_column(x)— onlyColumn("x")is visible, the>lives on the parent node.selectasks each parser'sis_valid_reference; both bloom and btree accept a bare column.selectreturns the first one (bloom).bloom.visit_comparison(x, 7, Gt)returnsNone(equality only).This was effectively a regression from #7072: before that change the column's whole
MultiQueryParserwas returned and itsvisit_*find_mapfallback naturally skipped bloom and picked the btree. #7072 fixed JSON path routing butbypassed that capability fallback.
Fix
Keep reference validation and capability matching as separate phases:
MultiQueryParser::selectnow collects every parser whoseis_valid_referenceaccepts the expression and returns a borrowedMultiQueryParserViewover that subset (plus the referenced data type).MultiQueryParserViewbehaves exactly like its parentMultiQueryParser— eachvisit_*fans out to its parsers with "first match wins" semantics — but operates on a borrowed subset instead of owning the parsers. Thecapability decision is therefore deferred to the
visit_*calls, where the operator is finally known.This unifies both scenarios:
selectvisit_*find_mappicks by capability (#7091 fixed)The change is intentionally local:
maybe_indexed_columnswaps its returned parser type forMultiQueryParserView, and none of the ~7 downstream visitors (visit_comparison,visit_between,visit_in_list,visit_is_bool,visit_is_null,maybe_range,visit_scalar_fn,visit_like_expr) needed to change — the fan-out/fallback logic stays in one place.A borrowed view (rather than returning an owned
MultiQueryParser) is required becauseselectonly has&self, the parsers are stored asBox<dyn ScalarQueryParser>(notClone), and the dispatch table is shared across all leaves of a filter tree within a query, so it must not be mutated or pruned in place.Tests
Added
test_multi_index_defers_to_operator_capability, which registers a bloom filter first (so the pre-fix "first valid reference wins" behavior would route everything to it) and a btree second on the same column, then asserts:size > 7is served by the btree as aSargableQuery::Range(the bug scenario — previously a full scan).size = 7is still served by the bloom filter asBloomFilterQuery::Equals(the fallback does not regress the equality shape the bloom filter supports).The existing
test_multi_json_indices_route_by_pathcontinues to pass, confirming JSON path routing from #7072 is unaffected.