perf(fused_group): SYM direct-count fires without WHERE on top-K count (#237)
Merged
fp_try_direct_count1's SYM slot-array branch in fused_group.c
required `pred_key_ne_zero` (a WHERE-side `Col != 0` predicate that
filters the empty sym up front). Queries without a WHERE clause
(ClickBench q33 — `(select {URL: URL c: (count URL) from: hits
by: URL desc: c take: 10})`) fall through to `if (ctx->kt == RAY_SYM)
return NULL`, after which the dispatcher lands on `fp_par_fn` (a
linear-probe HT with a single shard per worker). On q33, with 2.6 M
distinct URLs, the shard is ~10 MB per worker and every probe is an
L3 miss; 22 % of total cycles are spent there.
Accept the no-WHERE case too: count every row including the empty
sym at slot 0. The downstream top-K heap ranks slot 0 against the
rest, so a heavy empty sym (q22 / q21 shape) ends up in the result
legitimately, and a sparse empty sym just disappears below keep_min.
Split the row scan into two specialisations (pred_key_ne_zero vs
no-WHERE) so the hot loop has no per-row branch.
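
The split described above can be sketched roughly as follows. This is not the code from fused_group.c: the function names, the `topk_entry` type, the dense `uint32_t` sym ids and the insertion-sort top-K are all assumptions made for illustration. Only `pred_key_ne_zero`, the "empty sym at slot 0" convention and the keep_min idea come from the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: sym ids are dense uint32_t values in [0, nslots),
 * and id 0 is the empty sym. slots[] must be zeroed by the caller. */

/* Specialisation 1: WHERE Col != 0. Rows keyed by the empty sym add 0. */
static void count_skip_empty(const uint32_t *keys, size_t n, uint64_t *slots) {
    for (size_t i = 0; i < n; i++) {
        uint32_t k = keys[i];
        slots[k] += (uint64_t)(k != 0);
    }
}

/* Specialisation 2: no WHERE. Every row counts, including slot 0. */
static void count_all(const uint32_t *keys, size_t n, uint64_t *slots) {
    for (size_t i = 0; i < n; i++)
        slots[keys[i]]++;
}

/* The mode is chosen once, outside the row loop. */
static void direct_count(const uint32_t *keys, size_t n, uint64_t *slots,
                         int pred_key_ne_zero) {
    if (pred_key_ne_zero)
        count_skip_empty(keys, n, slots);
    else
        count_all(keys, n, slots);
}

typedef struct { uint32_t key; uint64_t cnt; } topk_entry;

/* Keep the k largest non-zero counts in out[], sorted descending.
 * out[m-1].cnt plays the role of keep_min once the buffer is full.
 * A real implementation would likely use a heap; a sorted buffer
 * is used here only to keep the sketch short. */
static size_t top_k(const uint64_t *slots, size_t nslots,
                    topk_entry *out, size_t k) {
    size_t m = 0;
    if (k == 0) return 0;
    for (size_t s = 0; s < nslots; s++) {
        uint64_t c = slots[s];
        if (c == 0) continue;
        if (m == k && c <= out[m - 1].cnt) continue; /* below keep_min */
        size_t pos = (m < k) ? m++ : m - 1;
        while (pos > 0 && out[pos - 1].cnt < c) {
            out[pos] = out[pos - 1];
            pos--;
        }
        out[pos].key = (uint32_t)s;
        out[pos].cnt = c;
    }
    return m;
}
```

In the no-WHERE mode slot 0 competes in `top_k` like any other key, so a frequent empty sym appears in the output and a rare one drops out below the running minimum.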
ClickBench 10M:
q33 ~397 → ~25 ms (-94%, -372ms)
The SYM slot-array fast path in fp_try_direct_count1 required a
Col != 0 WHERE predicate to fire: it skipped any query without
a WHERE clause and fell through to the generic linear-probe HT.
Relax the gate: when there is no WHERE, every row participates, so
the slot-array count is correct as-is. Inner loop branches on the
WHERE presence to either filter the empty-sym slot (pred path) or
count every row (no-WHERE path).