perf(fused_group): SYM direct-count fires without WHERE on top-K count#237

Merged: singaraiona merged 2 commits into master from perf/q33-sym-direct-no-where on Jun 10, 2026
@ser-vasilich

Collaborator

The SYM slot-array fast path in fp_try_direct_count1 fired only when
the query carried a Col != 0 WHERE predicate; any query without a
WHERE clause skipped it and fell through to the generic linear-probe HT.

Relax the gate: when there is no WHERE, every row participates, so
the slot-array count is correct as-is. The row scan is specialised on
WHERE presence: the predicate path filters out the empty-sym slot, and
the no-WHERE path counts every row.

The SYM slot-array branch of fp_try_direct_count1 in fused_group.c
required `pred_key_ne_zero` (a WHERE-side `Col != 0` predicate that
filters the empty sym up front).  Queries without a WHERE clause,
such as ClickBench q33 (`(select {URL: URL c: (count URL) from: hits
by: URL desc: c take: 10})`), hit `if (ctx->kt == RAY_SYM)
return NULL`, and the dispatcher then lands on `fp_par_fn` (linear-probe
HT, single shard per worker).  On q33 with 2.6 M distinct URLs the
shard is ~10 MB per worker and every probe is an L3 miss; 22 % of
total cycles are spent there.

Accept the no-WHERE case too: count every row including the empty
sym at slot 0.  The downstream top-K heap ranks slot 0 against the
rest, so a heavy empty sym (q22 / q21 shape) ends up in the result
legitimately, and a sparse empty sym just disappears below keep_min.
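
To illustrate why counting slot 0 is harmless, here is a minimal sketch of
top-K selection over a slot-count array using a bounded min-heap.  It is
not the code in fused_group.c: `topk_entry`, `sift_down` and `topk_select`
are hypothetical names, and the heap root's count merely plays the role
the text calls keep_min.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result entry: a slot (sym id) and its row count. */
typedef struct { uint32_t slot; uint64_t count; } topk_entry;

/* Restore the min-heap property downward from index i. */
static void sift_down(topk_entry *h, size_t n, size_t i) {
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && h[l].count < h[m].count) m = l;
        if (r < n && h[r].count < h[m].count) m = r;
        if (m == i) return;
        topk_entry t = h[i]; h[i] = h[m]; h[m] = t;
        i = m;
    }
}

/* Keep the k largest counts from counts[0..nslots), slot 0 included.
 * heap must have room for k entries.  Returns the number kept (<= k);
 * entries come back in heap order (smallest at heap[0]), not sorted. */
static size_t topk_select(const uint64_t *counts, size_t nslots,
                          topk_entry *heap, size_t k) {
    size_t n = 0;
    if (k == 0) return 0;
    for (size_t s = 0; s < nslots; s++) {
        if (counts[s] == 0) continue;              /* sym never seen */
        if (n < k) {
            /* Heap not full yet: append, then sift up. */
            size_t i = n++;
            heap[i].slot = (uint32_t)s;
            heap[i].count = counts[s];
            while (i > 0 && heap[(i - 1) / 2].count > heap[i].count) {
                size_t p = (i - 1) / 2;
                topk_entry t = heap[i]; heap[i] = heap[p]; heap[p] = t;
                i = p;
            }
        } else if (counts[s] > heap[0].count) {
            /* heap[0].count acts as keep_min: only larger counts get in. */
            heap[0].slot = (uint32_t)s;
            heap[0].count = counts[s];
            sift_down(heap, n, 0);
        }
    }
    return n;
}
```

With counts `{100, 5, 7, 1, 9}` and k = 2, slot 0 is kept because it is
the heaviest; with counts `{1, 5, 7, 3, 9}` it is evicted like any other
light slot.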

Split the row scan into two specialisations (pred_key_ne_zero vs
no-WHERE) so the hot loop has no per-row branch.
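
The split can be pictured roughly as below.  This is a sketch under
assumptions, not the actual fused_group.c code: it assumes sym keys are
dense uint32_t ids with the empty sym at id 0, and the function names
(`count_all_rows`, `count_nonzero_rows`, `count_syms`) are made up for
illustration.  Only `pred_key_ne_zero` is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* no-WHERE path: every row participates, including the empty sym (id 0).
 * counts must be zero-initialised and large enough for the max key. */
static void count_all_rows(const uint32_t *keys, size_t nrows,
                           uint64_t *counts) {
    for (size_t i = 0; i < nrows; i++)
        counts[keys[i]]++;
}

/* Predicate path (Col != 0): rows holding the empty sym are not counted.
 * Adding the comparison result is one way to avoid a branch; the real
 * loop may filter differently. */
static void count_nonzero_rows(const uint32_t *keys, size_t nrows,
                               uint64_t *counts) {
    for (size_t i = 0; i < nrows; i++) {
        uint32_t k = keys[i];
        counts[k] += (uint64_t)(k != 0);
    }
}

/* The WHERE-presence test happens once, outside the per-row loops. */
static void count_syms(const uint32_t *keys, size_t nrows,
                       uint64_t *counts, int pred_key_ne_zero) {
    if (pred_key_ne_zero)
        count_nonzero_rows(keys, nrows, counts);
    else
        count_all_rows(keys, nrows, counts);
}
```

Hoisting the decision out of the loop means each specialisation carries
no test for WHERE presence per row.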

ClickBench 10M:

  q33  ~397 → ~25 ms   (-94%, -372ms)
@singaraiona singaraiona merged commit 4e3641c into master Jun 10, 2026
4 checks passed