perf(electric-db-collection): compile array-column membership to @> (GIN-eligible)#1583
perf(electric-db-collection): compile array-column membership to @> (GIN-eligible)#1583viktor89 wants to merge 3 commits into
Conversation
…GIN-eligible) inArray(value, arrayColumn) compiled to `value = ANY(arrayColumn)`, which Postgres cannot satisfy with a GIN index. The equivalent containment form `arrayColumn @> ARRAY[value]` can, so the planner index-seeks instead of scanning — a large win for low-cardinality values where ORDER BY ... LIMIT otherwise walks the whole table. Only the array-*column* case changes (RHS is a column ref); membership in a literal value list (inArray(column, [a, b])) still compiles to = ANY.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThe SQL compiler now emits ChangesArray column containment optimization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install timed out. The project may have too many dependencies for the sandbox. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/electric-db-collection/tests/sql-compiler.test.ts`:
- Around line 159-189: Add tests to cover corner cases for the new in operator
behavior in the sql-compiler.test.ts suite: add cases for empty-array (e.g.,
val([]) against ref status or roles), single-element-array (val([x]) and ARRAY
with single value) and null/undefined operands (val(null), val(undefined), and
ref that can be null/undefined) using compileSQL and func('in', [...]) to assert
the compiled where clause and params match the intended behavior; include both
permutations where the literal/array is the left operand and where the column
ref is the left operand (e.g., func('in', [ref('status'), val([])]), func('in',
[val('admin'), ref('roles')]) etc.) so the tests validate EMPTY, single-element,
and null/undefined handling for both "= ANY" and "@> ARRAY" code paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2a9cbd23-926b-4f0b-bf80-e1a1bec0aba2
📒 Files selected for processing (3)
.changeset/array-column-containment.mdpackages/electric-db-collection/src/sql-compiler.tspackages/electric-db-collection/tests/sql-compiler.test.ts
Address review: add empty-array, single-element, and null/undefined cases. `in` now goes through the same null-operand guard as the other comparison operators, so inArray(null/undefined, ...) throws instead of emitting an unbound parameter.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/electric-db-collection/src/sql-compiler.ts (1)
359-367:⚠️ Potential issue | 🟠 MajorTighten/guard
inArray(value, arrayColumn)sovaluecan’t be array-valued whenarrayColumnis a columnref
inArrayis declared asinArray(value: ExpressionLike, array: ExpressionLike)and just constructsFunc('in', ...), so the public types don’t prevent passing an arrayval(...)as the first argument. Inpackages/electric-db-collection/src/sql-compiler.ts, theinbranch forargs[1]?.type === 'ref'emitsrhs @> ARRAY[${lhs}], which wraps an array-valuedlhsinto a nested/single-element array and can break the intended “contains these elements” semantics. Add a type refinement (scalar-onlyvaluewhenarrayis a column ref) or a runtime check that rejects an array-valuedargs[0]in this scenario.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/electric-db-collection/src/sql-compiler.ts` around lines 359 - 367, The `inArray(value, array)` public API allows an array-valued `value` but the SQL emitter in sql-compiler.ts (the `if (name === 'in')` branch) assumes a scalar left-hand side when `args[1]?.type === 'ref'` and emits `${rhs} @> ARRAY[${lhs}]`, which breaks semantics for array-valued `lhs`; update the `in` handling so it validates/refines the first arg: in the `in` branch (same place where `args[1]?.type === 'ref'` is checked) add a runtime guard that detects array-valued `args[0]` (e.g., `args[0].type === 'val'` with an array payload or any internal array marker) and either throw a clear error or coerce to the correct SQL form, or tighten the public `inArray` typing so callers cannot pass array `value` when `array` is a column ref; reference the inArray factory and the `name === 'in'` emission logic to implement the check and fail fast if an array-valued `args[0]` is used with a column `args[1]`.
🧹 Nitpick comments (1)
packages/electric-db-collection/tests/sql-compiler.test.ts (1)
206-216: 💤 Low valueConsider testing null/undefined in the RHS position for completeness.
The current tests verify that
func('in', [val(null), ref('roles')])andfunc('in', [val(undefined), ref('roles')])throw. The validation logic insql-compiler.ts(lines 181-183) usesfindIndexto catch null/undefined in any argument position, sofunc('in', [ref('status'), val(null)])andfunc('in', [ref('status'), val(undefined)])will also throw. Adding explicit tests for those cases would provide symmetric coverage and guard against future refactoring that might break the any-position behavior.Optional additional test cases
+ it(`should throw for null in RHS position`, () => { + expect(() => + compileSQL({ where: func(`in`, [ref(`status`), val(null)]) }), + ).toThrow(`Cannot use null/undefined value with 'in' operator`) + }) + + it(`should throw for undefined in RHS position`, () => { + expect(() => + compileSQL({ where: func(`in`, [ref(`status`), val(undefined)]) }), + ).toThrow(`Cannot use null/undefined value with 'in' operator`) + })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/electric-db-collection/tests/sql-compiler.test.ts` around lines 206 - 216, Tests only check null/undefined in the LHS of func('in') but the compiler's validation (compileSQL) rejects null/undefined in any argument position; add symmetric tests asserting compileSQL throws for func('in', [ref('status'), val(null)]) and func('in', [ref('status'), val(undefined)]) so that compileSQL's behavior is covered for RHS operands as well.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/electric-db-collection/src/sql-compiler.ts`:
- Around line 359-367: The `inArray(value, array)` public API allows an
array-valued `value` but the SQL emitter in sql-compiler.ts (the `if (name ===
'in')` branch) assumes a scalar left-hand side when `args[1]?.type === 'ref'`
and emits `${rhs} @> ARRAY[${lhs}]`, which breaks semantics for array-valued
`lhs`; update the `in` handling so it validates/refines the first arg: in the
`in` branch (same place where `args[1]?.type === 'ref'` is checked) add a
runtime guard that detects array-valued `args[0]` (e.g., `args[0].type ===
'val'` with an array payload or any internal array marker) and either throw a
clear error or coerce to the correct SQL form, or tighten the public `inArray`
typing so callers cannot pass array `value` when `array` is a column ref;
reference the inArray factory and the `name === 'in'` emission logic to
implement the check and fail fast if an array-valued `args[0]` is used with a
column `args[1]`.
---
Nitpick comments:
In `@packages/electric-db-collection/tests/sql-compiler.test.ts`:
- Around line 206-216: Tests only check null/undefined in the LHS of func('in')
but the compiler's validation (compileSQL) rejects null/undefined in any
argument position; add symmetric tests asserting compileSQL throws for
func('in', [ref('status'), val(null)]) and func('in', [ref('status'),
val(undefined)]) so that compileSQL's behavior is covered for RHS operands as
well.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a82cf245-4531-485f-b7bd-89020fc496f7
📒 Files selected for processing (2)
packages/electric-db-collection/src/sql-compiler.tspackages/electric-db-collection/tests/sql-compiler.test.ts
… against an array column The `arrayColumn @> ARRAY[value]` form wraps a single scalar; passing an array value would nest into `ARRAY[<array>]` and silently change the membership semantics. Throw a clear error instead, and cover it with a test.
perf(electric-db-collection): compile array-column membership to
@>(GIN-eligible)Summary
inArray(value, arrayColumn)currently compiles tovalue = ANY(arrayColumn). That's correct, but Postgres cannot use a GIN index forscalar = ANY(array_column)— only for the containment operatorarray_column @> ARRAY[scalar]. The two expressions are logically equivalent, so this PR emits the containment form when the array operand is a column, letting the planner index-seek instead of scanning.Membership in a literal value list (
inArray(column, [a, b, c])) is unchanged — it still compiles to= ANY, which is correct there.Why it matters
For a low-cardinality value in an array column, the
= ANYform is pathological in the commonORDER BY ... LIMIT Nshape: the planner walks an ordering index and filters row-by-row, and because few rows match, it can never fill theLIMIT— so it scans the entire table.Concrete case from a ~2M-row table with a
tags text[]column and a GIN index on it:A tag matching 7 of 2,000,000 rows went from a full-table scan to an index seek. Dense values (e.g. a tag on 25% of rows) are unaffected — the planner still picks the ordering-index walk, now adaptively, because
@>exposes the GIN option rather than removing it.Change
In
compileFunction'sinbranch, when the RHS is a column reference (args[1].type === 'ref'), emit${rhs} @> ARRAY[${lhs}]; otherwise keep${lhs} = ANY(${rhs}).The two shapes are distinguishable without column type info:
inArray(value, arrayColumn)→args[1]is aref→ containment.inArray(column, [literal, list])→args[1]is aval→= ANY.Tests
Added an
in operatordescribe block insql-compiler.test.tscovering: literal value list →= ANY, array column →@>, and a mixed compound (@>AND= ANY) to confirm both paths coexist with correct param indices.Notes
inArraycallers benefit transparently.Summary by CodeRabbit
Performance
Tests
Bug Fix
Documentation