Skip to content

fix: ArrayInsert evaluation for null source arrays#4726

Open
peterxcli wants to merge 4 commits into
apache:mainfrom
peterxcli:handle-null-array-insert
Open

fix: ArrayInsert evaluation for null source arrays#4726
peterxcli wants to merge 4 commits into
apache:mainfrom
peterxcli:handle-null-array-insert

Conversation

@peterxcli

Copy link
Copy Markdown
Member

Which issue does this PR close?

No issue for this, relate to: #4716

Rationale for this change

Comet previously evaluated all ArrayInsert children for the full batch. This differed from Spark, which evaluates arguments left-to-right and does not evaluate the position or inserted item when the source array is NULL.

This caused expressions such as element_at(array(1), 0) to be evaluated even though Spark would short-circuit and return NULL for the row.

This patch uses DataFusion’s evaluate_selection in the native ArrayInsert implementation to skip evaluating pos and item for rows where the source array is NULL. It also adds regression coverage for both array_insert and array_prepend, since Spark rewrites array_prepend to array_insert(..., 1, ...).

What changes are included in this PR?

argument parsing logic in array_insert batch execution.

How are these changes tested?

  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array_prepend" -Dtest=none
  • ./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array_insert" -Dtest=none

@peterxcli peterxcli changed the title Fix ArrayInsert evaluation for null source arrays fix: ArrayInsert evaluation for null source arrays Jun 25, 2026
@peterxcli peterxcli marked this pull request as draft June 25, 2026 16:15
@peterxcli

Copy link
Copy Markdown
Member Author

sorry let me fix the test error.

@peterxcli peterxcli marked this pull request as ready for review June 25, 2026 16:27
@peterxcli peterxcli marked this pull request as draft June 25, 2026 16:31
@peterxcli peterxcli marked this pull request as ready for review June 25, 2026 16:36
@andygrove

Copy link
Copy Markdown
Member

Nice fix. The short-circuit logic matches Spark's ArrayInsert.eval exactly, and evaluating src before pos also lines up error precedence with Spark when more than one argument can throw.

One suggestion on test coverage. The new test_array_insert_short_circuit case nicely exercises skipping pos (and item) when src is NULL. There's a second, distinct path that the new && pos_value.is_valid(row) guard on evaluate_item protects: src non-null, pos NULL, and an item that would throw if evaluated. Spark short-circuits item there too (the inner if (value2 != null) check), so it returns NULL rather than raising. It might be worth pinning that down with something like:

statement
INSERT INTO test_array_insert_short_circuit VALUES
  (NULL, 0),
  (array(1), 1),
  (array(1), 2)

query
SELECT array_insert(arr, CAST(NULL AS INT), element_at(array(1), idx))
FROM test_array_insert_short_circuit

Here pos is NULL for every row, so item = element_at(array(1), idx) should never be evaluated, including the idx = 0 and idx = 2 rows that would otherwise throw. Without the pos validity check on evaluate_item this would error, so it guards that branch directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants