feat: add Spark-compatible arrays_zip function #22473

Open
CuteChuanChuan wants to merge 1 commit into apache:main from CuteChuanChuan:raymond/20888-spark-arrays-zip


CuteChuanChuan commented May 23, 2026

Which issue does this PR close?

Closes #20888.

Rationale for this change

Spark's arrays_zip returns a list of structs whose fields are named with 0-based ordinals (0, 1, 2, ...), while DataFusion's arrays_zip uses 1-based ordinals (1, 2, 3, ...). To support Spark compatibility without altering DataFusion's native semantics, this PR adds a SparkArraysZip wrapper in the datafusion-spark crate.

What changes are included in this PR?

  1. Add SparkArraysZip that delegates to ArraysZip and renames the inner struct fields of the returned List<Struct<..>> to 0-based ordinals. Both the planning-time DataType and the execution-time ArrayRef are renamed.
  2. Add sqllogictest coverage, mirroring scenarios from Spark's DataFrameFunctionsSuite#"dataframe arrays_zip function".
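
The renaming step described in item 1 can be sketched roughly as follows. This is a stdlib-only toy model, not the PR's code: `ToyType`, `native_zip_type`, and `rename_to_zero_based` are hypothetical names invented for illustration, and the real implementation works on Arrow's `DataType` and `ArrayRef` rather than this simplified enum. It only shows the idea of taking a `List<Struct<..>>` whose fields are named `1, 2, ...` and producing the same shape with fields named `0, 1, ...`.

```rust
// Toy stand-in for an Arrow data type. NOT the real arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
    // Struct fields as (name, type) pairs, in order.
    Struct(Vec<(String, ToyType)>),
    List(Box<ToyType>),
}

/// Hypothetical model of the native return type: a list of structs
/// whose fields are named with 1-based ordinals ("1", "2", ...).
/// `element_types` holds the element type of each input array.
fn native_zip_type(element_types: &[ToyType]) -> ToyType {
    let fields = element_types
        .iter()
        .enumerate()
        .map(|(i, t)| ((i + 1).to_string(), t.clone()))
        .collect();
    ToyType::List(Box::new(ToyType::Struct(fields)))
}

/// Hypothetical wrapper step: rename the struct fields inside a
/// List<Struct<..>> to 0-based ordinals ("0", "1", ...).
/// Any other type is returned unchanged.
fn rename_to_zero_based(ty: &ToyType) -> ToyType {
    match ty {
        ToyType::List(inner) => match inner.as_ref() {
            ToyType::Struct(fields) => {
                let renamed = fields
                    .iter()
                    .enumerate()
                    // Keep each field's type and position; replace only the name.
                    .map(|(i, (_, t))| (i.to_string(), t.clone()))
                    .collect();
                ToyType::List(Box::new(ToyType::Struct(renamed)))
            }
            _ => ty.clone(),
        },
        _ => ty.clone(),
    }
}
```

In this model, calling `rename_to_zero_based(&native_zip_type(&[ToyType::Int64, ToyType::Utf8]))` yields a list of structs with fields `"0"` and `"1"`. Per the description above, the PR applies the equivalent rename both to the planning-time type and to the array produced at execution time, so the two stay consistent.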

Are these changes tested?

Yes. The sqllogictest cases under spark/array/arrays_zip.slt cover the scenarios mirrored from Spark's DataFrameFunctionsSuite.

Are there any user-facing changes?

Yes. The Spark-compatible arrays_zip in the datafusion-spark crate produces structs with 0-based field names instead of 1-based. DataFusion's native arrays_zip is unchanged, so there are no breaking changes.

- Add SparkArraysZip wrapper that delegates to ArraysZip and renames inner struct fields to 0-based ordinals to match Spark semantics
- Add sqllogictest cases mirroring Spark's DataFrameFunctionsSuite
github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and spark labels on May 23, 2026


Development

Successfully merging this pull request may close these issues.

spark: introduce Spark arrays_zip
