
Spark: Implement variant extraction pushdown for shredded VARIANT columns#16715

Open
qlong wants to merge 1 commit into apache:main from qlong:supports-pushdown-variant-extraction



qlong commented Jun 8, 2026


Change

This PR is part of the work to support variant extraction pushdown. The core change is to the engine schema, which now maps slots to paths within a variant column.
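As a rough mental model of a schema that maps slots to variant paths, the sketch below gives each requested extraction (variant column, path, target type) an ordinal slot that a reader could later fill with the extracted value. Everything here is hypothetical: Slot, SlotSchema and the string-typed target types are illustrative names, not Iceberg or Spark APIs, and the PR's actual engine schema may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: one extraction slot, i.e. the value of variant_get(variantColumn, path, targetType). */
record Slot(int ordinal, String variantColumn, String path, String targetType) {}

/** Hypothetical sketch of a schema that maps slot ordinals to paths inside variant columns. */
final class SlotSchema {
  private final List<Slot> slots = new ArrayList<>();

  /** Registers an extraction and returns the ordinal of the slot that carries its value. */
  int addSlot(String variantColumn, String path, String targetType) {
    // Design choice of this sketch only: an identical extraction reuses its existing slot.
    for (Slot s : slots) {
      if (s.variantColumn().equals(variantColumn)
          && s.path().equals(path)
          && s.targetType().equals(targetType)) {
        return s.ordinal();
      }
    }
    Slot slot = new Slot(slots.size(), variantColumn, path, targetType);
    slots.add(slot);
    return slot.ordinal();
  }

  /** Looks up which variant column and path a slot reads from. */
  Slot slot(int ordinal) {
    return slots.get(ordinal);
  }

  int size() {
    return slots.size();
  }
}
```

For example, registering variant_get(v, '$.a.b', 'int'), then '$.c' as a string, then '$.a.b' as an int again yields slots 0, 1 and 0, so the schema ends up with two slots.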

  • Add SparkVariantExtractionScanBuilder implementing SupportsPushDownVariantExtractions so Spark can push variant_get paths from Filter/Project nodes into Iceberg scans.
  • Gate the feature behind the new spark.sql.iceberg.variant-extraction-push-down.enabled setting (enabled by default).
  • Use an all-or-nothing batch policy: decline the entire batch if any extraction has an unsupported path or target type, references a non-variant column, or is a full-variant slot (expectedDataType = VariantType, path $).
  • This avoids partial scan rewrites, which break tables with more than one variant column and plans in which a variant_get above a join or aggregate barrier still references the original column.
  • Override readSchema() on batch query scans to expose annotated extraction structs to executors.
  • Add TestVariantShreddingPushdown for DSv2 plan shape and query correctness.
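The all-or-nothing policy described above can be sketched as a single predicate over the whole batch. This is a hypothetical illustration, not the PR's code: the Extraction record, the BatchPolicy class, the accepted path grammar (dotted field names only, no array indexes) and the set of supported target types are all assumptions made for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical description of one variant_get extraction offered for pushdown. */
record Extraction(String column, String path, String targetType) {}

/** Sketch of an all-or-nothing acceptance check; names and rules are illustrative only. */
final class BatchPolicy {
  // Assumption: only simple object-field paths such as $.a or $.a.b are supported.
  private static final Pattern FIELD_PATH =
      Pattern.compile("\\$(\\.[A-Za-z_][A-Za-z0-9_]*)+");
  // Assumption: an illustrative set of supported target types.
  private static final Set<String> SUPPORTED_TYPES =
      Set.of("int", "long", "double", "string", "boolean");

  private BatchPolicy() {}

  /** Checks one extraction against the four rejection rules listed in the PR description. */
  static boolean supported(Extraction e, Set<String> variantColumns) {
    boolean fullVariantSlot = e.targetType().equals("variant") && e.path().equals("$");
    return !fullVariantSlot
        && variantColumns.contains(e.column())
        && FIELD_PATH.matcher(e.path()).matches()
        && SUPPORTED_TYPES.contains(e.targetType());
  }

  /** Returns true only if every extraction can be pushed; otherwise the whole batch is declined. */
  static boolean acceptBatch(List<Extraction> batch, Set<String> variantColumns) {
    return batch.stream().allMatch(e -> supported(e, variantColumns));
  }
}
```

Declining the batch as a whole leaves the original plan untouched, instead of producing a scan that serves some extractions while a variant_get above a join or aggregate still expects the full variant column.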

issue: #16448

Notes for reviewers

End-to-end testing

Requires #16714 for end-to-end testing. To try the full pushdown and selective-read path without merging the two PRs locally, use this branch:

https://github.com/qlong/iceberg/tree/variant-extraction-integration-test

Test Results

See performance improvements in #16714

Co-authored with Claude Sonnet 4.6

qlong force-pushed the supports-pushdown-variant-extraction branch 5 times, most recently from 9cb1c91 to 52dcc63 on June 8, 2026 14:10

qlong commented Jun 8, 2026


@rdblue @steveloughran @nssalian PTAL when you get a chance.


qlong commented Jun 18, 2026


Spark-side fix for column pruning when variant extraction is pushed down: apache/spark#56556

qlong force-pushed the supports-pushdown-variant-extraction branch from 52dcc63 to fd30c5c on June 18, 2026 20:24

- Add SparkVariantExtractionScanBuilder implementing
  SupportsPushDownVariantExtractions so Spark can push variant_get paths
  from Filter/Project nodes into Iceberg scans.
- Gate behind spark.sql.iceberg.variant-extraction-push-down.enabled
  (default on).
- Use an all-or-nothing batch policy: decline the entire batch if any
  extraction has an unsupported path, unsupported target type,
  references a non-variant column, or is a full-variant slot
  (expectedDataType = VariantType, path $).
- Avoid partial scan rewrites that break multi-variant tables and plans
  where variant_get above join/aggregate barriers still references the
  original column.
- Override readSchema() on batch query scans to expose annotated
  extraction structs to executors.
- Add TestVariantShreddingPushdown for DSv2 plan shape and query
  correctness.
- Requires the parquet-io selective reader PR for end-to-end shredded
  column reads.

issue: apache#16448
qlong force-pushed the supports-pushdown-variant-extraction branch from fd30c5c to dd29882 on June 18, 2026 20:47