Spark: Implement variant extraction pushdown for shredded VARIANT columns #16715
Open
qlong wants to merge 1 commit into
Branch updated from 9cb1c91 to 52dcc63.
Author: @rdblue @steveloughran @nssalian PTAL when you get a chance.
Author: Spark-side fix for column pruning when a variant extraction is pushed down: apache/spark#56556
Branch updated from 52dcc63 to fd30c5c.
…umns

- Add SparkVariantExtractionScanBuilder implementing SupportsPushDownVariantExtractions so Spark can push variant_get paths from Filter/Project nodes into Iceberg scans.
- Gate behind spark.sql.iceberg.variant-extraction-push-down.enabled (default on).
- Use an all-or-nothing batch policy: decline the entire batch if any extraction has an unsupported path or target type, references a non-variant column, or is a full-variant slot (expectedDataType = VariantType, path $).
- Avoid partial scan rewrites, which break multi-variant tables and plans where a variant_get above a join/aggregate barrier still references the original column.
- Override readSchema() on batch query scans to expose annotated extraction structs to executors.
- Add TestVariantShreddingPushdown to cover DSv2 plan shape and query correctness.
- Requires the parquet-io selective reader PR for end-to-end shredded column reads.

issue: apache#16448
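To make the all-or-nothing policy concrete, here is a minimal Java sketch, assuming a simplified model where each extraction is a (column, path, target type) triple. The class VariantPushdownPolicy, the Extraction record, the acceptBatch method, the supported-path pattern (dotted field names only) and the supported-type set are all hypothetical stand-ins; only the session flag name and the decline conditions come from the commit message.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the all-or-nothing batch policy; class, record, and method
// names are illustrative and are not the real Iceberg or Spark API.
final class VariantPushdownPolicy {

  // Session flag named in the commit message; treated as enabled when unset.
  static final String ENABLED_KEY =
      "spark.sql.iceberg.variant-extraction-push-down.enabled";

  // Assumption: only dotted object-field paths such as "$" or "$.a.b" are supported.
  private static final Pattern SIMPLE_PATH =
      Pattern.compile("\\$(\\.[A-Za-z_][A-Za-z0-9_]*)*");

  // Assumption: target types the shredded reader can produce.
  private static final Set<String> SUPPORTED_TYPES =
      Set.of("boolean", "int", "long", "double", "string", "variant");

  // One variant_get(column, path, targetType) request from a Filter or Project node.
  record Extraction(String column, String path, String targetType) {}

  private VariantPushdownPolicy() {}

  // Returns true only if every extraction in the batch can be pushed down.
  static boolean acceptBatch(
      Map<String, String> sessionConf, Set<String> variantColumns, List<Extraction> batch) {
    if (!Boolean.parseBoolean(sessionConf.getOrDefault(ENABLED_KEY, "true"))) {
      return false; // pushdown disabled: nothing is pushed
    }
    for (Extraction e : batch) {
      // Full-variant slot: the whole variant requested at the root path.
      boolean fullVariantSlot = e.targetType().equals("variant") && e.path().equals("$");
      boolean supported =
          variantColumns.contains(e.column())
              && SIMPLE_PATH.matcher(e.path()).matches()
              && SUPPORTED_TYPES.contains(e.targetType())
              && !fullVariantSlot;
      if (!supported) {
        // A single unsupported extraction declines the whole batch, so the scan is
        // never partially rewritten.
        return false;
      }
    }
    return true;
  }
}
```

A per-extraction policy that accepted only the supported subset would leave some variant_get calls pointing at the original column, which is exactly the partial rewrite the commit message is designed to avoid.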
Branch updated from fd30c5c to dd29882.
Change
This PR is part of the work to support variant extraction pushdown. The core change is to the engine schema, which now maps slots to paths within the variant.
issue: #16448
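As a rough illustration of what mapping slots to variant paths could look like, the Java sketch below models the struct that stands in for one variant column: each slot has an ordinal and records the variant path and target type it is read from. VariantSlotMapping, Slot, slotFor, describe, the `_N` field names and the rendered format are all hypothetical, as is the choice to let identical (path, type) requests share one slot; the PR's actual schema classes may differ substantially.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a read-schema entry for one variant column: each slot is an
// ordinal in an extraction struct, annotated with the variant path and target type
// it is read from. Names and the rendered format are illustrative only.
final class VariantSlotMapping {

  record Slot(int ordinal, String path, String targetType) {}

  private final String column;
  private final List<Slot> slots = new ArrayList<>();
  private final Map<String, Integer> ordinalByKey = new HashMap<>();

  VariantSlotMapping(String column) {
    this.column = column;
  }

  // Returns the slot ordinal for (path, targetType), allocating a new slot on first use.
  // Assumption: identical requests share one slot so each path/type is read only once.
  int slotFor(String path, String targetType) {
    return ordinalByKey.computeIfAbsent(
        path + "|" + targetType,
        key -> {
          slots.add(new Slot(slots.size(), path, targetType));
          return slots.size() - 1;
        });
  }

  List<Slot> slots() {
    return List.copyOf(slots);
  }

  // Renders the struct that replaces the variant column, e.g. "v: struct<_0: int @ $.a>".
  String describe() {
    StringBuilder sb = new StringBuilder(column).append(": struct<");
    for (Slot s : slots) {
      if (s.ordinal() > 0) {
        sb.append(", ");
      }
      sb.append('_').append(s.ordinal()).append(": ").append(s.targetType())
          .append(" @ ").append(s.path());
    }
    return sb.append('>').toString();
  }
}
```

With this shape, a reader only needs the slot ordinal to know which shredded path and type to materialize, while the query plan refers to slots rather than to the original variant column.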
Notes for reviewers
End-to-end testing
Requires #16714 for end-to-end testing. To try the full pushdown + selective read path without merging locally, use this branch:
https://github.com/qlong/iceberg/tree/variant-extraction-integration-test
Test Results
See performance improvements in #16714
Co-authored with Claude Sonnet 4.6