[WIP][SPARK-57512][SQL] Allow evaluable RuntimeReplaceable to survive into the physical plan #56575
Open
cloud-fan wants to merge 2 commits into
…l plan

Experiment to validate letting RuntimeReplaceable reach the physical plan (so a native engine can match the semantic expression), instead of always replacing it in the logical optimizer.

- ReplaceExpressions: replace early only when the fully-expanded replacement contains an Unevaluable (e.g. With, which needs RewriteWithExpression in the logical phase); aggregates always early. Everything else survives.
- MaterializeRuntimeReplaceable: physical-prep rule that materializes the survivors into their replacement after columnar/native conversion and before CollapseCodegenStages, so Spark codegen never sees a RuntimeReplaceable while a native engine still sees the origin. Metrics stay correct (real prep node).
- RuntimeReplaceable.eval/doGenCode delegate to replacement as a backstop.

Pushed to run full CI and surface any plan-shape / execution regressions from RuntimeReplaceable surviving the optimizer.

Co-authored-by: Isaac
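The survival rule in this commit can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `Expr`, `Leaf`, `Call`, `UnevaluableExpr`, `Replaceable`, and `SurvivalSketch` are hypothetical stand-ins invented for illustration, not Spark's Catalyst classes, and the real rules operate on full query plans rather than bare expression trees.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
sealed trait Expr { def children: Seq[Expr] }
case class Leaf(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Call(name: String, children: Seq[Expr]) extends Expr
// Stand-in for an Unevaluable expression such as With.
case class UnevaluableExpr(children: Seq[Expr]) extends Expr
// Stand-in for a RuntimeReplaceable: `origin` is the semantic name,
// `replacement` is the lowered form that actually executes.
case class Replaceable(origin: String, replacement: Expr, isAggregate: Boolean = false)
    extends Expr {
  val children: Seq[Expr] = Seq(replacement)
}

object SurvivalSketch {
  // True if any node of the fully expanded tree is unevaluable.
  def containsUnevaluable(e: Expr): Boolean = e match {
    case _: UnevaluableExpr => true
    case other              => other.children.exists(containsUnevaluable)
  }

  // Aggregates, and replacements that contain an unevaluable node, are rewritten early.
  def mustReplaceEarly(r: Replaceable): Boolean =
    r.isAggregate || containsUnevaluable(r.replacement)

  // Logical phase: rewrite only the expressions that cannot survive.
  // Simplification: a survivor is kept as-is and its replacement is not revisited.
  def replaceEarly(e: Expr): Expr = e match {
    case r: Replaceable if mustReplaceEarly(r) => replaceEarly(r.replacement)
    case r: Replaceable                        => r
    case Call(n, cs)                           => Call(n, cs.map(replaceEarly))
    case UnevaluableExpr(cs)                   => UnevaluableExpr(cs.map(replaceEarly))
    case l: Leaf                               => l
  }

  // Physical-prep phase: every survivor is materialized into its replacement.
  def materialize(e: Expr): Expr = e match {
    case r: Replaceable      => materialize(r.replacement)
    case Call(n, cs)         => Call(n, cs.map(materialize))
    case UnevaluableExpr(cs) => UnevaluableExpr(cs.map(materialize))
    case l: Leaf             => l
  }
}
```

In this model an `nvl`-like expression whose replacement is a plain `coalesce` call passes through `replaceEarly` untouched and is lowered only by `materialize`, while an expression whose replacement contains an `UnevaluableExpr` is lowered in the first pass.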
… harden FoldablePropagation

Derive `RuntimeReplaceable.deterministic`/`foldable` from `replacement` (not `children`), so the survival decision in `ReplaceExpressions` is accurate: non-deterministic replacements (e.g. the `Rand` inside `uniform`) are rewritten early instead of surviving, avoiding "Nondeterministic should be initialized before eval".

A foldable `RuntimeReplaceable` (e.g. `collation(c1)`) is foldable yet still references its children, which broke `FoldablePropagation`. Make `FoldablePropagation` propagate only literals; `ConstantFolding` materializes foldable expressions into literals in the same batch, after which they propagate safely.

Also relax `AggregateFunction.foldable` from `final` so `RuntimeReplaceableAggregate` can inherit the trait's `foldable` (its replacement is itself an aggregate, so effective foldability stays false), drop `Uniform`'s now-redundant `deterministic` override, and regenerate the affected Connect proto golden files.

Co-authored-by: Isaac
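To make the difference between deriving these flags from `children` and from `replacement` concrete, here is a minimal toy sketch. All types (`Node`, `Lit`, `Col`, `RandLike`, `Fn`, `ReplaceableNode`) are hypothetical and only loosely mirror the Catalyst traits named above; the `uniform`-like and `collation`-like values are illustrative shapes, not the expressions' real replacements.

```scala
// Toy model only: hypothetical types, not Spark's Catalyst classes.
sealed trait Node {
  def children: Seq[Node]
  // Default mirrors the usual rule: deterministic if all children are.
  def deterministic: Boolean = children.forall(_.deterministic)
  def foldable: Boolean = false
}
case class Lit(value: String) extends Node {
  val children: Seq[Node] = Nil
  override def foldable: Boolean = true
}
case class Col(name: String) extends Node { val children: Seq[Node] = Nil }
// Stand-in for a non-deterministic leaf such as Rand.
case class RandLike(seed: Long) extends Node {
  val children: Seq[Node] = Nil
  override def deterministic: Boolean = false
}
case class Fn(name: String, children: Seq[Node]) extends Node {
  override def foldable: Boolean = children.nonEmpty && children.forall(_.foldable)
}
// Stand-in for a RuntimeReplaceable: both flags come from the replacement,
// not from the user-visible children.
case class ReplaceableNode(name: String, children: Seq[Node], replacement: Node)
    extends Node {
  override def deterministic: Boolean = replacement.deterministic
  override def foldable: Boolean = replacement.foldable
}

object FlagSketch {
  // uniform-like: literal arguments, but the lowered form hides a random source.
  val uniformLike: ReplaceableNode = ReplaceableNode(
    "uniform",
    Seq(Lit("0"), Lit("10")),
    Fn("scale", Seq(RandLike(42L), Lit("0"), Lit("10"))))

  // collation-like: the lowered form is a constant, yet the child is a column.
  val collationLike: ReplaceableNode = ReplaceableNode(
    "collation",
    Seq(Col("c1")),
    Lit("some-collation-name")) // placeholder value, assumed for illustration
}
```

Judged by its children alone, `uniformLike` would look deterministic; judged by its replacement it does not, which is the case the commit rewrites early. `collationLike` shows the second situation: it is foldable while still referencing a column, which is why propagating arbitrary foldable expressions is unsafe and only literals are propagated.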
What changes were proposed in this pull request?
Today every `RuntimeReplaceable` is rewritten into its `replacement` by `ReplaceExpressions` in the logical optimizer, so it never reaches the physical plan. This PR lets an evaluable `RuntimeReplaceable` survive into the physical plan:

- `ReplaceExpressions` now rewrites early only when the fully-expanded replacement contains an `Unevaluable` expression (e.g. `With`, which depends on the logical-phase `RewriteWithExpression` rule); `RuntimeReplaceableAggregate` is still always rewritten. Everything else survives.
- `MaterializeRuntimeReplaceable` rewrites the survivors into their replacement after columnar/native conversion and before `CollapseCodegenStages`, so Spark whole-stage codegen never sees a `RuntimeReplaceable` and per-operator metrics stay intact.
- `RuntimeReplaceable.eval`/`doGenCode` delegate to `replacement` as a backstop for interpreted and non-whole-stage-codegen paths.

Why are the changes needed?
Keeping the semantic expression (e.g. `right(a, b)`) in the plan lets optimizer rules introduce `RuntimeReplaceable`s freely, keeps the high-level expression visible in the optimized logical plan, and lets a native engine match the high-level expression directly instead of reverse-engineering its lowered form.

Does this PR introduce any user-facing change?
`EXPLAIN` of the optimized logical plan now shows the original expression (e.g. `nvl`) rather than its replacement (e.g. `coalesce`) for the surviving cases. The physical plan and query results are unchanged.

How was this patch tested?
`StringFunctionsSuite` and `JsonFunctionsSuite` pass locally; relying on full CI to surface any plan-shape or execution regressions from `RuntimeReplaceable` surviving the optimizer.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.8)
This pull request and its description were written by Isaac.