[WIP][SPARK-57512][SQL] Allow evaluable RuntimeReplaceable to survive into the physical plan #56575

Open
cloud-fan wants to merge 2 commits into apache:master from cloud-fan:rr-selfeval-validation
@cloud-fan

Contributor

What changes were proposed in this pull request?

Today every RuntimeReplaceable is rewritten into its replacement by ReplaceExpressions in the logical optimizer, so it never reaches the physical plan. This PR lets an evaluable RuntimeReplaceable survive into the physical plan:

  • ReplaceExpressions now rewrites early only when the fully-expanded replacement contains an Unevaluable expression (e.g. With, which depends on the logical-phase RewriteWithExpression rule); RuntimeReplaceableAggregate is still always rewritten. Everything else survives.
  • A new physical-preparation rule MaterializeRuntimeReplaceable rewrites the survivors into their replacement after columnar/native conversion and before CollapseCodegenStages, so Spark whole-stage codegen never sees a RuntimeReplaceable and per-operator metrics stay intact.
  • RuntimeReplaceable.eval/doGenCode delegate to replacement as a backstop for interpreted and non-whole-stage-codegen paths.
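
The three bullets above can be sketched with a toy expression tree. This is a minimal illustration, not Spark code: the types only borrow Catalyst names, `UsesWith` is a hypothetical expression invented for the example, and rewriting to `r.replacement` in the early path is an assumption about the rule's shape rather than a description of the actual patch.

```scala
// Toy model only: the names mirror Catalyst, but none of this is Spark code.
object SurvivalSketch {
  type Row = Map[String, Any]

  sealed trait Expr {
    def children: Seq[Expr]
    def withChildren(cs: Seq[Expr]): Expr
    def eval(row: Row): Any
    // Bottom-up rewrite, loosely modelled on TreeNode.transformUp.
    def transformUp(f: PartialFunction[Expr, Expr]): Expr = {
      val rebuilt = withChildren(children.map(_.transformUp(f)))
      f.applyOrElse(rebuilt, (e: Expr) => e)
    }
    def exists(p: Expr => Boolean): Boolean =
      p(this) || children.exists(_.exists(p))
  }

  case class Col(name: String) extends Expr {
    def children: Seq[Expr] = Nil
    def withChildren(cs: Seq[Expr]): Expr = this
    def eval(row: Row): Any = row.getOrElse(name, null)
  }
  case class Lit(value: Any) extends Expr {
    def children: Seq[Expr] = Nil
    def withChildren(cs: Seq[Expr]): Expr = this
    def eval(row: Row): Any = value
  }
  case class Coalesce(children: Seq[Expr]) extends Expr {
    def withChildren(cs: Seq[Expr]): Expr = Coalesce(cs)
    // First non-null child value, or null if every child is null.
    def eval(row: Row): Any =
      children.iterator.map(_.eval(row)).find(_ != null).getOrElse(null)
  }

  // Marker for expressions that cannot be evaluated directly.
  sealed trait Unevaluable extends Expr {
    def eval(row: Row): Any =
      throw new UnsupportedOperationException(s"cannot evaluate $this")
  }
  case class With(child: Expr) extends Unevaluable {
    def children: Seq[Expr] = Seq(child)
    def withChildren(cs: Seq[Expr]): Expr = With(cs.head)
  }

  sealed trait RuntimeReplaceable extends Expr {
    def replacement: Expr
    // Backstop: evaluating the semantic node evaluates its replacement.
    def eval(row: Row): Any = replacement.eval(row)
  }
  case class Nvl(left: Expr, right: Expr) extends RuntimeReplaceable {
    def children: Seq[Expr] = Seq(left, right)
    def withChildren(cs: Seq[Expr]): Expr = Nvl(cs(0), cs(1))
    def replacement: Expr = Coalesce(Seq(left, right))
  }
  // Hypothetical: its replacement needs a logical-phase rewrite of With.
  case class UsesWith(child: Expr) extends RuntimeReplaceable {
    def children: Seq[Expr] = Seq(child)
    def withChildren(cs: Seq[Expr]): Expr = UsesWith(cs.head)
    def replacement: Expr = With(child)
  }

  // Expand every RuntimeReplaceable, including ones nested in replacements.
  def fullyExpand(e: Expr): Expr =
    e.transformUp { case r: RuntimeReplaceable => fullyExpand(r.replacement) }

  // Logical phase: rewrite early only if the expansion holds an Unevaluable.
  def replaceExpressions(e: Expr): Expr = e.transformUp {
    case r: RuntimeReplaceable
        if fullyExpand(r.replacement).exists(_.isInstanceOf[Unevaluable]) =>
      r.replacement
  }

  // Physical preparation: no RuntimeReplaceable remains afterwards.
  def materialize(e: Expr): Expr = fullyExpand(e)

  def main(args: Array[String]): Unit = {
    val plan = Coalesce(Seq(Nvl(Col("a"), Lit(0)), UsesWith(Col("b"))))
    println(replaceExpressions(plan)) // Nvl survives, UsesWith is rewritten
    println(materialize(Nvl(Col("a"), Lit(0))))
    println(Nvl(Col("a"), Lit(0)).eval(Map("a" -> null)))
  }
}
```

In this sketch `Nvl` passes through `replaceExpressions` untouched and is lowered only by `materialize`, while `UsesWith` is lowered immediately because its expansion contains an `Unevaluable` node.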

Why are the changes needed?

Keeping the semantic expression (e.g. right(a, b)) in the plan lets optimizer rules introduce RuntimeReplaceables freely, keeps the high-level expression visible in the optimized logical plan, and lets a native engine match the high-level expression directly instead of reverse-engineering its lowered form.

Does this PR introduce any user-facing change?

EXPLAIN of the optimized logical plan now shows the original expression (e.g. nvl) rather than its replacement (e.g. coalesce) for the surviving cases. The physical plan and query results are unchanged.

How was this patch tested?

StringFunctionsSuite and JsonFunctionsSuite pass locally. Beyond that, this PR relies on full CI to surface any plan-shape or execution regressions caused by RuntimeReplaceable surviving the optimizer.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.8)

This pull request and its description were written by Isaac.

…l plan

Experiment to validate letting RuntimeReplaceable reach the physical plan
(so a native engine can match the semantic expression), instead of always
replacing it in the logical optimizer.

- ReplaceExpressions: replace early only when the fully-expanded replacement
  contains an Unevaluable (e.g. With, which needs RewriteWithExpression in the
  logical phase); aggregates always early. Everything else survives.
- MaterializeRuntimeReplaceable: physical-prep rule that materializes the
  survivors into their replacement after columnar/native conversion and before
  CollapseCodegenStages, so Spark codegen never sees a RuntimeReplaceable while
  a native engine still sees the origin. Metrics stay correct (real prep node).
- RuntimeReplaceable.eval/doGenCode delegate to replacement as a backstop.

Pushed to run full CI and surface any plan-shape / execution regressions from
RuntimeReplaceable surviving the optimizer.

Co-authored-by: Isaac
… harden FoldablePropagation

Derive `RuntimeReplaceable.deterministic`/`foldable` from `replacement` (not
`children`), so the survival decision in `ReplaceExpressions` is accurate:
non-deterministic replacements (e.g. the `Rand` inside `uniform`) are rewritten
early instead of surviving, avoiding "Nondeterministic should be initialized
before eval".
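
A minimal sketch of why the derivation matters, using a toy model rather than Spark classes. `UniformLike` and `Func` are hypothetical stand-ins, and `rewriteEarly` is an assumed simplification of how the survival decision consumes `deterministic`:

```scala
// Toy model only: illustrates deriving flags from `replacement`, not Spark code.
object DeterminismSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    // Default: deterministic when every child is deterministic.
    def deterministic: Boolean = children.forall(_.deterministic)
    def foldable: Boolean = false
  }
  case class Lit(value: Any) extends Expr {
    def children: Seq[Expr] = Nil
    override def foldable: Boolean = true
  }
  case object Rand extends Expr {
    def children: Seq[Expr] = Nil
    override def deterministic: Boolean = false
  }
  // Generic function call node, used only to give the replacement a shape.
  case class Func(name: String, children: Seq[Expr]) extends Expr

  sealed trait RuntimeReplaceable extends Expr {
    def replacement: Expr
    // Derived from the replacement, not from the children.
    override def deterministic: Boolean = replacement.deterministic
    override def foldable: Boolean = replacement.foldable
  }
  // Hypothetical stand-in for `uniform`: literal children, Rand in the replacement.
  case class UniformLike(min: Expr, max: Expr) extends RuntimeReplaceable {
    def children: Seq[Expr] = Seq(min, max)
    def replacement: Expr = Func("scale", Seq(Rand, min, max))
  }

  // Assumed simplification: non-deterministic replacements are rewritten early.
  def rewriteEarly(r: RuntimeReplaceable): Boolean = !r.deterministic

  def main(args: Array[String]): Unit = {
    val u = UniformLike(Lit(0), Lit(1))
    println(u.children.forall(_.deterministic)) // children-based view
    println(u.deterministic)                    // replacement-based view
    println(rewriteEarly(u))
  }
}
```

Here the children-based check alone would call `UniformLike` deterministic, whereas the replacement-based flag exposes the `Rand` hidden in the lowered form.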

A `RuntimeReplaceable` such as `collation(c1)` can be foldable yet still
reference its children, which broke `FoldablePropagation`. Make
`FoldablePropagation` propagate only literals; `ConstantFolding` materializes
foldable expressions into literals in the same batch, after which they propagate
safely.
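
The hazard can be sketched with a toy alias-propagation function. This is an illustration under stated assumptions, not the actual rule: `CollationOf` is a hypothetical foldable expression that keeps a reference to its child, and the `onlyLiterals` flag exists only to contrast the old and new behaviour.

```scala
// Toy model only: contrasts propagating any foldable with propagating literals.
object PropagationSketch {
  sealed trait Expr { def foldable: Boolean }
  case class Lit(value: Any) extends Expr { def foldable: Boolean = true }
  case class Attr(name: String) extends Expr { def foldable: Boolean = false }
  // Hypothetical: foldable, yet still holds a reference to its child attribute.
  case class CollationOf(child: Expr) extends Expr { def foldable: Boolean = true }

  // `aliases` maps an output name to the expression that defines it below.
  def propagate(aliases: Map[String, Expr], onlyLiterals: Boolean)(e: Expr): Expr =
    e match {
      case Attr(n) =>
        aliases.get(n) match {
          case Some(l: Lit)                           => l
          case Some(x) if !onlyLiterals && x.foldable => x // old behaviour
          case _                                      => e
        }
      case CollationOf(c) => CollationOf(propagate(aliases, onlyLiterals)(c))
      case other          => other
    }

  def main(args: Array[String]): Unit = {
    val aliases = Map[String, Expr]("x" -> CollationOf(Attr("c1")), "y" -> Lit(1))
    // Old behaviour drags a reference to c1 into the parent operator.
    println(propagate(aliases, onlyLiterals = false)(Attr("x")))
    // Literal-only propagation leaves x alone and still inlines y.
    println(propagate(aliases, onlyLiterals = true)(Attr("x")))
    println(propagate(aliases, onlyLiterals = true)(Attr("y")))
  }
}
```

With `onlyLiterals = true`, the alias `x` stays as an attribute until something like constant folding turns its definition into a literal, at which point it can be inlined without carrying `c1` along.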

Also relax `AggregateFunction.foldable` from `final` so
`RuntimeReplaceableAggregate` can inherit the trait's `foldable` (its
replacement is itself an aggregate, so effective foldability stays false), drop
`Uniform`'s now-redundant `deterministic` override, and regenerate the affected
Connect proto golden files.

Co-authored-by: Isaac