fix(plan): dedup update from source matches#24932
Conversation
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
XuPeng-SH
left a comment
There was a problem hiding this comment.
I found one blocking correctness issue in the new UPDATE ... FROM dedup plan.
pkg/sql/plan/bind_update.go:1029-1041 wraps each updated column in its own any_value(...), and pkg/sql/colexec/aggexec/any2.go:44-53 implements any_value as “first non-NULL seen for this column”. With duplicate source matches, that means different updated columns can be taken from different source rows whenever NULLs are distributed differently across those rows.
Example: if the same target row matches (new_a = NULL, new_b = 'x') and (new_a = 7, new_b = NULL), the dedup can materialize (new_a = 7, new_b = 'x'), even though no matching source row produced that combination. PostgreSQL-style UPDATE ... FROM is allowed to pick an arbitrary matching row, but it still has to pick one row consistently, not synthesize a cross-row tuple. Please dedup on a whole-source-row choice rather than per-column any_value aggregation.
aunjgr
left a comment
There was a problem hiding this comment.
XuPeng-SH's concern is already addressed in the follow-up commits. The PR now uses row_number() + filter for whole-row dedup, not per-column any_value(). Tests assert any_value is absent. The BVT case whole_row_t/whole_row_s covers the exact cross-row synthesis scenario described in the review.
XuPeng-SH
left a comment
There was a problem hiding this comment.
I think there is still a real correctness hole here: the PR fixes whole-row dedup only in the new bindUpdate path, but tables that still fall back to buildTableUpdate can still synthesize a row that never existed in any matched source row.
The path is:
- FK-constrained targets still return
UnsupportedDMLfromdml_context.goand are routed tobuildTableUpdate(pkg/sql/plan/dml_context.go:223-224,pkg/sql/plan/build.go:245-248). - For any PostgreSQL-style
UPDATE ... FROM, that fallback path forcesneedAggFilter(pkg/sql/plan/build_constraint_util.go:102-108). needAggFilteris still implemented asGROUP BY old target cols + any_value(each updated column)(pkg/sql/plan/build_dml_util.go:2875-2949).
So with duplicate source rows like (new_a = NULL, new_b = 'from-null-a') and (new_a = 7, new_b = NULL), the fallback path can still write (7, 'from-null-a'), even though no joined source row ever had that whole-row combination. This PR's new whole-row protection in appendUpdateFromDedupNode() does not reach that path.
I think the fallback planner needs the same row-level dedup strategy as the new path, not per-column any_value().
What type of PR is this?
Which issue(s) this PR fixes:
issue #23137
What this PR does / why we need it:
This PR fixes the PostgreSQL-style
UPDATE ... SET ... FROM ... WHERE ...path when duplicate source rows match the same target row on tables without FK constraints.The new UPDATE path previously fed duplicate joined rows directly into the update pipeline, which could materialize duplicate primary-key rows for one target row. This change adds a planner-side dedup step for
UPDATE ... FROM: group by the target row's original columns and useany_value()for updated columns, matching the existing fallback path behavior.Test coverage includes:
AGG + any_valuededup.Tested with:
git diff --checkgo test ./pkg/sql/plan -run TestUpdatePgStyleFromDedupsDuplicateSourceMatchesOnNewPath -count=1go test ./pkg/sql/plan -count=1mo-tester update_pg_style_from.sql(108/108passed)makemake static-check(0 issues)