Skip to content

Claude merge 2#2612

Open
dimoffon wants to merge 2213 commits into
arenadata:adb-8.xfrom
dimoffon:claude-merge-2
Open

Claude merge 2#2612
dimoffon wants to merge 2213 commits into
arenadata:adb-8.xfrom
dimoffon:claude-merge-2

Conversation

@dimoffon

Copy link
Copy Markdown
Member

Here are some reminders before you submit the pull request

  • Add tests for the change
  • Document changes
  • Communicate in the mailing list if needed
  • Pass make installcheck
  • Review a PR in return to support the community

MasaoFujii and others added 30 commits June 18, 2021 17:57
Commit d75288f made WAL archiver process an auxiliary process.
An auxiliary process needs to handle barrier events but the commit
forgot to make archiver process do that.

Reported-by: Thomas Munro
Author: Fujii Masao
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/CA+hUKGLah2w1pWKHonZP_+EQw69=q56AHYwCgEN8GDzsRG_Hgw@mail.gmail.com
The fast default code added in Release 11 omitted to check that the
table a fast default was being added to was a plain table. Thus one
could be added to a foreign table, which predicably blows up. Here we
perform that check.

In addition, on the back branches, since some of these might have
escaped into the wild, if we encounter a missing value for
an attribute of something other than a plain table we ignore it.

Fixes bug #17056

Backpatch to release 11,

Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane
In the "simple Query" code path, it's fine for parse analysis or
execution of a utility statement to scribble on the statement's node
tree, since that'll just be thrown away afterwards.  However it's
not fine if the node tree is in the plan cache, as then it'd be
corrupted for subsequent executions.  Up to now we've dealt with
that by having individual utility-statement functions apply
copyObject() if they were going to modify the tree.  But that's
prone to errors of omission.  Bug #17053 from Charles Samborski
shows that CREATE/ALTER DOMAIN didn't get this memo, and can
crash if executed repeatedly from plan cache.

In the back branches, we'll just apply a narrow band-aid for that,
but in HEAD it seems prudent to have a more principled fix that
will close off the possibility of other similar bugs in future.
Hence, let's hoist the responsibility for doing copyObject up into
ProcessUtility from its children, thus ensuring that it happens for
all utility statement types.

Also, modify ProcessUtility's API so that its callers can tell it
whether a copy step is necessary.  It turns out that in all cases,
the immediate caller knows whether the node tree is transient, so
this doesn't involve a huge amount of code thrashing.  In this way,
while we lose a little bit in the execute-from-cache code path due
to sometimes copying node trees that wouldn't be mutated anyway,
we gain something in the simple-Query code path by not copying
throwaway node trees.  Statements that are complex enough to be
expensive to copy are almost certainly ones that would have to be
copied anyway, so the loss in the cache code path shouldn't be much.

(Note that this whole problem applies only to utility statements.
Optimizable statements don't have the issue because we long ago made
the executor treat Plan trees as read-only.  Perhaps someday we will
make utility statement execution act likewise, but I'm not holding
my breath.)

Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
Commit 547f04e caused pgbench to start printing its version number,
which seems like a fine idea, but it needs a bit more work:
* Print the server version number too, when different.
* Print the PG_VERSION string, not some reconstructed approximation.

This patch copies psql's well-tested code for the same purpose.

Discussion: https://postgr.es/m/1226654.1624036821@sss.pgh.pa.us
Ordinarily, a pg_policy.polroles array wouldn't list the same role
more than once; but CREATE POLICY does not prevent that.  If we
perform DROP OWNED BY on a role that is listed more than once,
RemoveRoleFromObjectPolicy either suffered an assertion failure
or encountered a tuple-updated-by-self error.  Rewrite it to cope
correctly with duplicate entries, and add a CommandCounterIncrement
call to prevent the other problem.

Per discussion, there's other cleanup that ought to happen here,
but this seems like the minimum essential fix.

Per bug #17062 from Alexander Lakhin.  It's been broken all along,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
The code to signal a running walsender when its reserved WAL size grows
too large is completely uncovered before this commit; this adds coverage
for that case.

This test involves sending SIGSTOP to walsender and walreceiver and
running a checkpoint while advancing WAL, then sending SIGCONT.  There's
no precedent for this coding in Perl tests, and my reading of relevant
manpages says it's likely to fail on Windows.  Because of this, this
test is always skipped on that platform.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding
reloption): make it into a ternary style boolean parameter.  It now
exposes a third option, "auto".  The "auto" option (which is now the
default) enables the "bypass index vacuuming" optimization added by
commit 1e55e7d.

"VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM
simply do any required index vacuuming, regardless of how few dead
tuples are encountered during the first scan of the target heap relation
(unless there are exactly zero).  This gives users a way of opting out
of the "bypass index vacuuming" optimization, if for whatever reason
that proves necessary.  It is also expected to be used by PostgreSQL
developers as a testing option from time to time.

"VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it
forcibly disables both index vacuuming and index cleanup.  It's not
expected to be used much in PostgreSQL 14.  The failsafe mechanism added
by commit 1e55e7d addresses the same problem in a simpler way.
INDEX_CLEANUP can now be thought of as a testing and compatibility
option.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-By: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
Commit e7eea52 has introduced a new function
RelationGetIdentityKeyBitmap which omits to handle the case where there is
no replica identity index on a relation.

Author: Mark Dilger
Reviewed-by: Takamichi Osumi, Amit Kapila
Discussion: https://www.postgresql.org/message-id/4C99A862-69C8-431F-960A-81B1151F1B89@enterprisedb.com
We had a request to provide a way to test at compile time for the
availability of the new pipeline features.  More generally, it
seems like a good idea to provide a way to test via #ifdef for
all new libpq API features.  People have been using the version
from pg_config.h for that; but that's more likely to represent the
server version than the libpq version, in the increasingly-common
scenario where they're different.  It's safer if libpq-fe.h itself
is the source of truth about what features it offers.

Hence, establish a policy that starting in v14 we'll add a suitable
feature-is-present macro to libpq-fe.h when we add new API there.
(There doesn't seem to be much point in applying this policy
retroactively, but it's not too late for v14.)

Tom Lane and Alvaro Herrera, per suggestion from Boris Kolpackov.

Discussion: https://postgr.es/m/boris.20210617102439@codesynthesis.com
Buildfarm members ayu and tern have sometimes shown a different
plan than expected for this query.  I'd been unable to reproduce
that before today, but I finally realized what is happening.
If there is a concurrent open transaction (probably an autovacuum
run in the buildfarm, but this can also be arranged manually),
then the index entries for the rows removed by the DELETE a few
lines up are not killed promptly, causing a change in the planner's
estimate of the extremal value of ft2.c1, which moves the rowcount
estimate for "c1 > 1100" by enough to change the join plan from
nestloop to hash.

To fix, change the query condition to "c1 > 1000", causing the
hash plan to be preferred whether or not a concurrent open
transaction exists.  Since this UPDATE is tailored to be a no-op,
nothing else changes.

Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-09%2022%3A45%3A48
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-13%2022%3A38%3A18
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-06-20%2004%3A55%3A36
This reverts commit 0912698; the test case added there failed once
in circumstances that remain mysterious.  It seems better to remove the
test for now so that 14beta2 doesn't have random failures built in.
The failsafe can trigger when index processing is already disabled.
This can happen when VACUUM's INDEX_CLEANUP parameter is "off" and the
failsafe happens to trigger.  Remove assertions that assume that index
processing is directly tied to the failsafe.

Oversight in commit c242baa, which made it possible for the failsafe to
trigger in a two-pass strategy VACUUM that has yet to make its first
call to lazy_vacuum_all_indexes().
Code comments were claiming that verify_heapam() was checking
privileges on the relation it was operating on, but it didn't actually
do that.  Perhaps earlier versions of the patch did that, but now the
access is regulated by privileges on the function.  Remove the wrong
comments.
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash: 70796ae860c444c764bb591c885f22cac1c168ec
linitial_node() fails in assert enabled builds if the given pointer is
not of the specified type.  Here the type is IntList.  The code thought
it should be expecting List, but it was wrong.

In the existing tests which run this code the initial list element is
always NIL.  Since linitial_node() allows NULL, we didn't trigger any
assert failures in the existing regression tests.

There is still some discussion as to whether we need a few more tests in
this area, but for now, since beta2 is looming, fix the bug first.

Bug: #17067
Discussion: https://postgr.es/m/17067-665d50fa321f79e0@postgresql.org
Reported-by: Yaoguang Chen
In dc7420c I (Andres) accidentally used
RelationIsAccessibleInLogicalDecoding() as the sole condition to use the
non-shared catalog horizon in GetOldestNonRemovableTransactionId(). That is
incorrect, as RelationIsAccessibleInLogicalDecoding() checks whether wal_level
is logical.

The correct check, as done e.g. in GlobalVisTestFor(), is to check
IsCatalogRelation() and RelationIsAccessibleInLogicalDecoding().

The observed misbehavior of this bug was that there could be an endless loop
in lazy_scan_prune(), because the horizons used in heap_page_prune() and the
individual tuple liveliness checks did not match. Likely there are other
potential consequences as well.

A later commit will unify the determination which horizon has to be used, and
add additional assertions to make it easier to catch a bug like this.

Reported-By: Justin Pryzby <pryzby@telsasoft.com>
Diagnosed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAEze2Wg32Y9+WJfw=aofkRx1ZRFt_Ev6bNPc4PSaz7PjSFtZgQ@mail.gmail.com
Add a .git-blame-ignore-revs file with a list of pgindent, pgperlyidy,
and reformat-dat-files commit hashes.  Postgres hackers that configure
git to use the ignore file will get git-blame output that avoids
attributing line changes to the ignored indent commits.  This makes
git-blame output much easier to work with in practice.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAH2-Wz=cVh3GHTP6SdLU-Gnmt2zRdF8vZkcrFdSzXQ=WhbWm9Q@mail.gmail.com
Commits 84f5c29 et al missed the need to cover plpgsql's "simple
expression" code path.  If the first thing we execute after a
COMMIT/ROLLBACK is one of those, rather than a full-fledged SPI command,
we must explicitly do EnsurePortalSnapshotExists() to make sure we have
an outer snapshot.  Note that it wouldn't be good enough to just push a
snapshot for the duration of the expression execution: what comes back
might be toasted, so we'd better have a snapshot protecting it.

The test case demonstrating this fact cheats a bit by marking a SQL
function immutable even though it fetches from a table.  That's
nothing that users haven't been seen to do, though.

Per report from Jim Nasby.  Back-patch to v11, like the previous fix.

Discussion: https://postgr.es/m/378885e4-f85f-fc28-6c91-c4d1c080bf26@amazon.com
We've long contended with isolation test results that aren't entirely
stable.  Some test scripts insert long delays to try to force stable
results, which is not terribly desirable; but other erratic failure
modes remain, causing unrepeatable buildfarm failures.  I've spent a
fair amount of time trying to solve this by improving the server-side
support code, without much success: that way is fundamentally unable
to cope with diffs that stem from chance ordering of arrival of
messages from different server processes.

We can improve matters on the client side, however, by annotating
the test scripts themselves to show the desired reporting order
of events that might occur in different orders.  This patch adds
three types of annotations to deal with (a) test steps that might or
might not complete their waits before the isolationtester can see them
waiting; (b) test steps in different sessions that can legitimately
complete in either order; and (c) NOTIFY messages that might arrive
before or after the completion of a step in another session.  We might
need more annotation types later, but this seems to be enough to deal
with the instabilities we've seen in the buildfarm.  It also lets us
get rid of all the long delays that were previously used, cutting more
than a minute off the runtime of the isolation tests.

Back-patch to all supported branches, because the buildfarm
instabilities affect all the branches, and because it seems desirable
to keep isolationtester's capabilities the same across all branches
to simplify possible future back-patching of tests.

Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
The code to signal a running walsender when its reserved WAL size grows
too large is completely uncovered before this commit; this adds coverage
for that case.

This test involves sending SIGSTOP to walsender and walreceiver, then
advancing enough WAL for a checkpoint to trigger, then sending SIGCONT.

There's no precedent for STOP signalling in Perl tests, and my reading
of relevant manpages says it's likely to fail on Windows.  Because of
this, this test is always skipped on that platform.

This version fixes a couple of rarely hit race conditions in the
previous attempt 0912698; most notably, both LOG string searches
are loops, not just the second one; we acquire the start-of-log position
before STOP-signalling; and reference the correct process name in the
test description.  All per Tom Lane.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
Previously, isolationtester displayed SQL query results using some
ad-hoc code that clearly hadn't had much effort expended on it.
Field values longer than 14 characters weren't separated from
the next field, and usually caused misalignment of the columns
too.  Also there was no visual separation of a query's result
from subsequent isolationtester output.  This made test result
files confusing and hard to read.

To improve matters, let's use libpq's PQprint() function.  Although
that's long since unused by psql, it's still plenty good enough
for the purpose here.

Like 741d7f1, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.

Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
Our uses of gss_display_status() and gss_display_name() assumed
that the gss_buffer_desc strings returned by those functions are
null-terminated.  It appears that they generally are, given the
lack of field complaints up to now.  However, the available
documentation does not promise this, and some man pages
for gss_display_status() show examples that rely on the
gss_buffer_desc.length field instead of expecting null
termination.  Also, we now have a report that on some
implementations, clang's address sanitizer is of the opinion
that the byte after the specified length is undefined.

Hence, change the code to rely on the length field instead.

This might well be cosmetic rather than fixing any real bug, but
it's hard to be sure, so back-patch to all supported branches.
While here, also back-patch the v12 changes that made pg_GSS_error
deal honestly with multiple messages available from
gss_display_status.

Per report from Sudheer H R.

Discussion: https://postgr.es/m/5372B6D4-8276-42C0-B8FB-BD0918826FC3@tekenlight.com
The syntax summaries for CREATE FUNCTION and allied commands
made it look like LEAKPROOF is an alternative to
IMMUTABLE/STABLE/VOLATILE, when of course it is an orthogonal
option.  Improve that.

Per gripe from aazamrafeeque0.  Thanks to David Johnston for
suggestions.

Discussion: https://postgr.es/m/162444349581.694.5818572718530259025@wrigleys.postgresql.org
For no obvious reason, isolationtester has always insisted that
session and step names be written with double quotes.  This is
fairly tedious and does little for test readability, especially
since the names that people actually choose almost always look
like normal identifiers.  Hence, let's tweak the lexer to allow
SQL-like identifiers not only double-quoted strings.

(They're SQL-like, not exactly SQL, because I didn't add any
case-folding logic.  Also there's no provision for U&"..." names,
not that anyone's likely to care.)

There is one incompatibility introduced by this change: if you write
"foo""bar" with no space, that used to be taken as two identifiers,
but now it's just one identifier with an embedded quote mark.

I converted all the src/test/isolation/ specfiles to remove
unnecessary double quotes, but stopped there because my
eyes were glazing over already.

Like 741d7f1, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.

Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
Reported-by: Simon Riggs
Author: Takamichi Osumi
Reviewed-by: Amit Kapila
Backpatch-through: 9.6
Discussion: https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
Commits 9de77b5 and ac4645c missed to update the logical replication
message formats section in the docs.

Author: Brar Piening
Reviewed-by: Amit Kapila
Discussion: https://www.postgresql.org/message-id/cc70956c-e578-e54f-49e6-b5d68c89576f@gmx.de
dimoffon and others added 30 commits June 2, 2026 15:40
The checkpointer crashed (NULL function-pointer call) in RememberSyncRequest
and SyncPostCheckpoint whenever an append-optimized relation's segment files
were forgotten/unlinked. GPDB added SYNC_HANDLER_AO to the SyncRequestHandler
enum and aomd.c registers SYNC_FORGET_REQUEST/SYNC_UNLINK_REQUEST with it, but
the syncsw[] dispatch table in sync.c had no entry for SYNC_HANDLER_AO, so
syncsw[SYNC_HANDLER_AO].sync_* were all NULL.

Because every dropped AO table schedules such requests, the checkpointer
crash-looped: the postmaster restarted it, it crashed on the next checkpoint,
and each crash dumped a multi-hundred-MB core -- enough to fill the disk and
abort the regression run.

Register the SYNC_HANDLER_AO entry. aosyncfiletag() already exists in md.c;
the deferred unlink only targets the base segfile (segno 0) and the filter
matches by database, so mdunlinkfiletag()/mdfiletagmatches() are correct for
AO as well. Verified: creating, populating and dropping AO and AOCS tables
followed by CHECKPOINT no longer crashes the checkpointer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…odifyTable

INSERT (and COPY) into an append-optimized table reported success but stored
zero rows, and each segment logged "Snapshot reference leak ... still
referenced".

ExecInitModifyTable() calls table_dml_init() for every result relation, which
for AO/AOCS sets up a cached insert descriptor (registering a metadata
snapshot).  The matching table_dml_finish() -- which flushes the segment file,
writes the pg_aoseg/pg_aocsseg row counts and unregisters the snapshot -- was
never called: the PG14 merge adopted upstream's ExecEndModifyTable() loop,
which has no such hook, and the GPDB call was dropped.  With the descriptor
never finished, pg_aoseg.tupcount stayed 0 (rows invisible) and the snapshot
leaked.

Call table_dml_finish() for each result relation in ExecEndModifyTable(),
mirroring the table_dml_init() in ExecInitModifyTable(). All AMs (heap, AO,
AOCS) provide dml_finish, so the call is safe for every relation.

Verified: AO and AOCS INSERT now persist all rows (pg_aoseg.tupcount matches),
with no snapshot-leak warnings; DELETE on AO continues to work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Querying pg_locks crashed the segments (SIGSEGV in heap_form_tuple ->
heap_compute_data_size -> strlen) whenever pg_lock_status() ran there -- which
the coordinator does for every pg_locks query (CdbDispatchCommand "SELECT *
FROM pg_lock_status()"). The DTX regression tests query pg_locks, so the crash
cascaded into a full-cluster reset + DTX recovery storm that aborted the run.

Two bugs:

1. lockfuncs.c built a malformed tuple descriptor: the GPDB columns were all
   added at attnum 19 (mppSessionId/mppIsWriter/gp_segment_id), leaving attnums
   17 and 18 uninitialized (atttypid 0, attlen 0).  att_addlength_pointer()
   treats attlen 0 as a cstring and runs strlen() on the (byval) Datum -> crash.
   The value array had the same typo: values[17] assigned twice (the second
   overwriting mppIsWriter with segindex) and values[18] never set.  Fixed the
   TupleDescInitEntry attnums to 17/18/19 and the value indices to 16/17/18; the
   segment-result and predicate-lock loops had the same off-by-one (waitstart
   not accounted for) and are corrected too.

2. A "kept both sides" merge left pg_proc.dat with two proallargtypes/
   proargmodes/proargnames sets for pg_lock_status -- the GPDB one (18 cols, the
   3 MPP columns, no waitstart) and upstream's (16 cols, with waitstart).  The
   last wins in the Perl hash, so the catalog declared only 16 columns while the
   C code returns 19, giving "Returned row contains 19 attributes, but query
   expects 16".  Merged them into the correct 19-column union (15 base +
   waitstart + mppSessionId + mppIsWriter + gp_segment_id).

Verified: pg_locks now has 19 columns and returns segment lock rows (with
gp_segment_id) with no crash.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 merge adopted upstream's new row-identity machinery
(add_row_identity_columns / add_row_identity_var / ROWID_VAR), which
emits only the "ctid" junk column for the UPDATE/DELETE result relation.
GPDB additionally needs a "gp_segment_id" junk column so the Explicit
Motion can route each modified/deleted row back to the segment that owns
it (cdbpathtoplan_create_motion_plan -> find_junk_tle("gp_segment_id")),
and so SplitUpdate can read it from its input tlist
(nodeSplitUpdate.c). The merge dropped GPDB's gp_segment_id when taking
the upstream shape, so every distributed UPDATE/DELETE that needed an
Explicit Motion failed with "could not find gp_segment_id in subplan's
targetlist" (444 occurrences in the regression baseline).

Re-graft gp_segment_id into add_row_identity_columns alongside ctid. It
rides the same ROWID_VAR sharing path, so the single-table, randomly-
distributed, replicated and inherited/partitioned cases all get the
column with correct per-leaf translation.

Validated: single-table hash/random/replicated UPDATE and DELETE now
succeed. (Partitioned UPDATE and distribution-key UPDATE expose two
further, independent PG14-merge bugs -- is-split-update list length and
ExecBuildUpdateProjection targetColnos -- tracked separately.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pdate information"

PG14 removed inheritance_planner and now plans inherited/partitioned
UPDATE/DELETE inside grouping_planner, building per-leaf
resultRelations / updateColnosLists / withCheckOptionLists /
returningLists in a loop. GPDB's is_split_updates list, however, was
passed to create_modifytable_path() as a hard-coded
list_make1_int(root->is_split_update) -- a single element regardless of
how many leaf result relations the loop produced.

For a partitioned/inherited UPDATE with N leaf relations this yields
resultRelations of length N but isSplitUpdates of length 1, so
ExecInitModifyTable() failed with "ModifyTable node is missing
is-split-update information" (list_length(isSplitUpdates) != nrels);
80 occurrences in the regression baseline. (The plan-time length Assert
in make_modifytable() is compiled out in this non-cassert build, so the
mismatch only surfaced at execution.)

Build is_split_updates in lockstep with resultRelations across all three
branches (multi-relation leaf loop, the all-excluded dummy fallback, and
the single-relation case). All leaf partitions share the parent's
distribution policy, so the split-update decision is uniform and each
entry is root->is_split_update.

Validated: partitioned non-distribution-key UPDATE and DELETE now
succeed. (Distribution-key UPDATE still hits the separate
ExecBuildUpdateProjection targetColnos bug, tracked next.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A distribution-key UPDATE is executed in GPDB as a Split Update: the
modified tuple may belong on a different segment, so each input row is
turned into a DELETE of the old tuple plus an INSERT of the new one,
re-routed by an Explicit Motion.  The PG14 merge left this path broken at
several layers; this fixes all of them so single-table, partitioned,
random, replicated and join-driven distribution-key UPDATEs produce
correct results (and correct command tags).

preptlist.c:
 * A Split Update needs the full new tuple, so the UPDATE targetlist is
   expanded to every column.  update_colnos was extracted from the
   *partial* (SET-only) tlist before that expansion, so it had fewer
   entries than the expanded tlist -- ExecBuildUpdateProjection() then
   rejected the plan with "targetColnos does not match subplan target
   list".  Expand first, then take update_colnos from the expanded tlist.
 * expand_targetlist() filled columns missing from the UPDATE tlist with
   a NULL Const (correct only for INSERT).  For a Split Update the
   unmodified columns must keep their old value, so emit a Var referring
   to the target relation's attribute instead; otherwise the re-inserted
   tuple had its unmodified columns blanked to NULL (e.g. a partition key
   became NULL -> "no partition of relation found for row").

nodeModifyTable.c:
 * ExecInitModifyTable() never set ri_action_attno / ri_segid_attno
   (only execMain.c initialised them to InvalidAttrNumber), so the
   "DMLAction" and "gp_segment_id" junk columns were never located.
   Look them up from the subplan targetlist for UPDATE/DELETE.
 * ExecModifyTable()'s CMD_UPDATE case always called ExecUpdate(), even
   for Split Update rows -- the DMLActionExpr value was read but ignored,
   and the dead helper ExecSplitUpdate_Insert() was never called, so
   delete/insert action rows were wrongly applied as in-place updates and
   corrupted data.  Dispatch on the action: DML_INSERT goes through
   ExecSplitUpdate_Insert(), DML_DELETE through ExecDelete(splitUpdate).
   The DELETE side runs with canSetTag=false so each updated row is
   counted once.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
get_agg_clause_costs() (moved to prepagg.c in the PG14 merge) stopped
populating AggClauseCosts.hasNonCombine / hasNonSerial.  Those flags
gate GPDB's multi-stage (Motion-separated) aggregation: planner.c only
sets try_mpp_multistage_aggregation when both are false, and
cdb_create_multistage_grouping_paths() Assert()s on them.  Because they
were left at zero, the planner happily built a two-stage plan for an
aggregate that has no combine function (or, for an INTERNAL transition
state, no serialize/deserialize function) -- e.g. string_agg, json_agg,
jsonb_agg.  Costing the combine stage then called add_function_cost()
on the invalid (0) combine OID and failed during planning with
"cache lookup failed for function 0" (171 occurrences in the regression
baseline; the Assert is compiled out in this non-cassert build).

Re-populate hasNonCombine / hasNonSerial in get_agg_clause_costs(),
mirroring the root->hasNonPartialAggs / root->hasNonSerialAggs logic, so
such aggregates fall back to single-stage aggregation.

Validated: string_agg (plain and ORDER BY), json_agg now work; combinable
aggregates (count(DISTINCT), sum) are unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… index_cleanup/truncate)

vacuum_params_to_options_list() converts a VacuumParams bitmask back into
a DefElem list to dispatch to the segments.  Two PG14 changes were not
accounted for, so it rejected ordinary VACUUMs:

 * PG14 added VACOPT_PROCESS_TOAST (0x40), set by default for VACUUM.
   The bit-walking loop didn't clear it, so the trailing "if (optmask
   != 0)" guard fired with "unrecognized vacuum option 40" (149
   occurrences in the regression baseline).  Always emit a
   "process_toast" DefElem with its value -- the segment defaults it to
   on, so it must be dispatched explicitly to honour "(PROCESS_TOAST
   off)".

 * PG14 made index_cleanup and truncate tri-state (UNSPECIFIED / AUTO /
   DISABLED / ENABLED) instead of boolean.  The code only handled
   DISABLED/ENABLED and elog(ERROR)'d otherwise, so a plain VACUUM (which
   leaves index_cleanup = AUTO) failed with "unexpected VACUUM
   'index_cleanup' option '1'".  Only dispatch an explicit
   DISABLED/ENABLED; for UNSPECIFIED/AUTO emit nothing and let each
   segment apply its own (matching) default.

Validated: VACUUM, VACUUM FULL/FREEZE/ANALYZE, VACUUM (PROCESS_TOAST
off), (INDEX_CLEANUP auto|off), (TRUNCATE off), (VERBOSE, ANALYZE) all
succeed and process TOAST tables on the segments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 changed UPDATE to carry only the SET columns in the plan tuple and
to reconstruct the full new tuple in the executor by fetching the old
tuple (table_tuple_fetch_row_version) and merging the unmodified columns
from it.  Append-optimized tables can't fetch a tuple by TID --
appendonly_fetch_row_version / aoco_fetch_row_version are stubs that
raise "feature not supported on appendoptimized relations" -- so every
UPDATE of an AO/AOCS table failed (86 occurrences in the regression
baseline).

AO/AOCS UPDATE is implemented as delete + re-insert by the table AM
(appendonly_tuple_update), which needs the full new tuple and the old
TID, but not the old tuple's contents.  So:

 * preprocess_targetlist(): expand the UPDATE targetlist to every column
   for AO/AOCS relations (as is already done for a Split Update), and
   take update_colnos from the expanded tlist.  The plan then produces
   the complete new tuple, so no old-tuple merge is required.

 * ExecModifyTable(): for an AO/AOCS result relation, skip the
   table_tuple_fetch_row_version() old-tuple fetch and leave the old slot
   empty.  With a full targetlist ExecBuildUpdateProjection() doesn't read
   the old slot, and the AM's tuple_update gets the full new tuple plus
   the row's TID.

Validated: AO and AOCS UPDATE (distribution-key and not) and DELETE all
succeed with correct values and unmodified columns preserved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ist"

A global aggregate with an in-aggregate ORDER BY (e.g.
array_agg(x ORDER BY x), string_agg(...ORDER BY...)) failed during
planning with "ORDER/GROUP BY expression not found in targetlist" (33
occurrences in the regression baseline).  Two PG14-merge regressions
combined to cause it:

 * create_motion_plan() was the only create_*_plan that didn't accept
   the createplan flags, so CP_LABEL_TLIST was dropped at every Motion.
   A Sort/Agg that needs ressortgroupref labels on its input then could
   not find its ORDER BY column in a Gather Motion's targetlist (make_motion
   shares the child's tlist, so labeling the child suffices).  Thread
   flags through create_motion_plan() and pass CP_LABEL_TLIST down to the
   subplan recursion.  This fixes single-stage ordered aggregates such as
   string_agg (which can't be partially aggregated).

 * AggClauseCosts.numPureOrderedAggs (and numOrderedAggs) were never
   populated by get_agg_clause_costs().  numPureOrderedAggs counts
   aggregates with an explicit ORDER BY / WITHIN GROUP, whose result
   depends on the global input order and therefore cannot be split across
   a Motion.  Left at 0, the MPP grouping planner's has_ordered_aggs gate
   never tripped, so a combine-capable ordered aggregate like array_agg
   was routed into a multi-stage plan and failed.  Populate both counts in
   get_agg_clause_costs(); DISTINCT-only aggregates remain multi-stageable
   via the DQA path.

Validated: array_agg/string_agg/json_agg with ORDER BY (global and
grouped) return correct results; array_agg(DISTINCT) still multi-stages;
plain aggregates and ordered-set aggregates (percentile_cont) unaffected.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GPDB allows a window frame offset to reference columns of the input
(e.g. "RANGE floor(qty) PRECEDING"), unlike upstream PostgreSQL which
requires a constant offset.  The PG14 merge regressed set_plan_refs():
the T_WindowAgg case fixes the frame edges with fix_scan_expr(), even
though the in-tree comment still says GPDB must use fix_upper_expr() (and
an indexed_tlist variable was declared for that purpose but left unused).

fix_scan_expr() leaves a column-valued offset's Vars with base-relation
varnos instead of OUTER_VAR.  At execution compute_start_end_offsets()
sets ecxt_outertuple to the scan slot and evaluates the offset, but the
mis-numbered Var reads the wrong/empty slot, so the WindowAgg segfaults
(compute_start_end_offsets -> ExecInterpExprStillValid -> SIGSEGV).  In
the qp_olap_window regression test this crashed all segments on ~50
queries; the crash/recovery/retry loop is what looked like a "hang" in
the baseline run and filled the disk with hundreds of core files.

Restore the documented behaviour: build the subplan's indexed tlist and
fix the start/end offsets with fix_upper_expr(..., OUTER_VAR, ...), so a
column-valued offset's Vars correctly reference the WindowAgg input.

Validated: the previously-crashing queries now return rows, and the full
qp_olap_window test runs to completion with 0 segment crashes (was 592
core dumps) and output matching the expected answer file; the only
remaining errors are the test's intentional negative cases (multi-column
RANGE, negative offsets, division by zero).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
get_agg_clause_costs() (moved to prepagg.c in the PG14 merge) stopped
populating AggClauseCosts.distinctAggrefs, the list of Aggrefs carrying
aggdistinct. The MPP grouping planner gates its specialized DQA paths on
that list (cdb_create_multistage_grouping_paths -> recognize_dqa_type,
plus several can_hash checks in cdbgroupingpaths.c). With the list always
empty, a query like "count(distinct d) group by i" never built the DQA
plan that dedups the distinct argument by adding it to the first-stage
group key. Instead it fell through to the generic two-stage partial
aggregation, which cannot partialize a DISTINCT aggregate: the Partial
HashAggregate tried to finalize the per-group distinct sort and crashed
all segments in finalize_aggregates() -> tuplesort_performsort().

Re-populate distinctAggrefs in the agginfos loop alongside the existing
numOrderedAggs counting. The first-stage group key is now (i, d) as
expected (matches expected/gp_dqa.out), and the single-DQA queries run
without crashing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
VARSIZE_TO_SHORT() hardcoded the big-endian short-varlena header form
(length | 0x80), but everything else in the tree uses the standard
PostgreSQL varlena macros, which on a little-endian build encode a 1-byte
header as (length << 1) | 0x01 (see SET_VARSIZE_1B / VARATT_IS_SHORT /
VARSIZE_1B). So converting a 4-byte varlena to short produced a header
like 0x88 for an 8-byte datum, which VARATT_IS_SHORT() does not recognize
(0x88 is even) and VARSIZE_ANY() misreads as a 4-byte header with a huge
length.

This corrupted any value written through the 4-byte -> short conversion
path:
  - AOCS column store (datumstreamblock.c DatumStreamBlockWrite_Put*),
  - memtuples (memtuple.c).
A normal INSERT often dodged it because the value arrived already in
standard short form (0x11, copied verbatim), but an UPDATE's projected
value arrived as a 4-byte varlena (0x2c) and hit the conversion path,
storing 0x88. The next scan of that AOCS segfile then fed the bogus
header to heap_form_minimal_tuple()/the Motion serializer, which computed
a multi-KB/-MB length and SIGSEGV'd every segment (aocs_getnext ->
... -> memmove). Reproduces via vacuum_gp's aocs_empty, alter_table_aocs's
addcol*, and decode_expr's appendonly decodecharao2.

Make VARSIZE_TO_SHORT endianness-aware, matching SET_VARSIZE_1B.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 changed reindex_index()'s final argument from an int options bitmask
to a ReindexParams pointer, which it dereferences immediately
(params->options at index.c:3606). bmbulkdelete() -- the bitmap AM's
ambulkdelete, which rebuilds the whole index because bitmap indexes have
no incremental delete -- still passed a literal 0, so reindex_index()
dereferenced a NULL ReindexParams and SIGSEGV'd whenever VACUUM processed
a bitmap index (e.g. "vacuum bm_test", bitmap_index regression test).

Pass an empty ReindexParams struct instead. All other reindex_index()
callers already pass a proper pointer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d not open file t_<relfilenode>")

GPDB stores temp tables in *shared* buffers, not local ones: ReadBuffer_common
hardcodes isLocalBuf=false (SmgrIsTemp is commented out, bufmgr.c). So when a
temp relation's storage is unlinked, its dirty pages must be dropped from the
shared buffer pool like any other relation.

DropRelFileNodeBuffers honors this -- it #if 0's out the upstream
"RelFileNodeBackendIsTemp -> drop from local buffers" short-circuit, with the
comment "Temp tables use shared buffers in Greenplum". The pre-merge GPDB
DropRelFileNodesAllBuffers had the same #if 0. The PG14 buffer-manager rework
(SMgrRelation-based Drop*Buffers + smgrnblocks caching) kept the comment but
dropped the #if 0, re-activating the upstream short-circuit. Temp relations
with backend != InvalidBackendId (TempRelBackendId, the "temp but not ours"
case in RelationInitPhysicalAddr) are now routed to DropRelFileNodeAllLocalBuffers
(dead code -- GPDB never uses local buffers) and, critically, excluded from the
shared-buffer drop list. Their dirty shared buffers are leaked.

After the temp file is unlinked (deferred to the next checkpoint via mdunlink),
the leaked dirty buffers reference a file that no longer exists. The next
bgwriter/checkpointer sweep or buffer eviction tries to flush them and PANICs:

  ERROR: could not open file "base/<db>/t_<relfilenode>": No such file or
  directory ... writing block N of relation base/<db>/t_<relfilenode>  (md.c)

In a busy run this repeats thousands of times per minute and takes segments
down, cascading into widespread failures.

Fix: restore the #if 0 around the temp short-circuit in
DropRelFileNodesAllBuffers, matching DropRelFileNodeBuffers and the pre-merge
code, so temp relations are dropped from shared buffers like everything else.

Verified with a controlled before/after on the same workload (the standard
'temp' regression test + CHECKPOINT + heavy eviction): a leak-detection probe
confirmed temp rels reach the function with backend=-2 (TempRelBackendId) and
dirty shared buffers; the buggy build produced 156k+ "could not open file
t_<relfilenode>" errors plus a crash, while the fixed build produced zero
errors and no cores.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…elete_tuples)

PG14 added table-AM-assisted index tuple deletion to nbtree: both the simple
LP_DEAD pass (_bt_simpledel_pass) and the bottom-up deletion pass
(_bt_bottomupdel_pass) call _bt_delitems_delete_check() ->
table_index_delete_tuples(), which dispatches to
rel->rd_tableam->index_delete_tuples() with no NULL check (tableam.h). The heap
AM provides heap_index_delete_tuples, but GPDB's append-optimized AMs leave the
callback NULL (appendonlyam_handler.c, aocsam_handler.c) -- AO/AOCS reclaim dead
tuples via the visimap + VACUUM, not heap-style TID deletion. So a btree index
on an AO/AOCS table that triggers either deletion pass jumps through the NULL
pointer and SIGSEGVs:

  arenadata#6  0x0  (NULL)
  arenadata#7  _bt_delitems_delete_check
  arenadata#8  _bt_bottomupdel_pass
  arenadata#9  _bt_delete_or_dedup_one_page
  arenadata#10 _bt_doinsert

Both passes are optional space-reclamation optimizations performed to avoid a
leaf-page split; deduplication (the third strategy) does not use the table AM.
Fix: in _bt_delete_or_dedup_one_page() skip both deletion passes when the table
AM does not implement index_delete_tuples, falling back to deduplication / a
page split. This restores the pre-PG14 behavior for AO/AOCS indexes (no
incremental index deletion) and leaves heap behavior unchanged.

Verified before/after with AO and AOCS tables whose btree index keys all share
one value (so entries pile on one leaf-page chain) churned by repeated
full-table UPDATEs of a non-indexed column: pre-fix produced 18 SIGSEGV cores
with the backtrace above and took segments down; post-fix both tables complete
with correct row counts, zero cores, all segments up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tartup_dummy)

EXPLAIN ANALYZE of a CREATE TABLE AS / SELECT INTO that uses EXECUTE of a
prepared statement crashed the coordinator:

  arenadata#6  intorel_startup_dummy        (rel == NULL)
  arenadata#7  standard_ExecutorRun
  arenadata#8  ExplainOnePlan
  arenadata#9  ExplainExecuteQuery
  arenadata#10 ExplainOneUtility

GPDB creates the CTAS target relation in intorel_initplan() (called from
InitPlan, execMain.c) gated on PlannedStmt->intoClause, and leaves the
DestReceiver's rStartup a near-dummy that dereferences the relation created
there.  The freshly-planned EXPLAIN path (ExplainOneQuery) copies the IntoClause
onto the plan before running it, but ExplainExecuteQuery passed the cached plan
to ExplainOnePlan without setting PlannedStmt->intoClause.  intorel_initplan was
therefore skipped, the DR_intorel's rel stayed NULL, and intorel_startup_dummy
dereferenced it during ExecutorRun -> coordinator SIGSEGV (which drops every
session in the run).

Fix: in ExplainExecuteQuery set the IntoClause on the plan when into != NULL,
mirroring ExplainOneQuery.  The PlannedStmt belongs to the shared cached plan,
so set it on a copy -- otherwise a stale IntoClause would make a later plain
EXECUTE of the same statement try to create a table.

Verified: pre-fix, EXPLAIN ANALYZE CREATE TABLE t AS EXECUTE p dumps a core with
the backtrace above; post-fix it no longer crashes.  (CREATE TABLE AS ... EXECUTE
then hits a separate, pre-existing OID-dispatch error -- "oids were assigned, but
not dispatched to QEs" -- that also affects plain CREATE TABLE AS EXECUTE and is
out of scope for this crash fix.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…("oids were assigned, but not dispatched to QEs")

All forms of CREATE TABLE AS / SELECT INTO ... EXECUTE (plain, temp, and under
EXPLAIN ANALYZE) failed on the coordinator with:

  WARNING:  OID assignment not dispatched: catalog 1259 ... name "x"
  ERROR:   oids were assigned, but not dispatched to QEs

GPDB plans a CTAS specially: the IntoClause must reach the planner so the query
is marked PARENTSTMTTYPE_CTAS and a *distributed* plan is built -- rows are
redistributed to the target table's segments, which create and populate the
relation (consuming the OIDs the QD pre-assigned).  This is threaded through the
plancache via an extra IntoClause argument to GetCachedPlan() ->
RevalidateCachedQuery()/choose_custom_plan()/BuildCachedPlan() (MPP-8135).

The PG14 plancache merge adopted upstream's GetCachedPlan() signature and turned
the GPDB intoClause argument into a hardcoded local `IntoClause *intoClause =
NULL;`, even though the function's own header comment still documents the extra
parameter and the rest of plancache.c still threads it.  With it forced NULL the
EXECUTE plan was the plain gather-to-coordinator SELECT: the QD created the
table and assigned OIDs, the segments never did, and the un-dispatched OIDs
tripped the end-of-xact check in AtEOXact_DispatchOids().

Fix: restore the IntoClause parameter to GetCachedPlan() (header + definition,
dropping the dead NULL local) and pass it from the two CTAS callers,
ExecuteQuery() and ExplainExecuteQuery(); the remaining non-CTAS callers (spi.c,
postgres.c) pass NULL as before.

Verified: plain, parameterized, WITH NO DATA, and EXPLAIN ANALYZE CREATE TABLE
AS EXECUTE all succeed with the rows correctly distributed across segments
(nsegs=3); a plain EXECUTE returning rows is unaffected and the cached plan is
not corrupted.  Together with the prior intorel_startup_dummy fix, CTAS-EXECUTE
works end to end.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
REINDEX CONCURRENTLY corrupts the catalog on a distributed cluster.  Upstream's
concurrent reindex builds a fresh index (a new "_ccnew" pg_class entry, new OID),
swaps it in under the original name, and drops the old one -- i.e. it
intentionally *changes the index OID*.  In GPDB only the coordinator runs the
concurrent path; the segments reindex non-concurrently and keep the original OID.
The result is an index whose OID differs between the QD and the segments
("could not open relation with OID N (segX)" on the next index scan), plus the
transient _ccnew OIDs are never dispatched ("oids were assigned, but not
dispatched to QEs").

Properly supporting concurrent reindex in MPP requires the segments to rebuild
under the QD's new OID, which collides with PreventInTransactionBlock() and the
fact that a QE cannot run a multi-transaction concurrent reindex from within a
dispatched statement -- a larger feature effort (tracked separately).

As a safe interim, fall back to a normal (non-concurrent) reindex on the
coordinator, with a NOTICE, in ReindexIndex()/ReindexTable() (covering REINDEX
TABLE/INDEX, including partitioned, which then reindexes each child
non-concurrently via ReindexPartitions) and ReindexMultipleInternal() (covering
REINDEX DATABASE/SCHEMA).  Single-node utility mode is unaffected and keeps the
working concurrent path.

Verified: REINDEX (CONCURRENTLY) TABLE/INDEX and a partitioned table all succeed
with a NOTICE; the index OID is identical on the coordinator and every segment
(no divergence) and a forced index scan returns correct results; no cores, no
OID-dispatch error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ordered child columns

A distribution-key UPDATE on a partitioned table whose leaf partitions have a
different physical column order than the parent crashed in the executor:

  pg_detoast_datum <- hashtext/hash_numeric <- cdbhash <- ExecSplitUpdate

create_splitupdate_plan() derived the SplitUpdate's hash attnos and column types
from path->resultRelation's relcache entry.  For a partitioned UPDATE that is a
leaf partition, whose physical column order can differ from the parent's
(e.g. update.sql's part_c_1_100 "e,d,c,b,a").  But the SplitUpdate's input tuples
are the subplan output, which is labeled with root->processed_tlist -- i.e. the
*nominal* (parent) column layout.  So the distribution-key attnos taken from the
leaf policy indexed the wrong column of the parent-layout tuple: a key like "a"
at leaf attno 5 selected the parent's attno-5 column ("d", an int), which cdbhash
then fed to hashtext, dereferencing the small int as a varlena -> SIGSEGV.  (The
mismatch also tripped the type Assert in the insertColIdx loop, which is compiled
out in non-cassert builds.)

Fix: take resultDesc and cdbpolicy from the nominal target relation
(root->parse->resultRelation), which is the layout the subplan tuples are in, so
insertColIdx, hashAttnos and hashFuncs all line up with the SplitUpdate input.
For a non-partitioned UPDATE this is the same relation as before.

Verified with update.sql's reordered-column partitions and assorted
text/numeric/multi-column distribution-key updates (including cross-partition row
movement): the hash now reads the correct key column and there is no crash; the
distribution key routes correctly and results are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ld columns

Follow-on to the SplitUpdate cdbhash fix.  A distribution-key UPDATE of a
partitioned table whose leaf partitions have a different physical column order
than the parent failed during the reinsert with:

  ERROR: table row type and query-specified row type do not match
         (ExecCheckPlanOutput, nodeModifyTable.c)

The SplitUpdate emits the new tuple in the root (nominal) target relation's
column layout (the subplan is labeled with root->processed_tlist), but the
DML_INSERT replay built the insert projection against the *source leaf*
partition's ResultRelInfo, whose descriptor can differ -- so ExecCheckPlanOutput
rejected the layout.  (Before the cdbhash fix this path crashed earlier and was
never reached.)

Fix: build the split-update insert projection against the root result relation
(mtstate->rootResultRelInfo), so it matches the subplan output, and set up
partition tuple routing for split updates -- so ExecInsert() routes the new
tuple to the correct leaf, converting the layout via ri_RootToPartitionMap, and
enforces the partition constraint.  For a non-partitioned target
rootResultRelInfo is the result relation itself, so this is a no-op there.

Verified: every split-update shape (heap/AO/AOCS, toasted, dropped-column,
multi-column and text/numeric distribution keys, UPDATE ... FROM, and
partitioned) inserts/redistributes correctly with no crash; the update
regression test no longer errors with a row-type mismatch (a partition-key
update that must stay within its subtree now correctly raises a partition
constraint violation -- naming the UPDATE target relation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ResetTempNamespace() (reached from AbortTransaction -> ResetAllGangs during
primary gang-loss recovery) unconditionally cancels the temp-namespace
before_shmem_exit callback:

    cancel_before_shmem_exit(RemoveTempRelationsCallback, 0);

Before PG14, cancel_before_shmem_exit() silently did nothing when the callback
was not the latest entry.  Upstream commit c9ae5cb made it raise an error in
that case, and the PG14 merge adopted the strict version.  But ResetTempNamespace
can legitimately reach this with the callback absent (temp namespace created but
not yet committed, so AtEOXact_Namespace never registered it) or no longer the
latest entry.  Thrown from inside AbortTransaction(), that error escalates to a
coordinator PANIC -- turning any recoverable gang loss into an all-sessions-down
crash, which then cascades into hundreds of "protocol synchronization was lost"
and "transaction is aborted" failures across the suite.

Add a non-throwing cancel_before_shmem_exit_if_latest() (the pre-PG14 lenient
behavior: remove only if it is the latest entry, else return false) and use it
in ResetTempNamespace().  Leaving the callback registered is harmless --
RemoveTempRelationsCallback() no-ops once myTempNamespace is reset just below.
The strict cancel_before_shmem_exit() is kept for its correct-LIFO callers
(PG_ENSURE_ERROR_CLEANUP).

Verified: a full installcheck-good run that previously dumped a
cancel_before_shmem_exit coordinator-PANIC core now completes the same span with
zero cores.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…limit" (PG14)

A trivial COPY ... FROM stdin (or any COPY FROM) failed on every segment with
"invalid message length" -> "terminating connection because protocol
synchronization was lost", which the QD reported as "MPP detected N segment
failures, system is reconnected".  Because COPY loads the data in most
regression tests, this poisoned a large swath of the suite (231
protocol-synchronization-lost FATALs in a full run).  The same bug independently
broke nextval/sequences over MPP ("nextval: unable to parse nextval response
from QD", ~291 hits).

Root cause: GPDB multiplexes non-query messages over a libpq connection and
reads them with pq_getmessage(buf, 0), using a maxlen of 0 to mean "no upper
limit" -- COPY data forwarded QD->QE (copy.c) and the nextval-over-NOTIFY
response (sequence.c).  GPDB's pq_getmessage encoded that as

    if (len < 4 || (maxlen > 0 && len > maxlen))

The PG14 merge adopted upstream's length check verbatim

    if (len < 4 || len > maxlen)

dropping the "maxlen > 0 &&" guard.  With maxlen == 0 every message with len > 0
is rejected as "invalid message length", so the QE tears down the connection on
the first CopyData byte.

Diagnosis: wire-probes confirmed the QD frames CopyData correctly (type 'd',
len = 4 + nbytes, outBuffer_shared = 0) and the QE reads the correct len but
with maxlen = 0 -- so the comparison, not the data, was wrong.

Fix: restore the GPDB "maxlen > 0 &&" guard and document the maxlen == 0
convention so a future merge does not drop it again.

Verified on a live cluster: COPY FROM stdin, COPY FROM file, nextval(), and
SERIAL inserts (segment nextval) all succeed with correct row counts; no
"invalid message length" / protocol-synchronization-lost.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
COPY FROM into any partitioned (or sub-partitioned) table crashed every segment
with a NULL-pointer SIGSEGV in ExecInitPartitionInfo (reached via
CopyFrom -> ExecFindPartition), reported on the QD as "MPP detected N segment
failures, system is reconnected".  This surfaced broadly once COPY data delivery
was fixed (see the pq_getmessage maxlen fix), because pg_regress loads much of
its data via COPY into partitioned tables.

Root cause: GPDB's COPY implementation lives in commands/copy.c (the upstream
commands/copyfrom.c is a stub here).  copy.c builds its ResultRelInfo with
InitResultRelInfo() directly -- it never calls ExecInitResultRelation(), so
estate->es_result_relations stays NULL -- yet it initialized the
ModifyTableState with

    mtstate->resultRelInfo = estate->es_result_relations;   /* NULL */

and never set mt_nrels or rootResultRelInfo.  The PG14 tuple-routing rework made
ExecInitPartitionInfo() dereference mtstate->resultRelInfo[0]
(.ri_RangeTableIndex / .ri_RelationDesc) and mtstate->rootResultRelInfo, so the
NULL resultRelInfo crashed at the first partition routed to.  (copyfrom.c's
CopyFrom sets these three fields correctly; copy.c was not updated in the merge.)

Fix: set mt_nrels = 1 and point both resultRelInfo and rootResultRelInfo at the
COPY target's ResultRelInfo (a valid one-element array), mirroring copyfrom.c.

Verified: COPY FROM stdin into range-partitioned and range-sub-partitioned
tables inserts the correct row counts with no crash.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… "see"'

~46 internal SQL functions (col_description, obj_description, shobj_description,
and many overloaded date/time and system helpers) carry the placeholder body
prosrc => 'see system_functions.sql' in pg_proc.dat; their real bodies live in
src/backend/catalog/system_functions.sql, which initdb is supposed to run after
bootstrap.  The PG merge dropped system_functions.sql from BOTH initdb.c and the
catalog Makefile's install list, so:

  - the file was never installed to $(datadir), and
  - initdb never executed it.

The placeholder bodies therefore survived, and every call to one of these
functions tried to execute the literal text "see system_functions.sql", failing
with:  ERROR: syntax error at or near "see".  Because col_description() is
invoked by psql's \d+, this poisoned a huge fraction of the regression suite
(439 "see" syntax errors plus the bulk of the downstream "current transaction is
aborted" / "relation does not exist" cascade).

Fix:
  - initdb.c: declare system_functions_file, set_input/check_input it, add a
    generic setup_run_file() helper, and run system_functions.sql right after
    setup_auth() and before setup_depend() (so the functions are pinned),
    matching upstream ordering.
  - catalog/Makefile: install (and uninstall) system_functions.sql alongside
    system_views.sql.

Verified: a fresh initdb succeeds; col_description()/obj_description() get real
bodies (PG14 stores them in prosqlbody) and psql \d+ shows column descriptions
with no syntax error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CREATE VIEW (and any utility statement that dispatches a Query tree, e.g.
CREATE TABLE AS) over a join failed on every segment with:

  ERROR: could not deserialize unrecognized node type: 3  (readfast.c)

PG14 (commit 055fee7) added the JoinExpr.join_using_alias field.  The
merge added it to the text writer (_outJoinExpr in outfuncs.c) and the reader
(_readJoinExpr in readfuncs.c, used for binary too), but NOT to the binary
writer _outJoinExpr in outfast.c, which is the one used for QD->QE dispatch.
So the wire layout written for a JoinExpr was one NODE field short of what the
reader expected; the reader went off the rails and hit a garbage tag (3 =
T_ProjectionInfo, a run-time node that is never serialized) -> deserialize
error on the segment.

Fix: write join_using_alias between usingClause and quals in outfast.c's
_outJoinExpr, matching the reader and the text writer.

Class of bug: a PG14-added node field re-grafted into the shared text/read
funcs but missed in the separate binary writer (outfast.c).  Found by probing
readNodeBinary() to dump the QE-side tag stream, then diffing each outfast.c
binary writer against its reader.

Verified: CREATE VIEW over CROSS/INNER/NATURAL joins and selecting from the
views all succeed, no segment crash.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…G14)

PG14 added Result Cache (later renamed Memoize), gated by enable_resultcache
(default on).  GPDB has not integrated it with the MPP planner/executor: a
generated T_ResultCache plan node is not handled by expression_tree_mutator()
(nodeFuncs.c) and is also absent from the binary plan-dispatch serialization
(outfast.c/readfast.c).  So whenever the planner chose a Result Cache -- e.g.
the information_schema.columns is_updatable computation under
enable_mergejoin/enable_nestloop -- the query failed with:

  ERROR: unrecognized node type: 53   (53 = T_ResultCache)

This hit a broad set of join-heavy tests (updatable_views, returning, rowtypes,
join, indexjoin, subselect, partition_prune, gin, ...).

Until Result Cache is properly supported in MPP, default enable_resultcache to
off so the planner never generates the node.  GPDB's expected outputs never
show a Result Cache node, so this matches them; the standalone resultcache
regression test is not in any schedule, and aggregates.sql already sets it off
explicitly for its one relevant query.

Verified: with the new default, the minimal repro (CREATE VIEW + information_
schema is_updatable under merge/nestloop) and the updatable_views / returning /
rowtypes regression tests no longer raise "unrecognized node type: 53".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…heir C helpers (PG14)

create_function_0 (input/create_function_0.source) defines the C helper
functions binary_coercible(oid,oid), test_enc_conversion(...) and
test_opclass_options_func(internal) from regress.so, which type_sanity,
opr_sanity and conversion depend on.  The merge left create_function_0 out of
parallel_schedule entirely (only create_function_1/2/3 are listed), so those
helpers were never created and the dependent tests failed with
"function binary_coercible(oid, oid) does not exist" /
"function test_enc_conversion(...) does not exist" (~40 hits, plus the
downstream cascade in opr_sanity/type_sanity).

Add create_function_0 to the schedule right after test_setup (before any test
that uses the helpers).  It has its own expected output (create_function_0.out).

Verified (focused schedule test_setup -> create_function_0 -> opr_sanity
conversion): create_function_0 and conversion now pass and the
"function ... does not exist" errors are gone (opr_sanity still differs for
unrelated reasons).  NB create_function_0 also creates trigger functions from
contrib/spi's refint.so/autoinc.so, which must be installed (make -C
contrib/spi install) -- a build/install step, not part of this source change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A correlated subquery in the SELECT targetlist over a distributed table
ran without its correlation filter, returning wrong results:

  SELECT a, (SELECT count(*) FROM t2 WHERE t2.b = t1.a) FROM t1;

returned the *total* count for every row instead of the per-row
correlated count, and a non-aggregate correlated subquery raised
"more than one row returned by a subquery used as an expression".

GPDB routes a correlated subquery's param filter (e.g. t2.b = $0) to the
rel's upperrestrictinfo; bring_to_outer_query() then applies it via a
Result node above the Broadcast Motion, so each segment filters the
broadcast set by its local param. That Result is built by
create_projection_path_with_quals() with the filter in
cdb_restrict_clauses. Two code paths silently dropped the filter:

1. When the subpath is projection-capable (a Motion is), the function
   took the no-Result shortcut (dummypp = true) and never stored
   cdb_restrict_clauses, discarding the filter. Don't take the shortcut
   when there are restrict clauses to apply.

2. When a later projection (the scan/join target) was layered over the
   filter-carrying ProjectionPath, the path-collapsing code stripped the
   inner ProjectionPath and discarded its cdb_restrict_clauses. Carry
   them up into the surviving ProjectionPath instead.

With both fixed, create_projection_plan() emits a Result with
plan->qual = the param filter, and correlated targetlist subqueries
return correct results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…G14)

A correlated EXISTS in the SELECT targetlist over a distributed table
failed with:

  ERROR: subplan is missing Flow information

  SELECT a, EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a) FROM t1;

For a simple EXISTS, make_subplan() additionally builds a hashed ANY
variant and wraps both in an AlternativeSubPlan, leaving setrefs.c to
choose. GPDB's MPP slice machinery does not support AlternativeSubPlan:
the hashed plan is created without Flow/slice information (so cdbllize
raises "subplan is missing Flow information"), and cdbllize cannot reason
about an AlternativeSubPlan when pruning unused subplans.

In dispatch (MPP) mode, skip building the hashed alternative and keep just
the correlated SubPlan we already built. It is correct on its own -- its
correlation filter is applied above the Motion (see the companion fix in
create_projection_path_with_quals) -- so EXISTS in the targetlist now
returns correct results. The hashed alternative is still considered for
non-MPP (utility-mode) planning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.