Claude merge 2#2612
Open
dimoffon wants to merge 2213 commits into
Open
Conversation
Commit d75288f made WAL archiver process an auxiliary process. An auxiliary process needs to handle barrier events but the commit forgot to make archiver process do that. Reported-by: Thomas Munro Author: Fujii Masao Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/CA+hUKGLah2w1pWKHonZP_+EQw69=q56AHYwCgEN8GDzsRG_Hgw@mail.gmail.com
The fast default code added in Release 11 omitted to check that the table a fast default was being added to was a plain table. Thus one could be added to a foreign table, which predicably blows up. Here we perform that check. In addition, on the back branches, since some of these might have escaped into the wild, if we encounter a missing value for an attribute of something other than a plain table we ignore it. Fixes bug #17056 Backpatch to release 11, Reviewed by: Andres Freund, Álvaro Herrera and Tom Lane
In the "simple Query" code path, it's fine for parse analysis or execution of a utility statement to scribble on the statement's node tree, since that'll just be thrown away afterwards. However it's not fine if the node tree is in the plan cache, as then it'd be corrupted for subsequent executions. Up to now we've dealt with that by having individual utility-statement functions apply copyObject() if they were going to modify the tree. But that's prone to errors of omission. Bug #17053 from Charles Samborski shows that CREATE/ALTER DOMAIN didn't get this memo, and can crash if executed repeatedly from plan cache. In the back branches, we'll just apply a narrow band-aid for that, but in HEAD it seems prudent to have a more principled fix that will close off the possibility of other similar bugs in future. Hence, let's hoist the responsibility for doing copyObject up into ProcessUtility from its children, thus ensuring that it happens for all utility statement types. Also, modify ProcessUtility's API so that its callers can tell it whether a copy step is necessary. It turns out that in all cases, the immediate caller knows whether the node tree is transient, so this doesn't involve a huge amount of code thrashing. In this way, while we lose a little bit in the execute-from-cache code path due to sometimes copying node trees that wouldn't be mutated anyway, we gain something in the simple-Query code path by not copying throwaway node trees. Statements that are complex enough to be expensive to copy are almost certainly ones that would have to be copied anyway, so the loss in the cache code path shouldn't be much. (Note that this whole problem applies only to utility statements. Optimizable statements don't have the issue because we long ago made the executor treat Plan trees as read-only. Perhaps someday we will make utility statement execution act likewise, but I'm not holding my breath.) Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
Commit 547f04e caused pgbench to start printing its version number, which seems like a fine idea, but it needs a bit more work: * Print the server version number too, when different. * Print the PG_VERSION string, not some reconstructed approximation. This patch copies psql's well-tested code for the same purpose. Discussion: https://postgr.es/m/1226654.1624036821@sss.pgh.pa.us
Ordinarily, a pg_policy.polroles array wouldn't list the same role more than once; but CREATE POLICY does not prevent that. If we perform DROP OWNED BY on a role that is listed more than once, RemoveRoleFromObjectPolicy either suffered an assertion failure or encountered a tuple-updated-by-self error. Rewrite it to cope correctly with duplicate entries, and add a CommandCounterIncrement call to prevent the other problem. Per discussion, there's other cleanup that ought to happen here, but this seems like the minimum essential fix. Per bug #17062 from Alexander Lakhin. It's been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
The code to signal a running walsender when its reserved WAL size grows too large is completely uncovered before this commit; this adds coverage for that case. This test involves sending SIGSTOP to walsender and walreceiver and running a checkpoint while advancing WAL, then sending SIGCONT. There's no precedent for this coding in Perl tests, and my reading of relevant manpages says it's likely to fail on Windows. Because of this, this test is always skipped on that platform. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
Generalize the INDEX_CLEANUP VACUUM parameter (and the corresponding reloption): make it into a ternary style boolean parameter. It now exposes a third option, "auto". The "auto" option (which is now the default) enables the "bypass index vacuuming" optimization added by commit 1e55e7d. "VACUUM (INDEX_CLEANUP TRUE)" is redefined to once again make VACUUM simply do any required index vacuuming, regardless of how few dead tuples are encountered during the first scan of the target heap relation (unless there are exactly zero). This gives users a way of opting out of the "bypass index vacuuming" optimization, if for whatever reason that proves necessary. It is also expected to be used by PostgreSQL developers as a testing option from time to time. "VACUUM (INDEX_CLEANUP FALSE)" does the same thing as it always has: it forcibly disables both index vacuuming and index cleanup. It's not expected to be used much in PostgreSQL 14. The failsafe mechanism added by commit 1e55e7d addresses the same problem in a simpler way. INDEX_CLEANUP can now be thought of as a testing and compatibility option. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-By: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/CAH2-WznrBoCST4_Gxh_G9hA8NzGUbeBGnOUC8FcXcrhqsv6OHQ@mail.gmail.com
Commit e7eea52 has introduced a new function RelationGetIdentityKeyBitmap which omits to handle the case where there is no replica identity index on a relation. Author: Mark Dilger Reviewed-by: Takamichi Osumi, Amit Kapila Discussion: https://www.postgresql.org/message-id/4C99A862-69C8-431F-960A-81B1151F1B89@enterprisedb.com
We had a request to provide a way to test at compile time for the availability of the new pipeline features. More generally, it seems like a good idea to provide a way to test via #ifdef for all new libpq API features. People have been using the version from pg_config.h for that; but that's more likely to represent the server version than the libpq version, in the increasingly-common scenario where they're different. It's safer if libpq-fe.h itself is the source of truth about what features it offers. Hence, establish a policy that starting in v14 we'll add a suitable feature-is-present macro to libpq-fe.h when we add new API there. (There doesn't seem to be much point in applying this policy retroactively, but it's not too late for v14.) Tom Lane and Alvaro Herrera, per suggestion from Boris Kolpackov. Discussion: https://postgr.es/m/boris.20210617102439@codesynthesis.com
Buildfarm members ayu and tern have sometimes shown a different plan than expected for this query. I'd been unable to reproduce that before today, but I finally realized what is happening. If there is a concurrent open transaction (probably an autovacuum run in the buildfarm, but this can also be arranged manually), then the index entries for the rows removed by the DELETE a few lines up are not killed promptly, causing a change in the planner's estimate of the extremal value of ft2.c1, which moves the rowcount estimate for "c1 > 1100" by enough to change the join plan from nestloop to hash. To fix, change the query condition to "c1 > 1000", causing the hash plan to be preferred whether or not a concurrent open transaction exists. Since this UPDATE is tailored to be a no-op, nothing else changes. Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-09%2022%3A45%3A48 Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-13%2022%3A38%3A18 Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-06-20%2004%3A55%3A36
This reverts commit 0912698; the test case added there failed once in circumstances that remain mysterious. It seems better to remove the test for now so that 14beta2 doesn't have random failures built in.
The failsafe can trigger when index processing is already disabled. This can happen when VACUUM's INDEX_CLEANUP parameter is "off" and the failsafe happens to trigger. Remove assertions that assume that index processing is directly tied to the failsafe. Oversight in commit c242baa, which made it possible for the failsafe to trigger in a two-pass strategy VACUUM that has yet to make its first call to lazy_vacuum_all_indexes().
Reported-by: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCTHyouoGv-xt1qNjjvPbGMErLi0AJncByTvr66Nq7j8g@mail.gmail.com
Code comments were claiming that verify_heapam() was checking privileges on the relation it was operating on, but it didn't actually do that. Perhaps earlier versions of the patch did that, but now the access is regulated by privileges on the function. Remove the wrong comments.
Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 70796ae860c444c764bb591c885f22cac1c168ec
linitial_node() fails in assert enabled builds if the given pointer is not of the specified type. Here the type is IntList. The code thought it should be expecting List, but it was wrong. In the existing tests which run this code the initial list element is always NIL. Since linitial_node() allows NULL, we didn't trigger any assert failures in the existing regression tests. There is still some discussion as to whether we need a few more tests in this area, but for now, since beta2 is looming, fix the bug first. Bug: #17067 Discussion: https://postgr.es/m/17067-665d50fa321f79e0@postgresql.org Reported-by: Yaoguang Chen
In dc7420c I (Andres) accidentally used RelationIsAccessibleInLogicalDecoding() as the sole condition to use the non-shared catalog horizon in GetOldestNonRemovableTransactionId(). That is incorrect, as RelationIsAccessibleInLogicalDecoding() checks whether wal_level is logical. The correct check, as done e.g. in GlobalVisTestFor(), is to check IsCatalogRelation() and RelationIsAccessibleInLogicalDecoding(). The observed misbehavior of this bug was that there could be an endless loop in lazy_scan_prune(), because the horizons used in heap_page_prune() and the individual tuple liveliness checks did not match. Likely there are other potential consequences as well. A later commit will unify the determination which horizon has to be used, and add additional assertions to make it easier to catch a bug like this. Reported-By: Justin Pryzby <pryzby@telsasoft.com> Diagnosed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2Wg32Y9+WJfw=aofkRx1ZRFt_Ev6bNPc4PSaz7PjSFtZgQ@mail.gmail.com
Add a .git-blame-ignore-revs file with a list of pgindent, pgperlyidy, and reformat-dat-files commit hashes. Postgres hackers that configure git to use the ignore file will get git-blame output that avoids attributing line changes to the ignored indent commits. This makes git-blame output much easier to work with in practice. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAH2-Wz=cVh3GHTP6SdLU-Gnmt2zRdF8vZkcrFdSzXQ=WhbWm9Q@mail.gmail.com
Commits 84f5c29 et al missed the need to cover plpgsql's "simple expression" code path. If the first thing we execute after a COMMIT/ROLLBACK is one of those, rather than a full-fledged SPI command, we must explicitly do EnsurePortalSnapshotExists() to make sure we have an outer snapshot. Note that it wouldn't be good enough to just push a snapshot for the duration of the expression execution: what comes back might be toasted, so we'd better have a snapshot protecting it. The test case demonstrating this fact cheats a bit by marking a SQL function immutable even though it fetches from a table. That's nothing that users haven't been seen to do, though. Per report from Jim Nasby. Back-patch to v11, like the previous fix. Discussion: https://postgr.es/m/378885e4-f85f-fc28-6c91-c4d1c080bf26@amazon.com
We've long contended with isolation test results that aren't entirely stable. Some test scripts insert long delays to try to force stable results, which is not terribly desirable; but other erratic failure modes remain, causing unrepeatable buildfarm failures. I've spent a fair amount of time trying to solve this by improving the server-side support code, without much success: that way is fundamentally unable to cope with diffs that stem from chance ordering of arrival of messages from different server processes. We can improve matters on the client side, however, by annotating the test scripts themselves to show the desired reporting order of events that might occur in different orders. This patch adds three types of annotations to deal with (a) test steps that might or might not complete their waits before the isolationtester can see them waiting; (b) test steps in different sessions that can legitimately complete in either order; and (c) NOTIFY messages that might arrive before or after the completion of a step in another session. We might need more annotation types later, but this seems to be enough to deal with the instabilities we've seen in the buildfarm. It also lets us get rid of all the long delays that were previously used, cutting more than a minute off the runtime of the isolation tests. Back-patch to all supported branches, because the buildfarm instabilities affect all the branches, and because it seems desirable to keep isolationtester's capabilities the same across all branches to simplify possible future back-patching of tests. Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
The code to signal a running walsender when its reserved WAL size grows too large is completely uncovered before this commit; this adds coverage for that case. This test involves sending SIGSTOP to walsender and walreceiver, then advancing enough WAL for a checkpoint to trigger, then sending SIGCONT. There's no precedent for STOP signalling in Perl tests, and my reading of relevant manpages says it's likely to fail on Windows. Because of this, this test is always skipped on that platform. This version fixes a couple of rarely hit race conditions in the previous attempt 0912698; most notably, both LOG string searches are loops, not just the second one; we acquire the start-of-log position before STOP-signalling; and reference the correct process name in the test description. All per Tom Lane. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202106102202.mjw4huiix7lo@alvherre.pgsql
Previously, isolationtester displayed SQL query results using some ad-hoc code that clearly hadn't had much effort expended on it. Field values longer than 14 characters weren't separated from the next field, and usually caused misalignment of the columns too. Also there was no visual separation of a query's result from subsequent isolationtester output. This made test result files confusing and hard to read. To improve matters, let's use libpq's PQprint() function. Although that's long since unused by psql, it's still plenty good enough for the purpose here. Like 741d7f1, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
Our uses of gss_display_status() and gss_display_name() assumed that the gss_buffer_desc strings returned by those functions are null-terminated. It appears that they generally are, given the lack of field complaints up to now. However, the available documentation does not promise this, and some man pages for gss_display_status() show examples that rely on the gss_buffer_desc.length field instead of expecting null termination. Also, we now have a report that on some implementations, clang's address sanitizer is of the opinion that the byte after the specified length is undefined. Hence, change the code to rely on the length field instead. This might well be cosmetic rather than fixing any real bug, but it's hard to be sure, so back-patch to all supported branches. While here, also back-patch the v12 changes that made pg_GSS_error deal honestly with multiple messages available from gss_display_status. Per report from Sudheer H R. Discussion: https://postgr.es/m/5372B6D4-8276-42C0-B8FB-BD0918826FC3@tekenlight.com
The syntax summaries for CREATE FUNCTION and allied commands made it look like LEAKPROOF is an alternative to IMMUTABLE/STABLE/VOLATILE, when of course it is an orthogonal option. Improve that. Per gripe from aazamrafeeque0. Thanks to David Johnston for suggestions. Discussion: https://postgr.es/m/162444349581.694.5818572718530259025@wrigleys.postgresql.org
For no obvious reason, isolationtester has always insisted that session and step names be written with double quotes. This is fairly tedious and does little for test readability, especially since the names that people actually choose almost always look like normal identifiers. Hence, let's tweak the lexer to allow SQL-like identifiers not only double-quoted strings. (They're SQL-like, not exactly SQL, because I didn't add any case-folding logic. Also there's no provision for U&"..." names, not that anyone's likely to care.) There is one incompatibility introduced by this change: if you write "foo""bar" with no space, that used to be taken as two identifiers, but now it's just one identifier with an embedded quote mark. I converted all the src/test/isolation/ specfiles to remove unnecessary double quotes, but stopped there because my eyes were glazing over already. Like 741d7f1, back-patch to all supported branches, so that this isn't a stumbling block for back-patching isolation test changes. Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
Reported-by: Simon Riggs Author: Takamichi Osumi Reviewed-by: Amit Kapila Backpatch-through: 9.6 Discussion: https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky@alap3.anarazel.de
Commits 9de77b5 and ac4645c missed to update the logical replication message formats section in the docs. Author: Brar Piening Reviewed-by: Amit Kapila Discussion: https://www.postgresql.org/message-id/cc70956c-e578-e54f-49e6-b5d68c89576f@gmx.de
The checkpointer crashed (NULL function-pointer call) in RememberSyncRequest and SyncPostCheckpoint whenever an append-optimized relation's segment files were forgotten/unlinked. GPDB added SYNC_HANDLER_AO to the SyncRequestHandler enum and aomd.c registers SYNC_FORGET_REQUEST/SYNC_UNLINK_REQUEST with it, but the syncsw[] dispatch table in sync.c had no entry for SYNC_HANDLER_AO, so syncsw[SYNC_HANDLER_AO].sync_* were all NULL. Because every dropped AO table schedules such requests, the checkpointer crash-looped: the postmaster restarted it, it crashed on the next checkpoint, and each crash dumped a multi-hundred-MB core -- enough to fill the disk and abort the regression run. Register the SYNC_HANDLER_AO entry. aosyncfiletag() already exists in md.c; the deferred unlink only targets the base segfile (segno 0) and the filter matches by database, so mdunlinkfiletag()/mdfiletagmatches() are correct for AO as well. Verified: creating, populating and dropping AO and AOCS tables followed by CHECKPOINT no longer crashes the checkpointer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…odifyTable INSERT (and COPY) into an append-optimized table reported success but stored zero rows, and each segment logged "Snapshot reference leak ... still referenced". ExecInitModifyTable() calls table_dml_init() for every result relation, which for AO/AOCS sets up a cached insert descriptor (registering a metadata snapshot). The matching table_dml_finish() -- which flushes the segment file, writes the pg_aoseg/pg_aocsseg row counts and unregisters the snapshot -- was never called: the PG14 merge adopted upstream's ExecEndModifyTable() loop, which has no such hook, and the GPDB call was dropped. With the descriptor never finished, pg_aoseg.tupcount stayed 0 (rows invisible) and the snapshot leaked. Call table_dml_finish() for each result relation in ExecEndModifyTable(), mirroring the table_dml_init() in ExecInitModifyTable(). All AMs (heap, AO, AOCS) provide dml_finish, so the call is safe for every relation. Verified: AO and AOCS INSERT now persist all rows (pg_aoseg.tupcount matches), with no snapshot-leak warnings; DELETE on AO continues to work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Querying pg_locks crashed the segments (SIGSEGV in heap_form_tuple -> heap_compute_data_size -> strlen) whenever pg_lock_status() ran there -- which the coordinator does for every pg_locks query (CdbDispatchCommand "SELECT * FROM pg_lock_status()"). The DTX regression tests query pg_locks, so the crash cascaded into a full-cluster reset + DTX recovery storm that aborted the run. Two bugs: 1. lockfuncs.c built a malformed tuple descriptor: the GPDB columns were all added at attnum 19 (mppSessionId/mppIsWriter/gp_segment_id), leaving attnums 17 and 18 uninitialized (atttypid 0, attlen 0). att_addlength_pointer() treats attlen 0 as a cstring and runs strlen() on the (byval) Datum -> crash. The value array had the same typo: values[17] assigned twice (the second overwriting mppIsWriter with segindex) and values[18] never set. Fixed the TupleDescInitEntry attnums to 17/18/19 and the value indices to 16/17/18; the segment-result and predicate-lock loops had the same off-by-one (waitstart not accounted for) and are corrected too. 2. A "kept both sides" merge left pg_proc.dat with two proallargtypes/ proargmodes/proargnames sets for pg_lock_status -- the GPDB one (18 cols, the 3 MPP columns, no waitstart) and upstream's (16 cols, with waitstart). The last wins in the Perl hash, so the catalog declared only 16 columns while the C code returns 19, giving "Returned row contains 19 attributes, but query expects 16". Merged them into the correct 19-column union (15 base + waitstart + mppSessionId + mppIsWriter + gp_segment_id). Verified: pg_locks now has 19 columns and returns segment lock rows (with gp_segment_id) with no crash. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PG14 merge adopted upstream's new row-identity machinery
(add_row_identity_columns / add_row_identity_var / ROWID_VAR), which
emits only the "ctid" junk column for the UPDATE/DELETE result relation.
GPDB additionally needs a "gp_segment_id" junk column so the Explicit
Motion can route each modified/deleted row back to the segment that owns
it (cdbpathtoplan_create_motion_plan -> find_junk_tle("gp_segment_id")),
and so SplitUpdate can read it from its input tlist
(nodeSplitUpdate.c). The merge dropped GPDB's gp_segment_id when taking
the upstream shape, so every distributed UPDATE/DELETE that needed an
Explicit Motion failed with "could not find gp_segment_id in subplan's
targetlist" (444 occurrences in the regression baseline).
Re-graft gp_segment_id into add_row_identity_columns alongside ctid. It
rides the same ROWID_VAR sharing path, so the single-table, randomly-
distributed, replicated and inherited/partitioned cases all get the
column with correct per-leaf translation.
Validated: single-table hash/random/replicated UPDATE and DELETE now
succeed. (Partitioned UPDATE and distribution-key UPDATE expose two
further, independent PG14-merge bugs -- is-split-update list length and
ExecBuildUpdateProjection targetColnos -- tracked separately.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…pdate information" PG14 removed inheritance_planner and now plans inherited/partitioned UPDATE/DELETE inside grouping_planner, building per-leaf resultRelations / updateColnosLists / withCheckOptionLists / returningLists in a loop. GPDB's is_split_updates list, however, was passed to create_modifytable_path() as a hard-coded list_make1_int(root->is_split_update) -- a single element regardless of how many leaf result relations the loop produced. For a partitioned/inherited UPDATE with N leaf relations this yields resultRelations of length N but isSplitUpdates of length 1, so ExecInitModifyTable() failed with "ModifyTable node is missing is-split-update information" (list_length(isSplitUpdates) != nrels); 80 occurrences in the regression baseline. (The plan-time length Assert in make_modifytable() is compiled out in this non-cassert build, so the mismatch only surfaced at execution.) Build is_split_updates in lockstep with resultRelations across all three branches (multi-relation leaf loop, the all-excluded dummy fallback, and the single-relation case). All leaf partitions share the parent's distribution policy, so the split-update decision is uniform and each entry is root->is_split_update. Validated: partitioned non-distribution-key UPDATE and DELETE now succeed. (Distribution-key UPDATE still hits the separate ExecBuildUpdateProjection targetColnos bug, tracked next.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A distribution-key UPDATE is executed in GPDB as a Split Update: the modified tuple may belong on a different segment, so each input row is turned into a DELETE of the old tuple plus an INSERT of the new one, re-routed by an Explicit Motion. The PG14 merge left this path broken at several layers; this fixes all of them so single-table, partitioned, random, replicated and join-driven distribution-key UPDATEs produce correct results (and correct command tags). preptlist.c: * A Split Update needs the full new tuple, so the UPDATE targetlist is expanded to every column. update_colnos was extracted from the *partial* (SET-only) tlist before that expansion, so it had fewer entries than the expanded tlist -- ExecBuildUpdateProjection() then rejected the plan with "targetColnos does not match subplan target list". Expand first, then take update_colnos from the expanded tlist. * expand_targetlist() filled columns missing from the UPDATE tlist with a NULL Const (correct only for INSERT). For a Split Update the unmodified columns must keep their old value, so emit a Var referring to the target relation's attribute instead; otherwise the re-inserted tuple had its unmodified columns blanked to NULL (e.g. a partition key became NULL -> "no partition of relation found for row"). nodeModifyTable.c: * ExecInitModifyTable() never set ri_action_attno / ri_segid_attno (only execMain.c initialised them to InvalidAttrNumber), so the "DMLAction" and "gp_segment_id" junk columns were never located. Look them up from the subplan targetlist for UPDATE/DELETE. * ExecModifyTable()'s CMD_UPDATE case always called ExecUpdate(), even for Split Update rows -- the DMLActionExpr value was read but ignored, and the dead helper ExecSplitUpdate_Insert() was never called, so delete/insert action rows were wrongly applied as in-place updates and corrupted data. Dispatch on the action: DML_INSERT goes through ExecSplitUpdate_Insert(), DML_DELETE through ExecDelete(splitUpdate). The DELETE side runs with canSetTag=false so each updated row is counted once. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
get_agg_clause_costs() (moved to prepagg.c in the PG14 merge) stopped populating AggClauseCosts.hasNonCombine / hasNonSerial. Those flags gate GPDB's multi-stage (Motion-separated) aggregation: planner.c only sets try_mpp_multistage_aggregation when both are false, and cdb_create_multistage_grouping_paths() Assert()s on them. Because they were left at zero, the planner happily built a two-stage plan for an aggregate that has no combine function (or, for an INTERNAL transition state, no serialize/deserialize function) -- e.g. string_agg, json_agg, jsonb_agg. Costing the combine stage then called add_function_cost() on the invalid (0) combine OID and failed during planning with "cache lookup failed for function 0" (171 occurrences in the regression baseline; the Assert is compiled out in this non-cassert build). Re-populate hasNonCombine / hasNonSerial in get_agg_clause_costs(), mirroring the root->hasNonPartialAggs / root->hasNonSerialAggs logic, so such aggregates fall back to single-stage aggregation. Validated: string_agg (plain and ORDER BY), json_agg now work; combinable aggregates (count(DISTINCT), sum) are unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… index_cleanup/truncate) vacuum_params_to_options_list() converts a VacuumParams bitmask back into a DefElem list to dispatch to the segments. Two PG14 changes were not accounted for, so it rejected ordinary VACUUMs: * PG14 added VACOPT_PROCESS_TOAST (0x40), set by default for VACUUM. The bit-walking loop didn't clear it, so the trailing "if (optmask != 0)" guard fired with "unrecognized vacuum option 40" (149 occurrences in the regression baseline). Always emit a "process_toast" DefElem with its value -- the segment defaults it to on, so it must be dispatched explicitly to honour "(PROCESS_TOAST off)". * PG14 made index_cleanup and truncate tri-state (UNSPECIFIED / AUTO / DISABLED / ENABLED) instead of boolean. The code only handled DISABLED/ENABLED and elog(ERROR)'d otherwise, so a plain VACUUM (which leaves index_cleanup = AUTO) failed with "unexpected VACUUM 'index_cleanup' option '1'". Only dispatch an explicit DISABLED/ENABLED; for UNSPECIFIED/AUTO emit nothing and let each segment apply its own (matching) default. Validated: VACUUM, VACUUM FULL/FREEZE/ANALYZE, VACUUM (PROCESS_TOAST off), (INDEX_CLEANUP auto|off), (TRUNCATE off), (VERBOSE, ANALYZE) all succeed and process TOAST tables on the segments. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 changed UPDATE to carry only the SET columns in the plan tuple and to reconstruct the full new tuple in the executor by fetching the old tuple (table_tuple_fetch_row_version) and merging the unmodified columns from it. Append-optimized tables can't fetch a tuple by TID -- appendonly_fetch_row_version / aoco_fetch_row_version are stubs that raise "feature not supported on appendoptimized relations" -- so every UPDATE of an AO/AOCS table failed (86 occurrences in the regression baseline). AO/AOCS UPDATE is implemented as delete + re-insert by the table AM (appendonly_tuple_update), which needs the full new tuple and the old TID, but not the old tuple's contents. So: * preprocess_targetlist(): expand the UPDATE targetlist to every column for AO/AOCS relations (as is already done for a Split Update), and take update_colnos from the expanded tlist. The plan then produces the complete new tuple, so no old-tuple merge is required. * ExecModifyTable(): for an AO/AOCS result relation, skip the table_tuple_fetch_row_version() old-tuple fetch and leave the old slot empty. With a full targetlist ExecBuildUpdateProjection() doesn't read the old slot, and the AM's tuple_update gets the full new tuple plus the row's TID. Validated: AO and AOCS UPDATE (distribution-key and not) and DELETE all succeed with correct values and unmodified columns preserved. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ist" A global aggregate with an in-aggregate ORDER BY (e.g. array_agg(x ORDER BY x), string_agg(...ORDER BY...)) failed during planning with "ORDER/GROUP BY expression not found in targetlist" (33 occurrences in the regression baseline). Two PG14-merge regressions combined to cause it: * create_motion_plan() was the only create_*_plan that didn't accept the createplan flags, so CP_LABEL_TLIST was dropped at every Motion. A Sort/Agg that needs ressortgroupref labels on its input then could not find its ORDER BY column in a Gather Motion's targetlist (make_motion shares the child's tlist, so labeling the child suffices). Thread flags through create_motion_plan() and pass CP_LABEL_TLIST down to the subplan recursion. This fixes single-stage ordered aggregates such as string_agg (which can't be partially aggregated). * AggClauseCosts.numPureOrderedAggs (and numOrderedAggs) were never populated by get_agg_clause_costs(). numPureOrderedAggs counts aggregates with an explicit ORDER BY / WITHIN GROUP, whose result depends on the global input order and therefore cannot be split across a Motion. Left at 0, the MPP grouping planner's has_ordered_aggs gate never tripped, so a combine-capable ordered aggregate like array_agg was routed into a multi-stage plan and failed. Populate both counts in get_agg_clause_costs(); DISTINCT-only aggregates remain multi-stageable via the DQA path. Validated: array_agg/string_agg/json_agg with ORDER BY (global and grouped) return correct results; array_agg(DISTINCT) still multi-stages; plain aggregates and ordered-set aggregates (percentile_cont) unaffected. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GPDB allows a window frame offset to reference columns of the input (e.g. "RANGE floor(qty) PRECEDING"), unlike upstream PostgreSQL which requires a constant offset. The PG14 merge regressed set_plan_refs(): the T_WindowAgg case fixes the frame edges with fix_scan_expr(), even though the in-tree comment still says GPDB must use fix_upper_expr() (and an indexed_tlist variable was declared for that purpose but left unused). fix_scan_expr() leaves a column-valued offset's Vars with base-relation varnos instead of OUTER_VAR. At execution compute_start_end_offsets() sets ecxt_outertuple to the scan slot and evaluates the offset, but the mis-numbered Var reads the wrong/empty slot, so the WindowAgg segfaults (compute_start_end_offsets -> ExecInterpExprStillValid -> SIGSEGV). In the qp_olap_window regression test this crashed all segments on ~50 queries; the crash/recovery/retry loop is what looked like a "hang" in the baseline run and filled the disk with hundreds of core files. Restore the documented behaviour: build the subplan's indexed tlist and fix the start/end offsets with fix_upper_expr(..., OUTER_VAR, ...), so a column-valued offset's Vars correctly reference the WindowAgg input. Validated: the previously-crashing queries now return rows, and the full qp_olap_window test runs to completion with 0 segment crashes (was 592 core dumps) and output matching the expected answer file; the only remaining errors are the test's intentional negative cases (multi-column RANGE, negative offsets, division by zero). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
get_agg_clause_costs() (moved to prepagg.c in the PG14 merge) stopped populating AggClauseCosts.distinctAggrefs, the list of Aggrefs carrying aggdistinct. The MPP grouping planner gates its specialized DQA paths on that list (cdb_create_multistage_grouping_paths -> recognize_dqa_type, plus several can_hash checks in cdbgroupingpaths.c). With the list always empty, a query like "count(distinct d) group by i" never built the DQA plan that dedups the distinct argument by adding it to the first-stage group key. Instead it fell through to the generic two-stage partial aggregation, which cannot partialize a DISTINCT aggregate: the Partial HashAggregate tried to finalize the per-group distinct sort and crashed all segments in finalize_aggregates() -> tuplesort_performsort(). Re-populate distinctAggrefs in the agginfos loop alongside the existing numOrderedAggs counting. The first-stage group key is now (i, d) as expected (matches expected/gp_dqa.out), and the single-DQA queries run without crashing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
VARSIZE_TO_SHORT() hardcoded the big-endian short-varlena header form (length | 0x80), but everything else in the tree uses the standard PostgreSQL varlena macros, which on a little-endian build encode a 1-byte header as (length << 1) | 0x01 (see SET_VARSIZE_1B / VARATT_IS_SHORT / VARSIZE_1B). So converting a 4-byte varlena to short produced a header like 0x88 for an 8-byte datum, which VARATT_IS_SHORT() does not recognize (0x88 is even) and VARSIZE_ANY() misreads as a 4-byte header with a huge length. This corrupted any value written through the 4-byte -> short conversion path: - AOCS column store (datumstreamblock.c DatumStreamBlockWrite_Put*), - memtuples (memtuple.c). A normal INSERT often dodged it because the value arrived already in standard short form (0x11, copied verbatim), but an UPDATE's projected value arrived as a 4-byte varlena (0x2c) and hit the conversion path, storing 0x88. The next scan of that AOCS segfile then fed the bogus header to heap_form_minimal_tuple()/the Motion serializer, which computed a multi-KB/-MB length and SIGSEGV'd every segment (aocs_getnext -> ... -> memmove). Reproduces via vacuum_gp's aocs_empty, alter_table_aocs's addcol*, and decode_expr's appendonly decodecharao2. Make VARSIZE_TO_SHORT endianness-aware, matching SET_VARSIZE_1B. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PG14 changed reindex_index()'s final argument from an int options bitmask to a ReindexParams pointer, which it dereferences immediately (params->options at index.c:3606). bmbulkdelete() -- the bitmap AM's ambulkdelete, which rebuilds the whole index because bitmap indexes have no incremental delete -- still passed a literal 0, so reindex_index() dereferenced a NULL ReindexParams and SIGSEGV'd whenever VACUUM processed a bitmap index (e.g. "vacuum bm_test", bitmap_index regression test). Pass an empty ReindexParams struct instead. All other reindex_index() callers already pass a proper pointer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d not open file t_<relfilenode>") GPDB stores temp tables in *shared* buffers, not local ones: ReadBuffer_common hardcodes isLocalBuf=false (SmgrIsTemp is commented out, bufmgr.c). So when a temp relation's storage is unlinked, its dirty pages must be dropped from the shared buffer pool like any other relation. DropRelFileNodeBuffers honors this -- it #if 0's out the upstream "RelFileNodeBackendIsTemp -> drop from local buffers" short-circuit, with the comment "Temp tables use shared buffers in Greenplum". The pre-merge GPDB DropRelFileNodesAllBuffers had the same #if 0. The PG14 buffer-manager rework (SMgrRelation-based Drop*Buffers + smgrnblocks caching) kept the comment but dropped the #if 0, re-activating the upstream short-circuit. Temp relations with backend != InvalidBackendId (TempRelBackendId, the "temp but not ours" case in RelationInitPhysicalAddr) are now routed to DropRelFileNodeAllLocalBuffers (dead code -- GPDB never uses local buffers) and, critically, excluded from the shared-buffer drop list. Their dirty shared buffers are leaked. After the temp file is unlinked (deferred to the next checkpoint via mdunlink), the leaked dirty buffers reference a file that no longer exists. The next bgwriter/checkpointer sweep or buffer eviction tries to flush them and PANICs: ERROR: could not open file "base/<db>/t_<relfilenode>": No such file or directory ... writing block N of relation base/<db>/t_<relfilenode> (md.c) In a busy run this repeats thousands of times per minute and takes segments down, cascading into widespread failures. Fix: restore the #if 0 around the temp short-circuit in DropRelFileNodesAllBuffers, matching DropRelFileNodeBuffers and the pre-merge code, so temp relations are dropped from shared buffers like everything else. Verified with a controlled before/after on the same workload (the standard 'temp' regression test + CHECKPOINT + heavy eviction): a leak-detection probe confirmed temp rels reach the function with backend=-2 (TempRelBackendId) and dirty shared buffers; the buggy build produced 156k+ "could not open file t_<relfilenode>" errors plus a crash, while the fixed build produced zero errors and no cores. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…elete_tuples) PG14 added table-AM-assisted index tuple deletion to nbtree: both the simple LP_DEAD pass (_bt_simpledel_pass) and the bottom-up deletion pass (_bt_bottomupdel_pass) call _bt_delitems_delete_check() -> table_index_delete_tuples(), which dispatches to rel->rd_tableam->index_delete_tuples() with no NULL check (tableam.h). The heap AM provides heap_index_delete_tuples, but GPDB's append-optimized AMs leave the callback NULL (appendonlyam_handler.c, aocsam_handler.c) -- AO/AOCS reclaim dead tuples via the visimap + VACUUM, not heap-style TID deletion. So a btree index on an AO/AOCS table that triggers either deletion pass jumps through the NULL pointer and SIGSEGVs: arenadata#6 0x0 (NULL) arenadata#7 _bt_delitems_delete_check arenadata#8 _bt_bottomupdel_pass arenadata#9 _bt_delete_or_dedup_one_page arenadata#10 _bt_doinsert Both passes are optional space-reclamation optimizations performed to avoid a leaf-page split; deduplication (the third strategy) does not use the table AM. Fix: in _bt_delete_or_dedup_one_page() skip both deletion passes when the table AM does not implement index_delete_tuples, falling back to deduplication / a page split. This restores the pre-PG14 behavior for AO/AOCS indexes (no incremental index deletion) and leaves heap behavior unchanged. Verified before/after with AO and AOCS tables whose btree index keys all share one value (so entries pile on one leaf-page chain) churned by repeated full-table UPDATEs of a non-indexed column: pre-fix produced 18 SIGSEGV cores with the backtrace above and took segments down; post-fix both tables complete with correct row counts, zero cores, all segments up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…tartup_dummy) EXPLAIN ANALYZE of a CREATE TABLE AS / SELECT INTO that uses EXECUTE of a prepared statement crashed the coordinator: arenadata#6 intorel_startup_dummy (rel == NULL) arenadata#7 standard_ExecutorRun arenadata#8 ExplainOnePlan arenadata#9 ExplainExecuteQuery arenadata#10 ExplainOneUtility GPDB creates the CTAS target relation in intorel_initplan() (called from InitPlan, execMain.c) gated on PlannedStmt->intoClause, and leaves the DestReceiver's rStartup a near-dummy that dereferences the relation created there. The freshly-planned EXPLAIN path (ExplainOneQuery) copies the IntoClause onto the plan before running it, but ExplainExecuteQuery passed the cached plan to ExplainOnePlan without setting PlannedStmt->intoClause. intorel_initplan was therefore skipped, the DR_intorel's rel stayed NULL, and intorel_startup_dummy dereferenced it during ExecutorRun -> coordinator SIGSEGV (which drops every session in the run). Fix: in ExplainExecuteQuery set the IntoClause on the plan when into != NULL, mirroring ExplainOneQuery. The PlannedStmt belongs to the shared cached plan, so set it on a copy -- otherwise a stale IntoClause would make a later plain EXECUTE of the same statement try to create a table. Verified: pre-fix, EXPLAIN ANALYZE CREATE TABLE t AS EXECUTE p dumps a core with the backtrace above; post-fix it no longer crashes. (CREATE TABLE AS ... EXECUTE then hits a separate, pre-existing OID-dispatch error -- "oids were assigned, but not dispatched to QEs" -- that also affects plain CREATE TABLE AS EXECUTE and is out of scope for this crash fix.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…("oids were assigned, but not dispatched to QEs")
All forms of CREATE TABLE AS / SELECT INTO ... EXECUTE (plain, temp, and under
EXPLAIN ANALYZE) failed on the coordinator with:
WARNING: OID assignment not dispatched: catalog 1259 ... name "x"
ERROR: oids were assigned, but not dispatched to QEs
GPDB plans a CTAS specially: the IntoClause must reach the planner so the query
is marked PARENTSTMTTYPE_CTAS and a *distributed* plan is built -- rows are
redistributed to the target table's segments, which create and populate the
relation (consuming the OIDs the QD pre-assigned). This is threaded through the
plancache via an extra IntoClause argument to GetCachedPlan() ->
RevalidateCachedQuery()/choose_custom_plan()/BuildCachedPlan() (MPP-8135).
The PG14 plancache merge adopted upstream's GetCachedPlan() signature and turned
the GPDB intoClause argument into a hardcoded local `IntoClause *intoClause =
NULL;`, even though the function's own header comment still documents the extra
parameter and the rest of plancache.c still threads it. With it forced NULL the
EXECUTE plan was the plain gather-to-coordinator SELECT: the QD created the
table and assigned OIDs, the segments never did, and the un-dispatched OIDs
tripped the end-of-xact check in AtEOXact_DispatchOids().
Fix: restore the IntoClause parameter to GetCachedPlan() (header + definition,
dropping the dead NULL local) and pass it from the two CTAS callers,
ExecuteQuery() and ExplainExecuteQuery(); the remaining non-CTAS callers (spi.c,
postgres.c) pass NULL as before.
Verified: plain, parameterized, WITH NO DATA, and EXPLAIN ANALYZE CREATE TABLE
AS EXECUTE all succeed with the rows correctly distributed across segments
(nsegs=3); a plain EXECUTE returning rows is unaffected and the cached plan is
not corrupted. Together with the prior intorel_startup_dummy fix, CTAS-EXECUTE
works end to end.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
REINDEX CONCURRENTLY corrupts the catalog on a distributed cluster. Upstream's
concurrent reindex builds a fresh index (a new "_ccnew" pg_class entry, new OID),
swaps it in under the original name, and drops the old one -- i.e. it
intentionally *changes the index OID*. In GPDB only the coordinator runs the
concurrent path; the segments reindex non-concurrently and keep the original OID.
The result is an index whose OID differs between the QD and the segments
("could not open relation with OID N (segX)" on the next index scan), plus the
transient _ccnew OIDs are never dispatched ("oids were assigned, but not
dispatched to QEs").
Properly supporting concurrent reindex in MPP requires the segments to rebuild
under the QD's new OID, which collides with PreventInTransactionBlock() and the
fact that a QE cannot run a multi-transaction concurrent reindex from within a
dispatched statement -- a larger feature effort (tracked separately).
As a safe interim, fall back to a normal (non-concurrent) reindex on the
coordinator, with a NOTICE, in ReindexIndex()/ReindexTable() (covering REINDEX
TABLE/INDEX, including partitioned, which then reindexes each child
non-concurrently via ReindexPartitions) and ReindexMultipleInternal() (covering
REINDEX DATABASE/SCHEMA). Single-node utility mode is unaffected and keeps the
working concurrent path.
Verified: REINDEX (CONCURRENTLY) TABLE/INDEX and a partitioned table all succeed
with a NOTICE; the index OID is identical on the coordinator and every segment
(no divergence) and a forced index scan returns correct results; no cores, no
OID-dispatch error.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ordered child columns
A distribution-key UPDATE on a partitioned table whose leaf partitions have a
different physical column order than the parent crashed in the executor:
pg_detoast_datum <- hashtext/hash_numeric <- cdbhash <- ExecSplitUpdate
create_splitupdate_plan() derived the SplitUpdate's hash attnos and column types
from path->resultRelation's relcache entry. For a partitioned UPDATE that is a
leaf partition, whose physical column order can differ from the parent's
(e.g. update.sql's part_c_1_100 "e,d,c,b,a"). But the SplitUpdate's input tuples
are the subplan output, which is labeled with root->processed_tlist -- i.e. the
*nominal* (parent) column layout. So the distribution-key attnos taken from the
leaf policy indexed the wrong column of the parent-layout tuple: a key like "a"
at leaf attno 5 selected the parent's attno-5 column ("d", an int), which cdbhash
then fed to hashtext, dereferencing the small int as a varlena -> SIGSEGV. (The
mismatch also tripped the type Assert in the insertColIdx loop, which is compiled
out in non-cassert builds.)
Fix: take resultDesc and cdbpolicy from the nominal target relation
(root->parse->resultRelation), which is the layout the subplan tuples are in, so
insertColIdx, hashAttnos and hashFuncs all line up with the SplitUpdate input.
For a non-partitioned UPDATE this is the same relation as before.
Verified with update.sql's reordered-column partitions and assorted
text/numeric/multi-column distribution-key updates (including cross-partition row
movement): the hash now reads the correct key column and there is no crash; the
distribution key routes correctly and results are unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ld columns
Follow-on to the SplitUpdate cdbhash fix. A distribution-key UPDATE of a
partitioned table whose leaf partitions have a different physical column order
than the parent failed during the reinsert with:
ERROR: table row type and query-specified row type do not match
(ExecCheckPlanOutput, nodeModifyTable.c)
The SplitUpdate emits the new tuple in the root (nominal) target relation's
column layout (the subplan is labeled with root->processed_tlist), but the
DML_INSERT replay built the insert projection against the *source leaf*
partition's ResultRelInfo, whose descriptor can differ -- so ExecCheckPlanOutput
rejected the layout. (Before the cdbhash fix this path crashed earlier and was
never reached.)
Fix: build the split-update insert projection against the root result relation
(mtstate->rootResultRelInfo), so it matches the subplan output, and set up
partition tuple routing for split updates -- so ExecInsert() routes the new
tuple to the correct leaf, converting the layout via ri_RootToPartitionMap, and
enforces the partition constraint. For a non-partitioned target
rootResultRelInfo is the result relation itself, so this is a no-op there.
Verified: every split-update shape (heap/AO/AOCS, toasted, dropped-column,
multi-column and text/numeric distribution keys, UPDATE ... FROM, and
partitioned) inserts/redistributes correctly with no crash; the update
regression test no longer errors with a row-type mismatch (a partition-key
update that must stay within its subtree now correctly raises a partition
constraint violation -- naming the UPDATE target relation).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ResetTempNamespace() (reached from AbortTransaction -> ResetAllGangs during
primary gang-loss recovery) unconditionally cancels the temp-namespace
before_shmem_exit callback:
cancel_before_shmem_exit(RemoveTempRelationsCallback, 0);
Before PG14, cancel_before_shmem_exit() silently did nothing when the callback
was not the latest entry. Upstream commit c9ae5cb made it raise an error in
that case, and the PG14 merge adopted the strict version. But ResetTempNamespace
can legitimately reach this with the callback absent (temp namespace created but
not yet committed, so AtEOXact_Namespace never registered it) or no longer the
latest entry. Thrown from inside AbortTransaction(), that error escalates to a
coordinator PANIC -- turning any recoverable gang loss into an all-sessions-down
crash, which then cascades into hundreds of "protocol synchronization was lost"
and "transaction is aborted" failures across the suite.
Add a non-throwing cancel_before_shmem_exit_if_latest() (the pre-PG14 lenient
behavior: remove only if it is the latest entry, else return false) and use it
in ResetTempNamespace(). Leaving the callback registered is harmless --
RemoveTempRelationsCallback() no-ops once myTempNamespace is reset just below.
The strict cancel_before_shmem_exit() is kept for its correct-LIFO callers
(PG_ENSURE_ERROR_CLEANUP).
Verified: a full installcheck-good run that previously dumped a
cancel_before_shmem_exit coordinator-PANIC core now completes the same span with
zero cores.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…limit" (PG14)
A trivial COPY ... FROM stdin (or any COPY FROM) failed on every segment with
"invalid message length" -> "terminating connection because protocol
synchronization was lost", which the QD reported as "MPP detected N segment
failures, system is reconnected". Because COPY loads the data in most
regression tests, this poisoned a large swath of the suite (231
protocol-synchronization-lost FATALs in a full run). The same bug independently
broke nextval/sequences over MPP ("nextval: unable to parse nextval response
from QD", ~291 hits).
Root cause: GPDB multiplexes non-query messages over a libpq connection and
reads them with pq_getmessage(buf, 0), using a maxlen of 0 to mean "no upper
limit" -- COPY data forwarded QD->QE (copy.c) and the nextval-over-NOTIFY
response (sequence.c). GPDB's pq_getmessage encoded that as
if (len < 4 || (maxlen > 0 && len > maxlen))
The PG14 merge adopted upstream's length check verbatim
if (len < 4 || len > maxlen)
dropping the "maxlen > 0 &&" guard. With maxlen == 0 every message with len > 0
is rejected as "invalid message length", so the QE tears down the connection on
the first CopyData byte.
Diagnosis: wire-probes confirmed the QD frames CopyData correctly (type 'd',
len = 4 + nbytes, outBuffer_shared = 0) and the QE reads the correct len but
with maxlen = 0 -- so the comparison, not the data, was wrong.
Fix: restore the GPDB "maxlen > 0 &&" guard and document the maxlen == 0
convention so a future merge does not drop it again.
Verified on a live cluster: COPY FROM stdin, COPY FROM file, nextval(), and
SERIAL inserts (segment nextval) all succeed with correct row counts; no
"invalid message length" / protocol-synchronization-lost.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
COPY FROM into any partitioned (or sub-partitioned) table crashed every segment
with a NULL-pointer SIGSEGV in ExecInitPartitionInfo (reached via
CopyFrom -> ExecFindPartition), reported on the QD as "MPP detected N segment
failures, system is reconnected". This surfaced broadly once COPY data delivery
was fixed (see the pq_getmessage maxlen fix), because pg_regress loads much of
its data via COPY into partitioned tables.
Root cause: GPDB's COPY implementation lives in commands/copy.c (the upstream
commands/copyfrom.c is a stub here). copy.c builds its ResultRelInfo with
InitResultRelInfo() directly -- it never calls ExecInitResultRelation(), so
estate->es_result_relations stays NULL -- yet it initialized the
ModifyTableState with
mtstate->resultRelInfo = estate->es_result_relations; /* NULL */
and never set mt_nrels or rootResultRelInfo. The PG14 tuple-routing rework made
ExecInitPartitionInfo() dereference mtstate->resultRelInfo[0]
(.ri_RangeTableIndex / .ri_RelationDesc) and mtstate->rootResultRelInfo, so the
NULL resultRelInfo crashed at the first partition routed to. (copyfrom.c's
CopyFrom sets these three fields correctly; copy.c was not updated in the merge.)
Fix: set mt_nrels = 1 and point both resultRelInfo and rootResultRelInfo at the
COPY target's ResultRelInfo (a valid one-element array), mirroring copyfrom.c.
Verified: COPY FROM stdin into range-partitioned and range-sub-partitioned
tables inserts the correct row counts with no crash.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… "see"'
~46 internal SQL functions (col_description, obj_description, shobj_description,
and many overloaded date/time and system helpers) carry the placeholder body
prosrc => 'see system_functions.sql' in pg_proc.dat; their real bodies live in
src/backend/catalog/system_functions.sql, which initdb is supposed to run after
bootstrap. The PG merge dropped system_functions.sql from BOTH initdb.c and the
catalog Makefile's install list, so:
- the file was never installed to $(datadir), and
- initdb never executed it.
The placeholder bodies therefore survived, and every call to one of these
functions tried to execute the literal text "see system_functions.sql", failing
with: ERROR: syntax error at or near "see". Because col_description() is
invoked by psql's \d+, this poisoned a huge fraction of the regression suite
(439 "see" syntax errors plus the bulk of the downstream "current transaction is
aborted" / "relation does not exist" cascade).
Fix:
- initdb.c: declare system_functions_file, set_input/check_input it, add a
generic setup_run_file() helper, and run system_functions.sql right after
setup_auth() and before setup_depend() (so the functions are pinned),
matching upstream ordering.
- catalog/Makefile: install (and uninstall) system_functions.sql alongside
system_views.sql.
Verified: a fresh initdb succeeds; col_description()/obj_description() get real
bodies (PG14 stores them in prosqlbody) and psql \d+ shows column descriptions
with no syntax error.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CREATE VIEW (and any utility statement that dispatches a Query tree, e.g. CREATE TABLE AS) over a join failed on every segment with: ERROR: could not deserialize unrecognized node type: 3 (readfast.c) PG14 (commit 055fee7) added the JoinExpr.join_using_alias field. The merge added it to the text writer (_outJoinExpr in outfuncs.c) and the reader (_readJoinExpr in readfuncs.c, used for binary too), but NOT to the binary writer _outJoinExpr in outfast.c, which is the one used for QD->QE dispatch. So the wire layout written for a JoinExpr was one NODE field short of what the reader expected; the reader went off the rails and hit a garbage tag (3 = T_ProjectionInfo, a run-time node that is never serialized) -> deserialize error on the segment. Fix: write join_using_alias between usingClause and quals in outfast.c's _outJoinExpr, matching the reader and the text writer. Class of bug: a PG14-added node field re-grafted into the shared text/read funcs but missed in the separate binary writer (outfast.c). Found by probing readNodeBinary() to dump the QE-side tag stream, then diffing each outfast.c binary writer against its reader. Verified: CREATE VIEW over CROSS/INNER/NATURAL joins and selecting from the views all succeed, no segment crash. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…G14) PG14 added Result Cache (later renamed Memoize), gated by enable_resultcache (default on). GPDB has not integrated it with the MPP planner/executor: a generated T_ResultCache plan node is not handled by expression_tree_mutator() (nodeFuncs.c) and is also absent from the binary plan-dispatch serialization (outfast.c/readfast.c). So whenever the planner chose a Result Cache -- e.g. the information_schema.columns is_updatable computation under enable_mergejoin/enable_nestloop -- the query failed with: ERROR: unrecognized node type: 53 (53 = T_ResultCache) This hit a broad set of join-heavy tests (updatable_views, returning, rowtypes, join, indexjoin, subselect, partition_prune, gin, ...). Until Result Cache is properly supported in MPP, default enable_resultcache to off so the planner never generates the node. GPDB's expected outputs never show a Result Cache node, so this matches them; the standalone resultcache regression test is not in any schedule, and aggregates.sql already sets it off explicitly for its one relevant query. Verified: with the new default, the minimal repro (CREATE VIEW + information_ schema is_updatable under merge/nestloop) and the updatable_views / returning / rowtypes regression tests no longer raise "unrecognized node type: 53". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…heir C helpers (PG14) create_function_0 (input/create_function_0.source) defines the C helper functions binary_coercible(oid,oid), test_enc_conversion(...) and test_opclass_options_func(internal) from regress.so, which type_sanity, opr_sanity and conversion depend on. The merge left create_function_0 out of parallel_schedule entirely (only create_function_1/2/3 are listed), so those helpers were never created and the dependent tests failed with "function binary_coercible(oid, oid) does not exist" / "function test_enc_conversion(...) does not exist" (~40 hits, plus the downstream cascade in opr_sanity/type_sanity). Add create_function_0 to the schedule right after test_setup (before any test that uses the helpers). It has its own expected output (create_function_0.out). Verified (focused schedule test_setup -> create_function_0 -> opr_sanity conversion): create_function_0 and conversion now pass and the "function ... does not exist" errors are gone (opr_sanity still differs for unrelated reasons). NB create_function_0 also creates trigger functions from contrib/spi's refint.so/autoinc.so, which must be installed (make -C contrib/spi install) -- a build/install step, not part of this source change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A correlated subquery in the SELECT targetlist over a distributed table ran without its correlation filter, returning wrong results: SELECT a, (SELECT count(*) FROM t2 WHERE t2.b = t1.a) FROM t1; returned the *total* count for every row instead of the per-row correlated count, and a non-aggregate correlated subquery raised "more than one row returned by a subquery used as an expression". GPDB routes a correlated subquery's param filter (e.g. t2.b = $0) to the rel's upperrestrictinfo; bring_to_outer_query() then applies it via a Result node above the Broadcast Motion, so each segment filters the broadcast set by its local param. That Result is built by create_projection_path_with_quals() with the filter in cdb_restrict_clauses. Two code paths silently dropped the filter: 1. When the subpath is projection-capable (a Motion is), the function took the no-Result shortcut (dummypp = true) and never stored cdb_restrict_clauses, discarding the filter. Don't take the shortcut when there are restrict clauses to apply. 2. When a later projection (the scan/join target) was layered over the filter-carrying ProjectionPath, the path-collapsing code stripped the inner ProjectionPath and discarded its cdb_restrict_clauses. Carry them up into the surviving ProjectionPath instead. With both fixed, create_projection_plan() emits a Result with plan->qual = the param filter, and correlated targetlist subqueries return correct results. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…G14) A correlated EXISTS in the SELECT targetlist over a distributed table failed with: ERROR: subplan is missing Flow information SELECT a, EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a) FROM t1; For a simple EXISTS, make_subplan() additionally builds a hashed ANY variant and wraps both in an AlternativeSubPlan, leaving setrefs.c to choose. GPDB's MPP slice machinery does not support AlternativeSubPlan: the hashed plan is created without Flow/slice information (so cdbllize raises "subplan is missing Flow information"), and cdbllize cannot reason about an AlternativeSubPlan when pruning unused subplans. In dispatch (MPP) mode, skip building the hashed alternative and keep just the correlated SubPlan we already built. It is correct on its own -- its correlation filter is applied above the Motion (see the companion fix in create_projection_path_with_quals) -- so EXISTS in the targetlist now returns correct results. The hashed alternative is still considered for non-MPP (utility-mode) planning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Here are some reminders before you submit the pull request
make installcheck