Tom Lane [Wed, 23 Sep 2020 15:36:13 +0000 (11:36 -0400)]
Avoid possible dangling-pointer access in tsearch_readline_callback.
tsearch_readline() saves the string pointer it returns to the caller
for possible use in the associated error context callback. However,
the caller will usually pfree that string sometime before it next
calls tsearch_readline(), so that there is a window where an ereport
will try to print an already-freed string.
The built-in users of tsearch_readline() happen to all do that pfree
at the bottoms of their loops, so that the window is effectively
empty for them. However, this is not documented as a requirement,
and contrib/dict_xsyn doesn't do it like that, so it seems likely
that third-party dictionaries might have live bugs here.
The practical consequences of this seem pretty limited in any case,
since production builds wouldn't clobber the freed string immediately,
besides which you'd not expect syntax errors in dictionary files
being used in production. Still, it's clearly a bug waiting to bite
somebody.
Fix by pstrdup'ing the string to be saved for the error callback,
and then pfree'ing it next time through. It's been like this for
a long time, so back-patch to all supported branches.
Tom Lane [Fri, 18 Sep 2020 22:03:44 +0000 (18:03 -0400)]
Use factorial rather than numeric_fac in create_operator.sql.
These two SQL functions are aliases for the same C function, so this
change has no semantic effect. However, because we dropped the
numeric_fac alias in HEAD (commit 76f412ab3), operator definitions
based on that one don't port forward, causing problems for cross-version
upgrade tests based on the regression database.
Patch all active back branches to dodge the problem.
Amit Kapila [Fri, 18 Sep 2020 04:10:04 +0000 (09:40 +0530)]
Fix comments in heapam.c.
After commits 85f6b49c2c and 3ba59ccc89, we can allow parallel inserts
which was earlier not possible as parallel group members won't conflict
for relation extension and page lock. In those commits, we forgot to
update comments at few places.
Author: Amit Kapila Reviewed-by: Robert Haas and Dilip Kumar
Backpatch-through: 13
Discussion: https://postgr.es/m/CAFiTN-tMrQh5FFMPx5aWJ+1gi1H6JxktEhq5mDwCHgnEO5oBkA@mail.gmail.com
Amit Kapila [Thu, 17 Sep 2020 09:46:46 +0000 (15:16 +0530)]
Update parallel BTree scan state when the scan keys can't be satisfied.
For parallel btree scan to work for array of scan keys, it should reach
BTPARALLEL_DONE state once for every distinct combination of array keys.
This is required to ensure that the parallel workers don't try to seize
blocks at the same time for different scan keys. We missed to update this
state when we discovered that the scan keys can't be satisfied.
Author: James Hunter Reviewed-by: Amit Kapila Tested-by: Justin Pryzby
Backpatch-through: 10, where it was introduced
Discussion: https://postgr.es/m/4248CABC-25E3-4809-B4D0-128E1BAABC3C@amazon.com
Since commit fd5942c18f97 (2012, 9.3-era), walsender has been sending
completion tags for certain replication commands twice -- and they're
not even consistent. Apparently neither libpq nor JDBC have a problem
with it, but it's not kosher. Fix by remove the EndCommand() call in
the common code path for them all, and inserting specific calls to
EndReplicationCommand() specifically in those places where it's needed.
EndReplicationCommand() is a new simple function to send the completion
tag for replication commands. Do this instead of sending a generic
SELECT completion tag for them all, which was also pretty bogus (if
innocuous). While at it, change StartReplication() to use
EndReplicationCommand() instead of pg_puttextmessage().
In commit 2f9661311b83, I failed to realize that replication commands
are not close-enough kin of regular SQL commands, so the
DROP_REPLICATION_SLOT tag I added is undeserved and a type pun. Take it
out.
Backpatch to 13, where the latter commit appeared. The duplicate tag
has been sent since 9.3, but since nothing is broken, it doesn't seem
worth fixing.
Peter Geoghegan [Wed, 16 Sep 2020 17:42:28 +0000 (10:42 -0700)]
Fix amcheck child check pg_upgrade bug.
Commit d114cc53 overlooked the fact that pg_upgrade'd B-Tree indexes
have leaf page high keys whose offset numbers do not match the one from
the copy of the tuple one level up (the copy stored with a downlink for
leaf page's right sibling page). This led to false positive reports of
corruption from bt_index_parent_check() when it was called to verify a
pg_upgrade'd index.
To fix, skip comparing the offset number on pg_upgrade'd B-Tree indexes.
Author: Anastasia Lubennikova
Author: Peter Geoghegan Reported-By: Andrew Bille Diagnosed-By: Anastasia Lubennikova
Bug: #16619
Discussion: https://postgr.es/m/16619-aaba10f83fdc1c3c@postgresql.org
Backpatch: 13-, where child check was enhanced.
Tom Lane [Wed, 16 Sep 2020 17:38:26 +0000 (13:38 -0400)]
Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL.
If a partitioned table's column is already marked NOT NULL, there is
no need to examine its partitions, because we can rely on previous
DDL to have enforced that the child columns are NOT NULL as well.
(Unfortunately, the same cannot be said for traditional inheritance,
so for now we have to restrict the optimization to partitioned tables.)
Hence, we may skip recursing to child tables in this situation.
The reason this case is worth worrying about is that when pg_dump dumps
a partitioned table having a primary key, it will include the requisite
NOT NULL markings in the CREATE TABLE commands, and then add the
primary key as a separate step. The primary key addition generates a
SET NOT NULL as a subcommand, just to be sure. So the situation where
a SET NOT NULL is redundant does arise in the real world.
Skipping the recursion does more than just save a few cycles: it means
that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY
KEY" will take locks only on the partition parent table, not on the
partitions. It turns out that parallel pg_restore is effectively
assuming that that's true, and has little choice but to do so because
the dependencies listed for such a TOC entry don't include the
partitions. pg_restore could thus issue this ALTER while data restores
on the partitions are still in progress. Taking unnecessary locks on
the partitions not only hurts concurrency, but can lead to actual
deadlock failures, as reported by Domagoj Smoljanovic.
(A contributing factor in the deadlock is that TRUNCATE on a child
partition wants a non-exclusive lock on the parent. This seems
likewise unnecessary, but the fix for it is more invasive so we
won't consider back-patching it. Fortunately, getting rid of one
of these two poor behaviors is enough to remove the deadlock.)
Although support for partitioned primary keys came in with v11,
this patch is dependent on the SET NOT NULL refactoring done by
commit f4a3fdfbd, so we can only patch back to v12.
Patch by me; thanks to Alvaro Herrera and Amit Langote for review.
Tom Lane [Wed, 16 Sep 2020 16:07:31 +0000 (12:07 -0400)]
Fix bogus cache-invalidation logic in logical replication worker.
The code recorded cache invalidation events by zeroing the "localreloid"
field of affected cache entries. However, it's possible for an inval
event to occur even while we have the entry open and locked. So an
ill-timed inval could result in "cache lookup failed for relation 0"
errors, if the worker's code tried to use the cleared field. We can
fix that by creating a separate bool field to record whether the entry
needs to be revalidated. (In the back branches, cram the bool into
what had been padding space, to avoid an ABI break in the somewhat
unlikely event that any extension is looking at this struct.)
Also, rearrange the logic in logicalrep_rel_open so that it
does the right thing in cases where table_open would fail.
We should retry the lookup by name in that case, but we didn't.
The real-world impact of this is probably small. In the first place,
the error conditions are very low probability, and in the second place,
the worker would just exit and get restarted. We only noticed because
in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly,
preventing the worker from making progress. Nonetheless, it's clearly
a bug, and it impedes a useful type of testing; so back-patch to v10
where this code was introduced.
Jeff Davis [Wed, 16 Sep 2020 04:34:05 +0000 (21:34 -0700)]
Change LogicalTapeSetBlocks() to use nBlocksWritten.
Previously, it was based on nBlocksAllocated to account for tapes with
open write buffers that may not have made it to the BufFile yet.
That was unnecessary, because callers do not need to get the number of
blocks while a tape has an open write buffer; and it also conflicted
with the preallocation logic added for HashAgg.
Fix use-after-free bug with event triggers in an extension script
ALTER TABLE commands in an extension script are added to an event
trigger command list; but starting with commit b5810de3f4 they do so in
a memory context that's too short-lived, so when execution ends and time
comes to use the entries, they've already been freed.
(This would also be a problem with ALTER TABLE commands in a
multi-command query string, but these serendipitously end in
PortalContext -- which probably explains why it took so long for this to
be reported.)
Fix by using the memory context specifically set for that, instead.
Backpatch to 13, where the aforementioned commit appeared.
Reported-by: Philippe Beaudoin
Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200902193715.6e0269d4@firost
Tom Lane [Sun, 13 Sep 2020 16:51:21 +0000 (12:51 -0400)]
Use the properly transformed RangeVar for expandTableLikeClause().
transformCreateStmt() adjusts the transformed statement's RangeVar
to specify the target schema explicitly, for the express reason
of making sure that auxiliary statements derived by parse
transformation operate on the right table. But the refactoring
I did in commit 502898192 got this wrong and passed the untransformed
RangeVar to expandTableLikeClause(). This could lead to assertion
failures or weird misbehavior if the wrong table was accessed.
Per report from Alexander Lakhin. Like the previous patch, back-patch
to all supported branches.
Tom Lane [Thu, 10 Sep 2020 16:06:26 +0000 (12:06 -0400)]
Use _exit(2) for SIGQUIT during ProcessStartupPacket, too.
Bring the signal handling for startup-packet collection into line
with the policy established in commits bedadc732 and 8e19a8264,
namely don't risk running atexit callbacks when handling SIGQUIT.
Ideally, we'd not do so for SIGTERM or timeout interrupts either,
but that change seems a bit too risky for the back branches.
For now, just improve the comments in this area to describe the risk.
Also relocate where BackendInitialize re-disables these interrupts,
to minimize the code span where they're active. This doesn't buy
a whole lot of safety, but it can't hurt.
In passing, rename startup_die() to remove confusion about whether
it is for the startup process.
Like the previous commits, back-patch to all supported branches.
Peter Eisentraut [Thu, 10 Sep 2020 13:31:09 +0000 (15:31 +0200)]
doc: Remove buggy ICU collation from documentation
We have had multiple reports that point to the
'@colReorder=latn-digit' collation customization being buggy. We have
reported this to ICU and are waiting for a fix. In the meantime,
remove references to this from the documentation and replace it by
another reordering example. Apparently, many users have been picking
up this example specifically from the documentation.
Author: Jehan-Guillaume de Rorthais
Discussion: https://www.postgresql.org/message-id/flat/153201618542.1404.3611626898935613264%40wrigleys.postgresql.org
Fix rd_firstRelfilenodeSubid for nailed relations, in parallel workers.
Move applicable code out of RelationBuildDesc(), which nailed relations
bypass. Non-assert builds experienced no known problems. Back-patch to
v13, where commit c6b92041d38512a4176ed76ad06f713d2e6c01a8 introduced
rd_firstRelfilenodeSubid.
Tom Lane [Wed, 9 Sep 2020 19:32:34 +0000 (15:32 -0400)]
Make archiver's SIGQUIT handler exit via _exit().
Commit 8e19a8264 changed the SIGQUIT handlers of almost all server
processes not to run atexit callbacks. The archiver process was
skipped, perhaps because it's not connected to shared memory; but
it's just as true here that running atexit callbacks in a signal
handler is unsafe. So let's make it work like the rest.
In HEAD and v13, we can use the common SignalHandlerForCrashExit
handler. Before that, just tweak pgarch_exit to use _exit(2)
explicitly.
Like the previous commit, back-patch to all supported branches.
Tom Lane [Wed, 9 Sep 2020 16:00:49 +0000 (12:00 -0400)]
Doc: adjust documentation related to index support functions.
Commit 15cb2bd27 neglected to make the running text match the
tables, leaving the reader with the strong impression that
we cannot count. Also, don't drop an unrelated para between
a table and the para describing it.
Check default partitions constraints while descending
Partitioning tuple route code assumes that the partition chosen while
descending the partition hierarchy is always the correct one. This is
true except when the partition is the default partition and another
partition has been added concurrently: the partition constraint changes
and we don't recheck it. This can lead to tuples mistakenly being added
to the default partition that should have been rejected.
Fix by rechecking the default partition constraint while descending the
hierarchy.
An isolation test based on the reproduction steps described by Hao Wu
(with tweaks for extra coverage) is included.
Backpatch to 12, where this bug came in with 898e5e3290a7.
Michael Paquier [Mon, 7 Sep 2020 05:35:13 +0000 (14:35 +0900)]
doc: Tweak sentence for pg_checksums when enabling checksums
The previous version of the docs mentioned that files are rewritten,
implying that a second copy of each file gets created, but each file is
updated in-place.
Tom Lane [Sun, 6 Sep 2020 15:50:40 +0000 (11:50 -0400)]
Remove useless lstat() call in pg_rewind.
This is duplicative of an lstat that was just done by the calling
function (traverse_datadir), besides which we weren't really doing
anything with the results. There's not much point in checking to
see if someone removed the file since the previous lstat, since the
FILE_ACTION_REMOVE code would have to deal with missing-file cases
anyway. Moreover, the "exists = false" assignment was a dead store;
nothing was done with that value later.
A syscall saved is a syscall earned, so back-patch to 9.5
where this code was introduced.
Tom Lane [Sat, 5 Sep 2020 01:01:59 +0000 (21:01 -0400)]
Make new authentication test case more robust.
I happened to notice that the new test case I added in b55b4dad9
falls over if one runs "make check" repeatedly; though not in branches
after v10. That's because it was assuming that tmp_check/pgpass
wouldn't exist already. However, it's only been since v11 that the
Makefiles forcibly remove all of tmp_check/ before starting a TAP run.
This fix to unlink the file is therefore strictly necessary only in
v10 ... but it seems wisest to do it across the board, rather than
let the test rely on external logic to get the conditions right.
Tom Lane [Sat, 5 Sep 2020 00:20:05 +0000 (20:20 -0400)]
Fix over-eager ping'ing in logical replication receiver.
Commit 3f60f690f only partially fixed the broken-status-tracking
issue in LogicalRepApplyLoop: we need ping_sent to have the same
lifetime as last_recv_timestamp. The effects are much less serious
than what that commit fixed, though. AFAICS this would just lead to
extra ping requests being sent, once per second until the sender
responds. Still, it's a bug, so backpatch to v10 as before.
Tom Lane [Fri, 4 Sep 2020 16:40:28 +0000 (12:40 -0400)]
Fix rare deadlock failure in create_am regression test.
The "DROP ACCESS METHOD gist2" test will require locking the index
to be dropped and then its table; while most ordinary operations
lock a table first then its index. While no concurrent test scripts
should be touching fast_emp4000, autovacuum might chance to be
processing that table when the DROP runs, resulting in a deadlock
failure. This is pretty rare but we see it in the buildfarm from
time to time.
To fix, acquire a lock on fast_emp4000 before issuing the DROP.
Since the point of the exercise is mostly to prevent buildfarm
failures, back-patch to 9.6 where this test was introduced.
Tom Lane [Thu, 3 Sep 2020 20:52:09 +0000 (16:52 -0400)]
Avoid lockup of a parallel worker when reporting a long error message.
Because sigsetjmp() will restore the initial state with signals blocked,
the code path in bgworker.c for reporting an error and exiting would
execute that way. Usually this is fairly harmless; but if a parallel
worker had an error message exceeding the shared-memory communication
buffer size (16K) it would lock up, because it would wait for a
resume-sending signal from its parallel leader which it would never
detect.
To fix, just unblock signals at the appropriate point.
This can be shown to fail back to 9.6. The lack of parallel query
infrastructure makes it difficult to provide a simple test case for
9.5; but I'm pretty sure the issue exists in some form there as well,
so apply the code change there too.
Vignesh C, reviewed by Bharath Rupireddy, Robert Haas, and myself
Tom Lane [Thu, 3 Sep 2020 15:45:26 +0000 (11:45 -0400)]
Doc: mention packager-supplied tools for server start/stop, initdb, etc.
The majority of our audience is probably using a pre-packaged Postgres
build rather than raw sources. For them, much of runtime.sgml is not
too relevant, and they should be reading the packager's docs instead.
Add some notes pointing that way in appropriate places.
Text by me; thanks to Daniel Gustafsson for review and discussion,
and to Laurenz Albe for an earlier version.
Raise error on concurrent drop of partitioned index
We were already raising an error for DROP INDEX CONCURRENTLY on a
partitioned table, albeit a different and confusing one:
ERROR: DROP INDEX CONCURRENTLY must be first action in transaction
Change that to throw a more comprehensible error:
ERROR: cannot drop partitioned index \"%s\" concurrently
Michael Paquier authored the test case for indexes on temporary
partitioned tables.
Backpatch to 11, where indexes on partitioned tables were added.
Reported-by: Jan Mussler Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/16594-d2956ca909585067@postgresql.org
Tom Lane [Tue, 1 Sep 2020 17:14:44 +0000 (13:14 -0400)]
Teach libpq to handle arbitrary-length lines in .pgpass files.
Historically there's been a hard-wired assumption here that no line of
a .pgpass file could be as long as NAMEDATALEN*5 bytes. That's a bit
shaky to start off with, because (a) there's no reason to suppose that
host names fit in NAMEDATALEN, and (b) this figure fails to allow for
backslash escape characters. However, it fails completely if someone
wants to use a very long password, and we're now hearing reports of
people wanting to use "security tokens" that can run up to several
hundred bytes. Another angle is that the file is specified to allow
comment lines, but there's no reason to assume that long comment lines
aren't possible.
Rather than guessing at what might be a more suitable limit, let's
replace the fixed-size buffer with an expansible PQExpBuffer. That
adds one malloc/free cycle to the typical use-case, but that's surely
pretty cheap relative to the I/O this code has to do.
Also, add TAP test cases to exercise this code, because there was no
test coverage before.
This reverts most of commit 2eb3bc588, as there's no longer a need for
a warning message about overlength .pgpass lines. (I kept the explicit
check for comment lines, though.)
In HEAD and v13, this also fixes an oversight in 74a308cf5: there's not
much point in explicit_bzero'ing the line buffer if we only do so in two
of the three exit paths.
Back-patch to all supported branches, except that the test case only
goes back to v10 where src/test/authentication/ was added.
Bruce Momjian [Mon, 31 Aug 2020 22:33:37 +0000 (18:33 -0400)]
doc: add commas after 'i.e.' and 'e.g.'
This follows the American format,
https://jakubmarian.com/comma-after-i-e-and-e-g/. There is no intention
of requiring this format for future text, but making existing text
consistent every few years makes sense.
Tom Lane [Sun, 30 Aug 2020 18:37:24 +0000 (14:37 -0400)]
Mark factorial operator, and postfix operators in general, as deprecated.
Per discussion, we're planning to remove parser support for postfix
operators in order to simplify the grammar. So it behooves us to
put out a deprecation notice at least one release before that.
There is only one built-in postfix operator, ! for factorial.
Label it deprecated in the docs and in pg_description, and adjust
some examples that formerly relied on it. (The sister prefix
operator !! is also deprecated. We don't really have to remove
that one, but since we're suggesting that people use factorial()
instead, it seems better to remove both operators.)
Also state in the CREATE OPERATOR ref page that postfix operators
in general are going away.
Although this changes the initial contents of pg_description,
I did not force a catversion bump; it doesn't seem essential.
In v13, also back-patch 4c5cf5431, so that there's someplace for
the s to point to.
Mark Dilger and John Naylor, with some adjustments by me
Tom Lane [Thu, 27 Aug 2020 21:36:13 +0000 (17:36 -0400)]
Fix code for re-finding scan position in a multicolumn GIN index.
collectMatchBitmap() needs to re-find the index tuple it was previously
looking at, after transiently dropping lock on the index page it's on.
The tuple should still exist and be at its prior position or somewhere
to the right of that, since ginvacuum never removes tuples but
concurrent insertions could add one. However, there was a thinko in
that logic, to the effect of expecting any inserted tuples to have the
same index "attnum" as what we'd been scanning. Since there's no
physical separation of tuples with different attnums, it's not terribly
hard to devise scenarios where this fails, leading to transient "lost
saved point in index" errors. (While I've duplicated this with manual
testing, it seems impossible to make a reproducible test case with our
available testing technology.)
Fix by just continuing the scan when the attnum doesn't match.
While here, improve the error message used if we do fail, so that it
matches the wording used in btree for a similar case.
collectMatchBitmap()'s posting-tree code path was previously not
exercised at all by our regression tests. While I can't make
a regression test that exhibits the bug, I can at least improve
the code coverage here, so do that. The test case I made for this
is an extension of one added by 4b754d6c1, so it only works in
HEAD and v13; didn't seem worth trying hard to back-patch it.
Per bug #16595 from Jesse Kinkead. This has been broken since
multicolumn capability was added to GIN (commit 27cb66fdf),
so back-patch to all supported branches.
Amit Kapila [Mon, 24 Aug 2020 02:52:54 +0000 (08:22 +0530)]
Improve the vacuum error context phase information.
We were displaying the wrong phase information for 'info' message in the
index clean up phase because we were switching to the previous phase a bit
early. We were also not displaying context information for heap phase
unless the block number is valid which is fine for error cases but for
messages at 'info' or lower error level it appears to be inconsistent with
index phase information.
Reported-by: Sawada Masahiko
Author: Sawada Masahiko Reviewed-by: Amit Kapila
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/CA+fd4k4HcbhPnCs7paRTw1K-AHin8y4xKomB9Ru0ATw0UeTy2w@mail.gmail.com
Tom Lane [Sat, 22 Aug 2020 18:46:40 +0000 (14:46 -0400)]
Avoid pushing quals down into sub-queries that have grouping sets.
The trouble with doing this is that an apparently-constant subquery
output column isn't really constant if it is a grouping column that
appears in only some of the grouping sets. A qual using such a
column would be subject to incorrect const-folding after push-down,
as seen in bug #16585 from Paul Sivash.
To fix, just disable qual pushdown altogether if the sub-query has
nonempty groupingSets. While we could imagine far less restrictive
solutions, there is not much point in working harder right now,
because subquery_planner() won't move HAVING clauses to WHERE within
such a subquery. If the qual stays in HAVING it's not going to be
a lot more useful than if we'd kept it at the outer level.
Having said that, this restriction could be removed if we used a
parsetree representation that distinguished such outputs from actual
constants, which is something I hope to do in future. Hence, make
the patch a minimal addition rather than integrating it more tightly
(e.g. by renumbering the existing items in subquery_is_pushdown_safe's
comment).
Back-patch to 9.5 where grouping sets were introduced.
Tom Lane [Sat, 22 Aug 2020 16:34:17 +0000 (12:34 -0400)]
Fix ALTER TABLE's scheduling rules for AT_AddConstraint subcommands.
Commit 1281a5c90 rearranged the logic in this area rather drastically,
and it broke the case of adding a foreign key constraint in the same
ALTER that adds the pkey or unique constraint it depends on. While
self-referential fkeys are surely a pretty niche case, this used to
work so we shouldn't break it.
To fix, reorganize the scheduling rules in ATParseTransformCmd so
that a transformed AT_AddConstraint subcommand will be delayed into
a later pass in all cases, not only when it's been spit out as a
side-effect of parsing some other command type.
Also tweak the logic so that we won't run ATParseTransformCmd twice
while doing this. It seems to work even without that, but it's
surely wasting cycles to do so.
Per bug #16589 from Jeremy Evans. Back-patch to v13 where the new
code was introduced.
Tom Lane [Fri, 21 Aug 2020 19:00:42 +0000 (15:00 -0400)]
Fix handling of CREATE TABLE LIKE with inheritance.
If a CREATE TABLE command uses both LIKE and traditional inheritance,
Vars in CHECK constraints and expression indexes that are absorbed
from a LIKE parent table tended to get mis-numbered, resulting in
wrong answers and/or bizarre error messages (though probably not any
actual crashes, thanks to validation occurring in the executor).
In v12 and up, the same could happen to Vars in GENERATED expressions,
even in cases with no LIKE clause but multiple traditional-inheritance
parents.
The cause of the problem for LIKE is that parse_utilcmd.c supposed
it could renumber such Vars correctly during transformCreateStmt(),
which it cannot since we have not yet accounted for columns added via
inheritance. Fix that by postponing processing of LIKE INCLUDING
CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed
DefineRelation().
The error with GENERATED and multiple inheritance is a simple oversight
in MergeAttributes(); it knows it has to renumber Vars in inherited
CHECK constraints, but forgot to apply the same processing to inherited
GENERATED expressions (a/k/a defaults).
Per bug #16272 from Tom Gottfried. The non-GENERATED variants of the
issue are ancient, presumably dating right back to the addition of
CREATE TABLE LIKE; hence back-patch to all supported branches.
Fujii Masao [Fri, 21 Aug 2020 16:22:55 +0000 (01:22 +0900)]
Fix explain regression test failure.
Commit 9d701e624f caused the regression test for EXPLAIN to fail on
the buildfarm member prion. This happened because of instability of
test output, i.e., in text format, whether "Planning:" line is output
varies depending on the system state.
This commit updated the regression test so that it ignores that
"Planning:" line to produce more stable test output and get rid of
the test failure.
Fujii Masao [Fri, 21 Aug 2020 11:48:59 +0000 (20:48 +0900)]
Rework EXPLAIN for planner's buffer usage.
Commit ce77abe63c allowed EXPLAIN (BUFFERS) to report the information
on buffer usage during planning phase. However three issues were
reported regarding this feature.
(1) Previously, EXPLAIN option BUFFERS required ANALYZE. So the query
had to be actually executed by specifying ANALYZE even when we
want to see only the planner's buffer usage. This was inconvenient
especially when the query was write one like DELETE.
(2) EXPLAIN included the planner's buffer usage in summary
information. So SUMMARY option had to be enabled to report that.
Also this format was confusing.
(3) The output structure for planning information was not consistent
between TEXT format and the others. For example, "Planning" tag
was output in JSON format, but not in TEXT format.
For (1), this commit allows us to perform EXPLAIN (BUFFERS) without
ANALYZE to report the planner's buffer usage.
For (2), this commit changed EXPLAIN output so that the planner's
buffer usage is reported before summary information.
For (3), this commit made the output structure for planning
information more consistent between the formats.
Back-patch to v13 where the planner's buffer usage was allowed to
be reported in EXPLAIN.
Reported-by: Pierre Giraud, David Rowley
Author: Fujii Masao Reviewed-by: David Rowley, Julien Rouhaud, Pierre Giraud
Discussion: https://postgr.es/m/07b226e6-fa49-687f-b110-b7c37572f69e@dalibo.com
David Rowley [Thu, 20 Aug 2020 21:34:47 +0000 (09:34 +1200)]
Fix a few typos in JIT comments and README
Reviewed-by: Abhijit Menon-Sen Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAApHDvobgmCs6CohqhKTUf7D8vffoZXQTCBTERo9gbOeZmvLTw%40mail.gmail.com
Backpatch-through: 11, where JIT was added
Tom Lane [Wed, 19 Aug 2020 18:07:49 +0000 (14:07 -0400)]
Suppress unnecessary RelabelType nodes in yet more cases.
Commit a477bfc1d fixed eval_const_expressions() to ensure that it
didn't generate unnecessary RelabelType nodes, but I failed to notice
that some other places in the planner had the same issue. Really
noplace in the planner should be using plain makeRelabelType(), for
fear of generating expressions that should be equal() to semantically
equivalent trees, but aren't.
An example is that because canonicalize_ec_expression() failed
to be careful about this, we could end up with an equivalence class
containing both a plain Const, and a Const-with-RelabelType
representing exactly the same value. So far as I can tell this led to
no visible misbehavior, but we did waste a bunch of cycles generating
and evaluating "Const = Const-with-RelabelType" to prove such entries
are redundant.
Hence, move the support function added by a477bfc1d to where it can
be more generally useful, and use it in the places where planner code
previously used makeRelabelType.
Back-patch to v12, like the previous patch. While I have no concrete
evidence of any real misbehavior here, it's certainly possible that
I overlooked a case where equivalent expressions that aren't equal()
could cause a user-visible problem. In any case carrying extra
RelabelType nodes through planning to execution isn't very desirable.
Fix printing last progress report line in client programs.
A number of client programs have a "--progress" option that when printing
to a TTY, updates the current line by printing a '\r' and overwriting it.
After the last line, '\n' needs to be printed to move the cursor to the
next line. pg_basebackup and pgbench got this right, but pg_rewind and
pg_checksums were slightly wrong. pg_rewind printed the newline to stdout
instead of stderr, and pg_checksums printed the newline even when not
printing to a TTY. Fix them, and also add a 'finished' argument to
pg_basebackup's progress_report() function, to keep it consistent with
the other programs.
Backpatch to v12. pg_rewind's newline was broken with the logging changes
in commit cc8d415117 in v12, and pg_checksums was introduced in v12.
Tom Lane [Sat, 15 Aug 2020 19:43:34 +0000 (15:43 -0400)]
Doc: various improvements for pg_basebackup reference page.
Put the -r option in the right section (it certainly isn't an
option controlling "the location and format of the output").
Clarify the behavior of the tablespace and waldir options
(that part per gripe from [email protected]).
Make a large number of small copy-editing fixes in text that
visibly wasn't written by native speakers, and try to avoid
grammatical inconsistencies between the descriptions of
the different options.
Back-patch to v13, since HEAD hasn't meaningfully diverged yet.
Noah Misch [Sat, 15 Aug 2020 17:15:53 +0000 (10:15 -0700)]
Prevent concurrent SimpleLruTruncate() for any given SLRU.
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Tom Lane [Sat, 15 Aug 2020 02:14:03 +0000 (22:14 -0400)]
Be more careful about the shape of hashable subplan clauses.
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery. However, the planner only went as far
as verifying that the clauses were all binary OpExprs. This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10. To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things. Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).
While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns. Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining. There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly. Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.
This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.
Alvaro Herrera [Fri, 14 Aug 2020 21:33:31 +0000 (17:33 -0400)]
pg_dump: fix dependencies on FKs to partitioned tables
Parallel-restoring a foreign key that references a partitioned table
with several levels of partitions can fail:
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
pg_restore: error: could not execute query: ERROR: there is no unique constraint matching given keys for referenced table "pk"
Command was: ALTER TABLE fkpart3.fk
ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
This happens in parallel restore mode because some index partitions
aren't yet attached to the topmost partitioned index that the FK uses,
and so the index is still invalid. The current code marks the FK as
dependent on the first level of index-attach dump objects; the bug is
fixed by recursively marking the FK on their children.
Backpatch to 12, where FKs to partitioned tables were introduced.
Reported-by: Tom Lane
Author: Álvaro Herrera
Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
Backpatch: 12-master
Tom Lane [Fri, 14 Aug 2020 17:26:57 +0000 (13:26 -0400)]
Fix postmaster's behavior during smart shutdown.
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit. No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:
* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive. There is more work needed in that area IMO.)
* Autovacuum ceases to function. We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess. In the worst case
the system could reach a forced shutdown to prevent XID wraparound.
* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.
Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections. Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain. A subsidiary state variable tracks
whether or not we're letting in new connections in those states.
This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes. I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
Back-patch to 9.6 where parallel query was added. In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.