Alvaro Herrera [Tue, 17 Mar 2020 19:13:18 +0000 (16:13 -0300)]
Fix consistency issues with replication slot copy
Commit 9f06d79ef831's replication slot copying failed to
properly reserve the WAL that the slot is expecting to see
during DecodingContextFindStartpoint (to set the confirmed_flush
LSN), so concurrent activity could remove that WAL and cause the
copy process to error out. But it doesn't actually *need* that
WAL anyway: instead of running decode to find confirmed_flush, it
can be copied from the source slot. Fix this by rearranging things
to avoid DecodingContextFindStartpoint() (leaving the target slot's
confirmed_flush_lsn to invalid), and set that up afterwards by copying
from the target slot's value.
Also ensure the source slot's confirmed_flush_lsn is valid.
Tom Lane [Tue, 17 Mar 2020 19:05:17 +0000 (15:05 -0400)]
Doc: clarify behavior of "anyrange" pseudo-type.
I noticed that we completely failed to document the restriction
that an "anyrange" result type has to be inferred from an "anyrange"
input. The docs also were less clear than they could be about the
relationship between "anyrange" and "anyarray".
Tom Lane [Tue, 17 Mar 2020 16:09:26 +0000 (12:09 -0400)]
Use pkg-config, if available, to locate libxml2 during configure.
If pkg-config is installed and knows about libxml2, use its information
rather than asking xml2-config. Otherwise proceed as before. This
patch allows "configure --with-libxml" to succeed on platforms that
have pkg-config but not xml2-config, which is likely to soon become
a typical situation.
The old mechanism can be forced by setting XML2_CONFIG explicitly
(hence, build processes that were already doing so will certainly
not need adjustment). Also, it's now possible to set XML2_CFLAGS
and XML2_LIBS explicitly to override both programs.
There is a small risk of this breaking existing build processes,
if there are multiple libxml2 installations on the machine and
pkg-config disagrees with xml2-config about which to use. The
only case where that seems really likely is if a builder has tried
to select a non-default xml2-config by putting it early in his PATH
rather than setting XML2_CONFIG. Plan to warn against that in the
minor release notes.
Back-patch to v10; before that we had no pkg-config infrastructure,
and it doesn't seem worth adding it for this.
Hugh McMaster and Tom Lane; Peter Eisentraut also made an earlier
attempt at this, from which I lifted most of the docs changes.
Tom Lane [Tue, 17 Mar 2020 01:05:29 +0000 (21:05 -0400)]
Avoid holding a directory FD open across assorted SRF calls.
This extends the fixes made in commit 085b6b667 to other SRFs with the
same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(),
pg_ls_dir(), and pg_tablespace_databases().
Also adjust various comments and documentation to warn against
expecting to clean up resources during a ValuePerCall SRF's final
call.
Back-patch to all supported branches, since these functions were
all born broken.
Backpatch as far back as it applies cleanly (and as far as as each
function exists). REL_10_STABLE uses different wording, but hopefully
people are not reading docs so old to write new apps anyway.
Tom Lane [Sat, 14 Mar 2020 18:42:22 +0000 (14:42 -0400)]
Restructure polymorphic-type resolution in funcapi.c.
resolve_polymorphic_tupdesc() and resolve_polymorphic_argtypes() failed to
cover the case of having to resolve anyarray given only an anyrange input.
The bug was masked if anyelement was also used (as either input or
output), which probably helps account for our not having noticed.
While looking at this I noticed that resolve_generic_type() would produce
the wrong answer if asked to make that same resolution. ISTM that
resolve_generic_type() is confusingly defined and overly complex, so
rather than fix it, let's just make funcapi.c do the specific lookups
it requires for itself.
With this change, resolve_generic_type() is not used anywhere, so remove
it in HEAD. In the back branches, leave it alone (complete with bug)
just in case any external code is using it.
While we're here, make some other refactoring adjustments in funcapi.c
with an eye to upcoming future expansion of the set of polymorphic types:
* Simplify quick-exit tests by adding an overall have_polymorphic_result
flag. This is about a wash now but will be a win when there are more
flags.
* Reduce duplication of code between resolve_polymorphic_tupdesc() and
resolve_polymorphic_argtypes().
* Don't bother to validate correct matching of anynonarray or anyenum;
the parser should have done that, and even if it didn't, just doing
"return false" here would lead to a very confusing, off-point error
message. (Really, "return false" in these two functions should only
occur if the call_expr isn't supplied or we can't obtain data type
info from it.)
* For the same reason, throw an elog rather than "return false" if
we fail to resolve a polymorphic type.
The bug's been there since we added anyrange, so back-patch to
all supported branches.
Peter Eisentraut [Fri, 13 Mar 2020 10:28:11 +0000 (11:28 +0100)]
Preserve replica identity index across ALTER TABLE rewrite
If an index was explicitly set as replica identity index, this setting
was lost when a table was rewritten by ALTER TABLE. Because this
setting is part of pg_index but actually controlled by ALTER
TABLE (not part of CREATE INDEX, say), we have to do some extra work
to restore it.
Based-on-patch-by: Quan Zongliang Reviewed-by: Euler Taveira
Discussion: https://www.postgresql.org/message-id/flat/c70fcab2-4866-0d9f-1d01-e75e189db342@gmail.com
Tom Lane [Wed, 11 Mar 2020 22:23:57 +0000 (18:23 -0400)]
Fix test case instability introduced in 085b6b667.
I forgot that the WAL directory might hold other files besides WAL
segments, notably including new segments still being filled.
That means a blind test for the first file's size being 16MB can
fail. Restrict based on file name length to make it more robust.
Peter Geoghegan [Wed, 11 Mar 2020 21:15:02 +0000 (14:15 -0700)]
Paper over bt_metap() oldest_xact bug in backbranches.
The data types that contrib/pageinspect's bt_metap() function were
declared to return as OUT arguments were wrong in some cases. In
particular, the oldest_xact column (a TransactionId/xid field) was
declared integer/int4 within the pageinspect extension's sql file. This
led to errors when an oldest_xact value that exceeded 2^31-1 was
encountered.
We cannot fix the declaration on Postgres 11 or 12. All we can do is
ameliorate the problem. Use "%d" instead of "%u" to format the output
of the oldest_xact value. This makes the C code match the declaration,
suppressing unhelpful error messages that might otherwise make
bt_metap() totally unusable. A bogus negative oldest_xact value will be
displayed instead of raising an error.
This commit addresses the same issue as master branch commit 691e8b2e18,
which actually fixed the problem. Backpatch to the 11 and 12 branches
only, since they are the only branches (other than master) that have
oldest_xact. All of the other problematic columns already display bogus
output for out of range values.
Reported-By: Victor Yegorov
Bug: #16285
Discussion: https://postgr.es/m/20200309223557[email protected]
Backpatch: 11 and 12 only
Alvaro Herrera [Wed, 11 Mar 2020 19:54:54 +0000 (16:54 -0300)]
Add pg_dump support for ALTER obj DEPENDS ON EXTENSION
pg_dump is oblivious to this kind of dependency, so they're lost on
dump/restores (and pg_upgrade). Have pg_dump emit ALTER lines so that
they're preserved. Add some pg_dump tests for the whole thing, also.
Reviewed-by: Tom Lane (offlist) Reviewed-by: Ibrar Ahmed Reviewed-by: Ahsan Hadi (who also reviewed commit 899a04f5ed61)
Discussion: https://postgr.es/m/20200217225333[email protected]
Tom Lane [Wed, 11 Mar 2020 19:27:59 +0000 (15:27 -0400)]
Avoid holding a directory FD open across pg_ls_dir_files() calls.
This coding technique is undesirable because (a) it leaks the FD for
the rest of the transaction if the SRF is not run to completion, and
(b) allocated FDs are a scarce resource, but multiple interleaved
uses of the relevant functions could eat many such FDs.
In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1"
yields a warning about the leaked FD, and the only reason there's
no warning in earlier branches is that fd.c didn't whine about such
leaks before commit 9cb7db3f0. Even disregarding the warning, it
wouldn't be too hard to run a backend out of FDs with careless use
of these SQL functions.
Hence, rewrite the function so that it reads the directory within
a single call, returning the results as a tuplestore rather than
via value-per-call mode.
There are half a dozen other built-in SRFs with similar problems,
but let's fix this one to start with, just to see if the buildfarm
finds anything wrong with the code.
In passing, fix bogus error report for stat() failure: it was
whining about the directory when it should be fingering the
individual file. Doubtless a copy-and-paste error.
Back-patch to v10 where this function was added.
Justin Pryzby, with cosmetic tweaks and test cases by me
Alvaro Herrera [Wed, 11 Mar 2020 14:04:59 +0000 (11:04 -0300)]
Avoid duplicates in ALTER ... DEPENDS ON EXTENSION
If the command is attempted for an extension that the object already
depends on, silently do nothing.
In particular, this means that if a database containing multiple such
entries is dumped, the restore will silently do the right thing and
record just the first one. (At least, in a world where pg_dump does
dump such entries -- which it doesn't currently, but it will.)
Backpatch to 9.6, where this kind of dependency was introduced.
Michael Paquier [Tue, 10 Mar 2020 06:38:34 +0000 (15:38 +0900)]
Prevent reindex of invalid indexes on TOAST tables
Such indexes can only be duplicated leftovers of a previously failed
REINDEX CONCURRENTLY command, and a valid equivalent is guaranteed to
exist. As toast indexes can only be dropped if invalid, reindexing
these would lead to useless duplicated indexes that can't be dropped
anymore, except if the parent relation is dropped.
Thanks to Justin Pryzby for reminding that this problem was reported
long ago during the review of the original patch of REINDEX
CONCURRENTLY, but the issue was never addressed.
Reported-by: Sergei Kornilov, Justin Pryzby
Author: Julien Rouhaud Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/36712441546604286%40sas1-890ba5c2334a.qloud-c.yandex.net
Discussion: https://postgr.es/m/20200216190835[email protected]
Backpatch-through: 12
Tom Lane [Mon, 9 Mar 2020 18:58:11 +0000 (14:58 -0400)]
Fix pg_dump/pg_restore to restore event triggers later.
Previously, event triggers were restored just after regular triggers
(and FK constraints, which are basically triggers). This is risky
since an event trigger, once installed, could interfere with subsequent
restore commands. Worse, because event triggers don't have any
particular dependencies on any post-data objects, a parallel restore
would consider them eligible to be restored the moment the post-data
phase starts, allowing them to also interfere with restoration of a
whole bunch of objects that would have been restored before them in
a serial restore. There's no way to completely remove the risk of a
misguided event trigger breaking the restore, since if nothing else
it could break other event triggers. But we can certainly push them
to later in the process to minimize the hazard.
To fix, tweak the RestorePass mechanism introduced by commit 3eb9a5e7c
so that event triggers are handled as part of the post-ACL processing
pass (renaming the "REFRESH" pass to "POST_ACL" to reflect its more
general use). This will cause them to restore after everything except
matview refreshes, which seems OK since matview refreshes really ought
to run in the post-restore state of the database. In a parallel
restore, event triggers and matview refreshes might be intermixed,
but that seems all right as well.
Also update the code and comments in pg_dump_sort.c so that its idea
of how things are sorted agrees with what actually happens due to
the RestorePass mechanism. This is mostly cosmetic: it'll affect the
order of objects in a dump's TOC, but not the actual restore order.
But not changing that would be quite confusing to somebody reading
the code.
Fujii Masao [Mon, 9 Mar 2020 15:14:43 +0000 (00:14 +0900)]
Fix bug that causes to report waiting in PS display twice, in hot standby.
Previously "waiting" could appear twice via PS in case of lock conflict
in hot standby mode. Specifically this issue happend when the delay
in WAL application determined by max_standby_archive_delay and
max_standby_streaming_delay had passed but it took more than 500 msec
to cancel all the conflicting transactions. Especially we can observe this
easily by setting those delay parameters to -1.
The cause of this issue was that WaitOnLock() and
ResolveRecoveryConflictWithVirtualXIDs() added "waiting" to
the process title in that case. This commit prevents
ResolveRecoveryConflictWithVirtualXIDs() from reporting waiting
in case of lock conflict, to fix the bug.
Fujii Masao [Mon, 9 Mar 2020 06:31:31 +0000 (15:31 +0900)]
Avoid assertion failure with targeted recovery in standby mode.
At the end of recovery, standby mode is turned off to re-fetch the last
valid record from archive or pg_wal. Previously, if recovery target was
reached and standby mode was turned off while the current WAL source
was stream, recovery could try to retrieve WAL file containing the last
valid record unexpectedly from stream even though not in standby mode.
This caused an assertion failure. That is, the assertion test confirms that
WAL file should not be retrieved from stream if standby mode is not true.
This commit moves back the current WAL source to archive if it's stream
even though not in standby mode, to avoid that assertion failure.
This issue doesn't cause the server to crash when built with assertion
disabled. In this case, the attempt to retrieve WAL file from stream not
in standby mode just fails. And then recovery tries to retrieve WAL file
from archive or pg_wal.
Michael Paquier [Mon, 9 Mar 2020 01:54:55 +0000 (10:54 +0900)]
Doc: fix some description of environment variables with frontend tools
This addresses a couple of issues in the documentation:
- Description of PG_COLOR was missing for some tools (pg_archivecleanup
and pg_test_fsync), while the other descriptions had grammar mistakes.
- pgbench supports more environment variables: PGUSER, PGHOST and
PGPORT.
- vacuumlo, oid2name and pgbench support coloring (HEAD only)
Author: Michael Paquier Reviewed-by: Fabien Coelho, Daniel Gustafsson, Juan José Santamaría
Flecha
Discussion: https://postgr.es/m/20200304075418[email protected]
Backpatch-through: 12
Michael Paquier [Thu, 5 Mar 2020 03:50:23 +0000 (12:50 +0900)]
Fix more issues with dependency handling at swap phase of REINDEX CONCURRENTLY
When canceling a REINDEX CONCURRENTLY operation after swapping is done,
a drop of the parent table would leave behind old indexes. This is a
consequence of 68ac9cf, which fixed the case of pg_depend bloat when
repeating REINDEX CONCURRENTLY on the same relation.
In order to take care of the problem without breaking the previous fix,
this uses a different strategy, possible even with the exiting set of
routines to handle dependency changes. The dependencies of/on the
new index are additionally switched to the old one, allowing an old
invalid index remaining around because of a cancellation or a failure to
use the dependency links of the concurrently-created index. This
ensures that dropping any objects the old invalid index depends on also
drops the old index automatically.
Michael Paquier [Tue, 3 Mar 2020 04:56:11 +0000 (13:56 +0900)]
Fix assertion failure with ALTER TABLE ATTACH PARTITION and indexes
Using ALTER TABLE ATTACH PARTITION causes an assertion failure when
attempting to work on a partitioned index, because partitioned indexes
cannot have partition bounds.
The grammar of ALTER TABLE ATTACH PARTITION requires partition bounds,
but not ALTER INDEX, so mixing ALTER TABLE with partitioned indexes is
confusing. Hence, on HEAD, prevent ALTER TABLE to attach a partition if
the relation involved is a partitioned index. On back-branches, as
applications may rely on the existing behavior, just remove the
culprit assertion.
Reported-by: Alexander Lakhin
Author: Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/16276-5cd1dcc8fb8be7b5@postgresql.org
Backpatch-through: 11
Michael Paquier [Tue, 3 Mar 2020 01:12:49 +0000 (10:12 +0900)]
Preserve pg_index.indisclustered across REINDEX CONCURRENTLY
If the flag value is lost, a CLUSTER query following REINDEX
CONCURRENTLY could fail. Non-concurrent REINDEX is already handling
this case consistently.
Michael Paquier [Mon, 2 Mar 2020 06:46:24 +0000 (15:46 +0900)]
Fix command-line colorization on Windows with VT100-compatible environments
When setting PG_COLOR to "always" or "auto" in a Windows terminal
VT100-compatible, the colorization output was not showing up correctly
because it is necessary to update the console's output handling mode.
This fix allows to detect automatically if the environment is compatible
with VT100. Hence, PG_COLOR=auto is able to detect and handle both
compatible and non-compatible environments. The behavior of
PG_COLOR=always remains unchanged, as it enforces the use of colorized
output even if the environment does not allow it.
This fix is based on an initial suggestion from Thomas Munro.
Reported-by: Haiying Tang
Author: Juan José Santamaría Flecha Reviewed-by: Michail Nikolaev, Michael Paquier, Haiying Tang
Discussion: https://postgr.es/m/16108-134692e97146b7bc@postgresql.org
Backpatch-through: 12
Tom Lane [Sat, 29 Feb 2020 18:48:10 +0000 (13:48 -0500)]
Correctly re-use hash tables in buildSubPlanHash().
Commit 356687bd8 omitted to remove leftover code for destroying
a hashed subplan's hash tables, with the result that the tables
were always rebuilt not reused; this leads to severe memory
leakage if a hashed subplan is re-executed enough times.
Moreover, the code for reusing the hashnulls table had a typo
that would have made it do the wrong thing if it were reached.
Looking at the code coverage report shows severe under-coverage
of the potential callers of ResetTupleHashTable, so add some test
cases that exercise them.
Andreas Karlsson and Tom Lane, per reports from Ranier Vilela
and Justin Pryzby.
Tom Lane [Sat, 29 Feb 2020 01:28:34 +0000 (20:28 -0500)]
Avoid failure if autovacuum tries to access a just-dropped temp namespace.
Such an access became possible when commit 246a6c8f7 added more
aggressive cleanup of orphaned temp relations by autovacuum.
Since autovacuum's snapshot might be slightly stale, it could
attempt to access an already-dropped temp namespace, resulting in
an assertion failure or null-pointer dereference. (In practice,
since we don't drop temp namespaces automatically but merely
recycle them, this situation could only arise if a superuser does
a manual drop of a temp namespace. Still, that should be allowed.)
The core of the bug, IMO, is that isTempNamespaceInUse and its callers
failed to think hard about whether to treat "temp namespace isn't there"
differently from "temp namespace isn't in use". In hopes of forestalling
future mistakes of the same ilk, replace that function with a new one
checkTempNamespaceStatus, which makes the same tests but returns a
three-way enum rather than just a bool. isTempNamespaceInUse is gone
entirely in HEAD; but just in case some external code is relying on it,
keep it in the back branches, as a bug-compatible wrapper around the
new function.
Per report originally from Prabhat Kumar Sahu, investigated by Mahendra
Singh and Michael Paquier; the final form of the patch is my fault.
This replaces the failed fix attempt in a052f6cbb.
Tom Lane [Fri, 28 Feb 2020 16:29:58 +0000 (11:29 -0500)]
Doc: correct thinko in pg_buffercache documentation.
Access to this module is granted to the pg_monitor role, not
pg_read_all_stats. (Given the view's performance impact,
it seems wise to be restrictive, so I think this was the
correct decision --- and anyway it was clearly intentional.)
Michael Paquier [Thu, 27 Feb 2020 12:58:45 +0000 (21:58 +0900)]
Remove TAP test for createdb --lc-ctype
OpenBSD falls back to "C" when using an incorrect input with setlocale()
and LC_CTYPE, causing this test, introduced by 008cf04, to fail. This
removes the culprit test to avoid the portability issue.
Per report from Robert Haas, via buildfarm member curculio.
Michael Paquier [Thu, 27 Feb 2020 06:31:48 +0000 (15:31 +0900)]
Skip foreign tablespaces when running pg_checksums/pg_verify_checksums
Attempting to use pg_checksums (pg_verify_checksums in 11) on a data
folder which includes tablespace paths used across multiple major
versions would cause pg_checksums to scan all directories present in
pg_tblspc, and not only marked with TABLESPACE_VERSION_DIRECTORY. This
could lead to failures when for example running sanity checks on an
upgraded instance with --check. Even worse, it was possible to rewrite
on-disk pages with --enable for a cluster potentially online.
This commit makes pg_checksums skip any directories not named
TABLESPACE_VERSION_DIRECTORY, similarly to what is done for base
backups.
Michael Paquier [Thu, 27 Feb 2020 02:21:00 +0000 (11:21 +0900)]
createdb: Fix quoting of --encoding, --lc-ctype and --lc-collate
The original coding failed to properly quote those arguments, leading to
failures when using quotes in the values used. As the quoting can be
encoding-sensitive, the connection to the backend needs to be taken
before applying the correct quoting.
Author: Michael Paquier Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20200214041004[email protected]
Backpatch-through: 9.5
Tom Lane [Wed, 26 Feb 2020 23:13:58 +0000 (18:13 -0500)]
Suppress unnecessary RelabelType nodes in more cases.
eval_const_expressions sometimes produced RelabelType nodes that
were useless because they just relabeled an expression to the same
exposed type it already had. This is worth avoiding because it can
cause two equivalent expressions to not be equal(), preventing
recognition of useful optimizations. In the test case added here,
an unpatched planner fails to notice that the "sqli = constant" clause
renders a sort step unnecessary, because one code path produces an
extra RelabelType and another doesn't.
Fix by ensuring that eval_const_expressions_mutator's T_RelabelType
case will not add in an unnecessary RelabelType. Also save some
code by sharing a subroutine with the effectively-equivalent cases
for CollateExpr and CoerceToDomain. (CollateExpr had no bug, and
I think that the case couldn't arise with CoerceToDomain, but
it seems prudent to do the same check for all three cases.)
Back-patch to v12. In principle this has been wrong all along,
but I haven't seen a case where it causes visible misbehavior
before v12, so refrain from changing stable branches unnecessarily.
Alvaro Herrera [Wed, 26 Feb 2020 22:57:14 +0000 (19:57 -0300)]
Fix docs regarding AFTER triggers on partitioned tables
In commit 86f575948c77 I forgot to update the trigger.sgml paragraph
that needs to explain that AFTER triggers are allowed in partitioned
tables. Do so now.
Michael Paquier [Mon, 24 Feb 2020 09:14:16 +0000 (18:14 +0900)]
Add prefix checks in exclude lists for pg_rewind, pg_checksums and base backups
An instance of PostgreSQL crashing with a bad timing could leave behind
temporary pg_internal.init files, potentially causing failures when
verifying checksums. As the same exclusion lists are used between
pg_rewind, pg_checksums and basebackup.c, all those tools are extended
with prefix checks to keep everything in sync, with dedicated checks
added for pg_internal.init.
Backpatch down to 11, where pg_checksums (pg_verify_checksums in 11) and
checksum verification for base backups have been introduced.
Michael Paquier [Fri, 21 Feb 2020 03:05:36 +0000 (12:05 +0900)]
Doc: Fix instructions to control build environment with MSVC
The documentation included some outdated instructions to change the
architecture, build type or target OS of a build done with MSVC. This
commit updates the documentation to include the modern options
available, down to Visual Studio 2013.
Reported-by: Kyotaro Horiguchi
Author: Juan José Santamaría Flecha
Discussion: https://postgr.es/m/CAC+AXB0J7tAqW_2F1fCE4Dh2=Ccz96TcLpsGXOCvka7VvWG9Qw@mail.gmail.com
Backpatch-through: 12
Alvaro Herrera [Thu, 20 Feb 2020 17:14:20 +0000 (14:14 -0300)]
Simplify FK-to-partitioned regression test query
Avoid a join between relations having the FK to detect FK violation.
The planner might optimize this considering the PK must exist on the
referenced side at some point, effectively masking a bug this test
tries to detect.
Tom Lane and Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/467.1581270529@sss.pgh.pa.us
Tom Lane [Wed, 19 Feb 2020 19:44:58 +0000 (14:44 -0500)]
Fix confusion about event trigger vs. plain function in plpgsql.
The function hash table keys made by compute_function_hashkey() failed
to distinguish event-trigger call context from regular call context.
This meant that once we'd successfully made a hash entry for an event
trigger (either by validation, or by normal use as an event trigger),
an attempt to call the trigger function as a plain function would
find this hash entry and thereby bypass the you-can't-do-that check in
do_compile(). Thus we'd attempt to execute the function, leading to
strange errors or even crashes, depending on function contents and
server version.
To fix, add an isEventTrigger field to PLpgSQL_func_hashkey,
paralleling the longstanding infrastructure for regular triggers.
This fits into what had been pad space, so there's no risk of an ABI
break, even assuming that any third-party code is looking at these
hash keys. (I considered replacing isTrigger with a PLpgSQL_trigtype
enum field, but felt that that carried some API/ABI risk. Maybe we
should change it in HEAD though.)
Per bug #16266 from Alexander Lakhin. This has been broken since
event triggers were invented, so back-patch to all supported branches.
Fujii Masao [Wed, 19 Feb 2020 11:37:26 +0000 (20:37 +0900)]
Fix mesurement of elapsed time during truncating heap in VACUUM.
VACUUM may truncate heap in several batches. The activity report
is logged for each batch, and contains the number of pages in the table
before and after the truncation, and also the elapsed time during
the truncation. Previously the elapsed time reported in each batch was
the total elapsed time since starting the truncation until finishing
each batch. For example, if the truncation was processed dividing into
three batches, the second batch reported the accumulated time elapsed
during both first and second batches. This is strange and confusing
because the number of pages in the table reported together is not
total. Instead, each batch should report the time elapsed during
only that batch.
The cause of this issue was that the resource usage snapshot was
initialized only at the beginning of the truncation and was never
reset later. This commit fixes the issue by changing VACUUM so that
the resource usage snapshot is reset at each batch.
Amit Kapila [Wed, 19 Feb 2020 02:57:15 +0000 (08:27 +0530)]
Stop demanding that top xact must be seen before subxact in decoding.
Manifested as
ERROR: subtransaction logged without previous top-level txn record
this check forbids legit behaviours like
- First xl_xact_assignment record is beyond reading, i.e. earlier
restart_lsn.
- After restart_lsn there is some change of a subxact.
- After that, there is second xl_xact_assignment (for another subxact)
revealing the relationship between top and first subxact.
Such a transaction won't be streamed anyway because we hadn't seen it in
full. Saying for sure whether xact of some record encountered after
the snapshot was deserialized can be streamed or not requires to know
whether it wrote something before deserialization point --if yes, it
hasn't been seen in full and can't be decoded. Snapshot doesn't have such
info, so there is no easy way to relax the check.
Reported-by: Hsu, John Diagnosed-by: Arseny Sher
Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar
Backpatch-through: 9.5
Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
Tom Lane [Mon, 17 Feb 2020 23:40:02 +0000 (18:40 -0500)]
Teach pg_dump to dump comments on RLS policy objects.
This was unaccountably omitted in the original RLS patch.
The SQL syntax is basically the same as for comments on triggers,
so crib code from dumpTrigger().
Per report from Marc Munro. Back-patch to all supported branches.
Peter Eisentraut [Mon, 17 Feb 2020 14:19:58 +0000 (15:19 +0100)]
Fill in extraUpdatedCols in logical replication
The extraUpdatedCols field of the target RTE records which generated
columns are affected by an update. This is used in a variety of
places, including per-column triggers and foreign data wrappers. When
an update was initiated by a logical replication subscription, this
field was not filled in, so such an update would not affect generated
columns in a way that is consistent with normal updates. To fix,
factor out some code from analyze.c to fill in extraUpdatedCols in the
logical replication worker as well.
Reviewed-by: Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/b05e781a-fa16-6b52-6738-761181204567@2ndquadrant.com
Fujii Masao [Mon, 17 Feb 2020 07:16:08 +0000 (16:16 +0900)]
Add description about GSSOpenServer wait event into document.
This commit also updates wait event enum into alphabetical order.
Previously the enum entry for GSSOpenServer was added out-of-order.
Back-patch to v12 where commit b0b39f72b9 introduced
GSSOpenServer wait event. In v12, the commit doesn't include
the update of wait event enum, not to break ABI.
Author: Fujii Masao Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/949931aa-4ed4-d867-a7b5-de9c02b2292b@oss.nttdata.com
Tom Lane [Sun, 16 Feb 2020 17:20:18 +0000 (12:20 -0500)]
Try again to work around Windows' ERROR_SHARING_VIOLATION in pg_ctl.
Commit 0da33c762 introduced an unfortunate regression in pg_ctl on
Windows: if the log file specified with -l doesn't exist yet, and
pg_ctl is running with Administrator privileges, then the log file
might get created with permissions that prevent the postmaster from
writing on it. (It seems that whether this happens depends on whether
the log file is inside the user's home directory or not, and perhaps
on other phase-of-the-moon conditions, which may explain why we failed
to notice it sooner.)
To fix, just don't create the log file if it doesn't exist yet. The
case where we need to wait obviously only occurs with a pre-existing
log file.
In passing, switch from using fopen() to plain open(), saving a few
cycles.
Per bug #16259 from Jonathan Katz and Heath Lord. Back-patch to v12,
as the faulty commit was.
Tom Lane [Thu, 13 Feb 2020 18:37:43 +0000 (13:37 -0500)]
Avoid a performance regression in float overflow/underflow detection.
Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static
inline subroutines, but that wasn't too well thought out. In the original
coding, the unlikely condition (isinf(result) or result == 0) was checked
first, and the inf_is_valid or zero_is_valid condition only afterwards.
The inline-subroutine coding caused that to be swapped around, which is
pretty horrid for performance because (a) in common cases the is_valid
condition is twice as expensive to evaluate (e.g., requiring two isinf()
calls not one) and (b) in common cases the is_valid condition is false,
requiring us to perform the unlikely-condition check anyway. Net result
is that one isinf() call becomes two or three, resulting in visible
performance loss as reported by Keisuke Kuroda.
The original fix proposal was to revert the replacement of the macro,
but on second thought, that macro was just a bad idea from the beginning:
if anything it's a net negative for readability of the code. So instead,
let's just open-code all the overflow/underflow tests, being careful to
test the unlikely condition first (and mark it unlikely() to help the
compiler get the point).
Also, rather than having N copies of the actual ereport() calls, collapse
those into out-of-line error subroutines to save some code space. This
does mean that the error file/line numbers won't be very helpful for
figuring out where the issue really is --- but we'd already burned that
bridge by putting the ereports into static inlines.
In HEAD, check_float[48]_val() are gone altogether. In v12, leave them
present in float.h but unused in the core code, just in case some
extension is depending on them.
Emre Hasegeli, with some kibitzing from me and Andres Freund
Tom Lane [Wed, 12 Feb 2020 19:13:13 +0000 (14:13 -0500)]
Doc: fix old oversights in GRANT/REVOKE documentation.
The GRANTED BY clause in GRANT/REVOKE ROLE has been there since 2005
but was never documented. I'm not sure now whether that was just an
oversight or was intentional (given the limited capability of the
option). But seeing that pg_dumpall does emit code that uses this
option, it seems like not documenting it at all is a bad idea.
Also, when we upgraded the syntax to allow CURRENT_USER/SESSION_USER
as the privilege recipient, the role form of GRANT was incorrectly
not modified to show that, and REVOKE's docs weren't touched at all.
Although I'm not that excited about GRANTED BY, the other oversight
seems serious enough to justify a back-patch.
Michael Paquier [Mon, 10 Feb 2020 01:49:38 +0000 (10:49 +0900)]
pg_upgrade: Fix quoting of some arguments in pg_ctl command
The previous coding forgot to apply shell quoting to the socket
directory and the data folder, leading to failures when running
pg_upgrade. This refactors the code generating the pg_ctl command
starting clusters to use a more correct shell quoting. Failures are
easier to trigger in 12 and newer versions by using a value of
--socketdir that includes quotes, but it is also possible to cause
failures with quotes included in the default socket directory used by
pg_upgrade or the data folders of the clusters involved in the
upgrade.
As 9.4 is going to be EOL'd with the next minor release, nobody is
likely going to upgrade to it now so this branch is not included in the
set of branches fixed.
Author: Michael Paquier Reviewed-by: Álvaro Herrera, Noah Misch
Backpatch-through: 9.5
Tom Lane [Sun, 9 Feb 2020 17:02:57 +0000 (12:02 -0500)]
Store the deletion horizon XID for a deleted GIN page on the right page.
Commit b10714080 moved the GinPageSetDeleteXid() call to a spot where
the "page" variable was pointing to the wrong page, causing the XID
to be inserted on a page that's not being deleted, thus allowing later
GinPageIsRecyclable tests to recycle the deleted page too soon.
It might be a good idea to stop using the single "page" variable for
multiple purposes in this function. But for the moment I just moved
the GinPageSetDeleteXid() call down beside the GinPageSetDeleted()
call, which seems like a more logical place for it anyway.
Back-patch to v11, as the faulty patch was. (Fortunately, the bug
hasn't made it into any release yet.)
Fujii Masao [Sat, 8 Feb 2020 03:29:38 +0000 (12:29 +0900)]
Revert recent change of ScanOption values and renumber SO_TYPE_TIDSCAN.
Commit 598b466e80 in v12 renumbered ScanOption enum values to add
the option for Tid scan. But the change of the codes assigned to
the existing values caused an ABI break and may break some extensions
depending on them. This should be avoided in minor version.
This commit reverts the renumbering of ScanOption enum values made by
commit 598b466e80 in v12 and put the ScanOption for Tid scan with new value.
This is applied only to v12.
Alvaro Herrera [Fri, 7 Feb 2020 21:27:18 +0000 (18:27 -0300)]
Fix failure to create FKs correctly in partitions
On a multi-level partioned table, when adding a partition not directly
connected to the root table, foreign key constraints referencing the
root were not cloned to the new partition, leading to the FK being
possibly inadvertently violated later on.
This was caused by fuzzy thinking in CloneFkReferenced (commit f56f8f8da6af): it was skipping constraints marked as having parents on
the theory that cloning those would create duplicates; but that's only
correct for the top level of the partitioning hierarchy. For levels
below that one, such constraints must still be considered and only
skipped if later on we see that we'd create duplicates. Apparently, I
(Álvaro) wrote the comments right but the code implemented something
slightly different.
Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200206004948.238352db@firost
Alvaro Herrera [Fri, 7 Feb 2020 20:09:36 +0000 (17:09 -0300)]
Fix TRUNCATE .. CASCADE on partitions
When running TRUNCATE CASCADE on a child of a partitioned table
referenced by another partitioned table, the truncate was not applied to
partitions of the referencing table; this could leave rows violating the
constraint in the referencing partitioned table. Repair by walking the
pg_constraint chain all the way up to the topmost referencing table.
Note: any partitioned tables containing FKs that reference other
partitioned tables should be checked for possible violating rows, if
TRUNCATE has occurred in partitions of the referenced table.
Reported-by: Christophe Courtois
Author: Jehan-Guillaume de Rorthais
Discussion: https://postgr.es/m/20200204183906.115f693e@firost
Fujii Masao [Fri, 7 Feb 2020 13:00:21 +0000 (22:00 +0900)]
Fix bug in Tid scan.
Commit 147e3722f7 changed Tid scan so that it calls table_beginscan()
and uses the scan option for seq scan. This change caused two issues.
(1) The change caused Tid scan to take a predicate lock on the entire
relation in serializable transaction even when relation-level
lock is not necessary. This could lead to an unexpected
serialization error.
(2) The change caused Tid scan to increment the number of seq_scan
in pg_stat_*_tables views even though it's not seq scan. This
could confuse the users.
This commit adds the scan option for Tid scan and makes Tid scan
use it, to avoid those issues.
Michael Paquier [Thu, 6 Feb 2020 23:11:01 +0000 (08:11 +0900)]
Revert "Add GUC checks for ssl_min_protocol_version and ssl_max_protocol_version"
This reverts commit 41aadee, as the GUC checks could run on older values
with the new values used, and result in incorrect errors if both
parameters are changed at the same time.
Fujii Masao [Thu, 6 Feb 2020 15:33:11 +0000 (00:33 +0900)]
Add note about access permission checks by inherited TRUNCATE and LOCK TABLE.
Inherited queries perform access permission checks on the parent
table only. But there are two exceptions to this rule in v12 or before;
TRUNCATE and LOCK TABLE commands through a parent table check
the permissions on not only the parent table but also the children
tables. Previously these exceptions were not documented.
This commit adds the note about these exceptions, into the document.
Back-patch to v9.4. But we don't apply this commit to the master
because commit e6f1e560e4 already got rid of the exception about
inherited TRUNCATE and upcoming commit will do for the exception
about inherited LOCK TABLE.
Author: Amit Langote Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CA+HiwqHfTnMU6SUkyHxCmpHUKk7ERLHCR3vZVq19ZOQBjPBLmQ@mail.gmail.com
Amit Kapila [Thu, 6 Feb 2020 10:30:54 +0000 (16:00 +0530)]
Fix typo.
Reported-by: Amit Langote
Author: Amit Langote
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CA+HiwqFNADeukaaGRmTqANbed9Fd81gLi08AWe_F86_942Gspw@mail.gmail.com
Fujii Masao [Thu, 6 Feb 2020 05:43:21 +0000 (14:43 +0900)]
Fix bug in LWLock statistics mechanism.
Previously PostgreSQL built with -DLWLOCK_STATS could report
more than one LWLock statistics entries for the same backend
process and the same LWLock. This is strange and only one
statistics should be output in that case, instead.
The cause of this issue is that the key variable used for
LWLock stats hash table was not fully initialized. The key
consists of two fields and they were initialized. But
the following 4 bytes allocated in the key variable for
the alignment was not initialized. So even if the same key
was specified, hash_search(HASH_ENTER) could not find
the existing entry for that key and created new one.
This commit fixes this issue by initializing the key
variable with zero. As the side effect of this commit,
the volume of LWLock statistics output would be reduced
very much.
Back-patch to v10, where commit 3761fe3c20 introduced the issue.
Andrew Gierth [Wed, 5 Feb 2020 19:49:47 +0000 (19:49 +0000)]
Force tuple conversion when the source has missing attributes.
Tuple conversion incorrectly concluded that no conversion was needed
as long as all the attributes lined up. But if the source tuple has a
missing attribute (from addition of a column with default), then the
destination tupdesc might not reflect the same default. The typical
symptom was that the affected columns would be unexpectedly NULL.
Repair by always forcing conversion if the source has missing
attributes, which will be filled in by the deform operation. (In
theory we could optimize for when the destination has the same
default, but that seemed overkill.)
Backpatch to 11 where missing attributes were added.
Per bug #16242.
Vik Fearing (discovery, code, testing) and me (analysis, testcase).
Alvaro Herrera [Wed, 5 Feb 2020 18:06:11 +0000 (15:06 -0300)]
ALTER SUBSCRIPTION / REFRESH docs: explain copy_data
The docs are ambiguous as to which tables would be copied over when the
copy_data parameter is true in ALTER SUBSCRIPTION ... REFRESH PUBLICATION.
Make it clear that it only applies to tables which are new in the
publication.
Author: David Christensen (reword by Álvaro Herrera)
Discussion: https://postgr.es/m/95339420-7F09-4F8C-ACC0-8F1CFAAD9CD7@endpoint.com
Noah Misch [Wed, 5 Feb 2020 16:26:41 +0000 (08:26 -0800)]
When a TAP file has non-zero exit status, retain temporary directories.
PostgresNode already retained base directories in such cases. Stop
using $SIG{__DIE__}, which is redundant with the exit status check, in
lieu of proliferating it to TestLib. Back-patch to 9.6, where commit 88802e068017bee8cea7a5502a712794e761c7b5 introduced retention on
failure.
Fujii Masao [Thu, 26 Dec 2019 06:07:43 +0000 (15:07 +0900)]
Add note about how each partition's default value is treated, into the doc.
Column defaults may be specified separately for each partition.
But INSERT via a partitioned table ignores those partition's default values.
The former is documented, but the latter restriction not.
This commit adds the note about that restriction into the document.
Back-patch to v10 where partitioning was introduced.
Author: Fujii Masao Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAHGQGwEs-59omrfGF7hOHz9iMME3RbKy5ny+iftDx3LHTEn9sA@mail.gmail.com
Thomas Munro [Tue, 4 Feb 2020 23:21:03 +0000 (12:21 +1300)]
Handle lack of DSM slots in parallel btree build, take 2.
Commit 74618e77 added a new check intended to fix a bug, but put
it in the wrong place so that parallel btree build was always
disabled. Do the check after we've actually tried to create
a DSM segment. Back-patch to 11, like the earlier commit.
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-WzmDABkJzrNnvf%2BOULK-_A_j9gkYg_Dz-H62jzNv4eKQTw%40mail.gmail.com
Tom Lane [Tue, 4 Feb 2020 18:07:13 +0000 (13:07 -0500)]
Fix handling of "Subplans Removed" field in EXPLAIN output.
Commit 499be013d added this field in a rather poorly-thought-through
manner, with the result being that rather than being a field of the
Append or MergeAppend plan node as intended (and as it seems to be,
in text format), it was actually an element of the "Plans" subgroup.
At least in JSON format, that's flat out invalid syntax, because
"Plans" is an array not an object.
While it's not hard to move the generation of the field so that it
appears where it's supposed to, this does result in a visible change
in field order in text format, in cases where a Append or MergeAppend
plan node has any InitPlans attached. That's slightly annoying to
do in stable branches; but the alternative of continuing to emit
broken non-text formats seems worse.
Also, since the set of fields emitted is not supposed to be
data-dependent in non-text formats, make sure that "Subplans Removed"
appears in Append and MergeAppend nodes even when it's zero, in those
formats. (The previous coding made it look like it could appear in
some other node types such as BitmapAnd, but we don't actually support
runtime pruning there, so don't emit it in those cases.)
Per bug #16171 from Mahadevan Ramachandran. Fix by Daniel Gustafsson
and Tom Lane, reviewed by Hamid Akhtar. Back-patch to v11 where this
code came in.
Alvaro Herrera [Mon, 3 Feb 2020 21:59:12 +0000 (18:59 -0300)]
Add missing break out seqscan loop in logical replication
When replica identity is FULL (an admittedly unusual case), the loop
that searches for tuples in execReplication.c didn't stop scanning the
table when once a matching tuple was found. Add the missing 'break'.
Note slight behavior change: we now return the first matching tuple
rather than the last one. They are supposed to be indistinguishable
anyway, so this shouldn't matter.
Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/379743f6-ae91-b866-f7a2-5624e6d2b0a4@postgrespro.ru
This commit reverts the fix "Make inherited TRUNCATE perform access
permission checks on parent table only" only in the back branches.
It's not hard to imagine that there are some applications expecting
the old behavior and the fix breaks their security. To avoid this
compatibility problem, we decided to apply the fix only in HEAD and
revert it in all supported back branches.
Thomas Munro [Sat, 1 Feb 2020 01:29:13 +0000 (14:29 +1300)]
Fix memory leak on DSM slot exhaustion.
If we attempt to create a DSM segment when no slots are available,
we should return the memory to the operating system. Previously
we did that if the DSM_CREATE_NULL_IF_MAXSEGMENTS flag was
passed in, but we didn't do it if an error was raised. Repair.
Back-patch to 9.4, where DSM segments arrived.
Author: Thomas Munro Reviewed-by: Robert Haas Reported-by: Julian Backes
Discussion: https://postgr.es/m/CA%2BhUKGKAAoEw-R4om0d2YM4eqT1eGEi6%3DQot-3ceDR-SLiWVDw%40mail.gmail.com
Tom Lane [Fri, 31 Jan 2020 22:03:55 +0000 (17:03 -0500)]
Fix CheckAttributeType's handling of collations for ranges.
Commit fc7695891 changed CheckAttributeType to recurse into ranges,
but made it pass down the wrong collation (always InvalidOid, since
ranges as such have no collation). This would result in guaranteed
failure when considering a range type whose subtype is collatable.
Embarrassingly, we lack any regression tests that would expose such
a problem (but fortunately, somebody noticed before we shipped this
bug in any release).
Fix it to pass down the range's subtype collation property instead,
and add some regression test cases to exercise collatable-subtype
ranges a bit more. Back-patch to all supported branches, as the
previous patch was.
Report and patch by Julien Rouhaud, test cases tweaked by me
Tom Lane [Fri, 31 Jan 2020 19:41:49 +0000 (14:41 -0500)]
Fix parallel pg_dump/pg_restore for failure to create worker processes.
If we failed to fork a worker process, or create a communication pipe
for one, WaitForTerminatingWorkers would suffer an assertion failure
if assert-enabled, otherwise crash or go into an infinite loop. This
was a consequence of not accounting for the startup condition where
we've not yet forked all the workers.
The original bug was that ParallelBackupStart would set workerStatus to
WRKR_IDLE before it had successfully forked a worker. I made things
worse in commit b7b8cc0cf by not understanding the undocumented fact
that the WRKR_TERMINATED state was also meant to represent the case
where a worker hadn't been started yet: I changed enum T_WorkerStatus
so that *all* the worker slots were initially in WRKR_IDLE state. But
this wasn't any more broken in practice, since even one slot in the
wrong state would keep WaitForTerminatingWorkers from terminating.
In v10 and later, introduce an explicit T_WorkerStatus value for
worker-not-started, in hopes of preventing future oversights of the
same ilk. Before that, just document that WRKR_TERMINATED is supposed
to cover that case (partly because it wasn't actively broken, and
partly because the enum is exposed outside parallel.c in those branches,
so there's microscopically more risk involved in changing it).
In all branches, introduce a WORKER_IS_RUNNING status test macro
to hide which T_WorkerStatus values mean that, and be more careful
not to access ParallelSlot fields till we're sure they're valid.
Per report from Vignesh C, though this is my patch not his.
Back-patch to all supported branches.
Tom Lane [Thu, 30 Jan 2020 23:25:55 +0000 (18:25 -0500)]
In jsonb_plpython.c, suppress warning message from gcc 10.
Very recent gcc complains that PLyObject_ToJsonbValue could return
a pointer to a local variable. I think it's wrong; but the coding
is fragile enough, and the savings of one palloc() minimal enough,
that it seems better to just do a palloc() all the time. (My other
idea of tweaking the if-condition doesn't suppress the warning.)
Thomas Munro [Thu, 30 Jan 2020 21:25:34 +0000 (10:25 +1300)]
Handle lack of DSM slots in parallel btree build.
If no DSM slots are available, a ParallelContext can still be
created, but its seg pointer is NULL. Teach parallel btree build
to cope with that by falling back to a regular non-parallel build,
to avoid crashing with a segmentation fault.
Back-patch to 11, where parallel CREATE INDEX landed.
Reported-by: Nicola Contu Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CA%2BhUKGJgJEBnkuODBVomyK3MWFvDBbMVj%3Dgdt6DnRPU-5sQ6UQ%40mail.gmail.com
Fujii Masao [Thu, 30 Jan 2020 15:42:06 +0000 (00:42 +0900)]
Make inherited TRUNCATE perform access permission checks on parent table only.
Previously, TRUNCATE command through a parent table checked the
permissions on not only the parent table but also the children tables
inherited from it. This was a bug and inherited queries should perform
access permission checks on the parent table only. This commit fixes
that bug.
Back-patch to all supported branches.
Author: Amit Langote Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CAHGQGwFHdSvifhJE+-GSNqUHSfbiKxaeQQ7HGcYz6SC2n_oDcg@mail.gmail.com
Michael Paquier [Thu, 30 Jan 2020 02:15:28 +0000 (11:15 +0900)]
Fix slot data persistency when advancing physical replication slots
Advancing a physical replication slot with pg_replication_slot_advance()
did not mark the slot as dirty if any advancing was done, preventing the
follow-up checkpoint to flush the slot data to disk. This caused the
advancing to be lost even on clean restarts. This does not happen for
logical slots as any advancing marked the slot as dirty. Per
discussion, the original feature has been implemented so as in the event
of a crash the slot may move backwards to a past LSN. This property is
kept and more documentation is added about that.
This commit adds some new TAP tests to check the persistency of physical
and logical slots after advancing across clean restarts.
Author: Alexey Kondratov, Michael Paquier Reviewed-by: Andres Freund, Kyotaro Horiguchi, Craig Ringer
Discussion: https://postgr.es/m/059cc53a-8b14-653a-a24d-5f867503b0ee@postgrespro.ru
Backpatch-through: 11
Tom Lane [Tue, 28 Jan 2020 22:26:37 +0000 (17:26 -0500)]
Fix dangling pointer in EvalPlanQual machinery.
EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark
and relsubs_done arrays from a prior instantiation. But since they are
allocated in the es_query_cxt of the recheckestate, that's just wrong;
EvalPlanQualEnd() will blow away that storage. Therefore we were using
storage that could have been reallocated to something else, causing all
sorts of havoc.
I think this was modeled on the old code's handling of es_epqTupleSlot,
but since the code was anyway clearing the arrays at re-use, there's
clearly no expectation of importing any outside state. So it's just
a dubious savings of a couple of pallocs, which is negligible compared
to setting up a new planstate tree. Therefore, just allocate the
arrays always. (I moved the allocations slightly for readability.)
In principle this bug could cause a problem whenever EPQ rechecks are
needed in more than one target table of a ModifyTable plan node.
In practice it seems not quite so easy to trigger as that; I couldn't
readily duplicate a crash with a partitioned target table, for instance.
That's probably down to incidental choices about when to free or
reallocate stuff. The added isolation test case does seem to reliably
show an assertion failure, though.
Per report from Oleksii Kliukin. Back-patch to v12 where the bug was
introduced (evidently by commit 3fb307bc4).
Thomas Munro [Sun, 26 Jan 2020 23:52:08 +0000 (12:52 +1300)]
Avoid unnecessary shm writes in Parallel Hash Join.
Currently, Parallel Hash Join cannot be used for full/right joins,
so there is no point in setting the match flag. It turns out that
the cache coherence traffic generated by those writes slows down
large systems running many-core joins, so let's stop doing that.
In future, if we need to use match bits in parallel joins, we might
want to consider setting them only if not already set.
Back-patch to 11, where Parallel Hash Join arrived.
Tom Lane [Sun, 26 Jan 2020 21:31:48 +0000 (16:31 -0500)]
Fix EXPLAIN (SETTINGS) to follow policy about when to print empty fields.
In non-TEXT output formats, the "Settings" field should appear when
requested, even if it would be empty.
Also, get rid of the premature optimization of counting all the
GUC_EXPLAIN variables at startup. Since there was no provision for
adjusting that count later, all it'd take would be some extension marking
a parameter as GUC_EXPLAIN to risk an assertion failure or memory stomp.
We could make get_explain_guc_options() count those variables on-the-fly,
or dynamically resize its array ... but TBH I do not think that making a
transient array of pointers a bit smaller is worth any extra complication,
especially when you consider all the other transient space EXPLAIN eats.
So just allocate that array at the max possible size.
In HEAD, also add some regression test coverage for this feature.
Because of the memory-stomp hazard, back-patch to v12 where this
feature was added.
Tom Lane [Sun, 26 Jan 2020 19:31:08 +0000 (14:31 -0500)]
In postgres_fdw, don't try to ship MULTIEXPR updates to remote server.
In a statement like "UPDATE remote_tab SET (x,y) = (SELECT ...)",
we'd conclude that the statement could be directly executed remotely,
because the sub-SELECT is in a resjunk tlist item that's not examined
for shippability. Currently that ends up crashing if the sub-SELECT
contains any remote Vars. Prevent the crash by deeming MULTIEXEC
Params to be unshippable.
This is a bit of a brute-force solution, since if the sub-SELECT
*doesn't* contain any remote Vars, the current execution technology
would work; but that's not a terribly common use-case for this syntax,
I think. In any case, we generally don't try to ship sub-SELECTs, so
it won't surprise anybody that this doesn't end up as a remote direct
update. I'd be inclined to see if that general limitation can be fixed
before worrying about this case further.
Per report from Lukáš Sobotka.
Back-patch to 9.6. 9.5 had MULTIEXPR, but we didn't try to perform
remote direct updates then, so the case didn't arise anyway.
I had supposed that the from_char_seq_search() call sites were
all passing the constant arrays you'd expect them to pass ...
but on looking closer, the one for DY format was passing the
days[] array not days_short[]. This accidentally worked because
the day abbreviations in English are all the same as the first
three letters of the full day names. However, once we took out
the "maximum comparison length" logic, it stopped working.
As penance for that oversight, add regression test cases covering
this, as well as every other switch case in DCH_from_char() that
was not reached according to the code coverage report.
Also, fold the DCH_RM and DCH_rm cases into one --- now that
seq_search is case independent, there's no need to pass different
comparison arrays for those cases.
Tom Lane [Thu, 23 Jan 2020 18:42:10 +0000 (13:42 -0500)]
Clean up formatting.c's logic for matching constant strings.
seq_search(), which is used to match input substrings to constants
such as month and day names, had a lot of bizarre and unnecessary
behaviors. It was mostly possible to avert our eyes from that before,
but we don't want to duplicate those behaviors in the upcoming patch
to allow recognition of non-English month and day names. So it's time
to clean this up. In particular:
* seq_search scribbled on the input string, which is a pretty dangerous
thing to do, especially in the badly underdocumented way it was done here.
Fortunately the input string is a temporary copy, but that was being made
three subroutine levels away, making it something easy to break
accidentally. The behavior is externally visible nonetheless, in the form
of odd case-folding in error reports about unrecognized month/day names.
The scribbling is evidently being done to save a few calls to pg_tolower,
but that's such a cheap function (at least for ASCII data) that it's
pretty pointless to worry about. In HEAD I switched it to be
pg_ascii_tolower to ensure it is cheap in all cases; but there are corner
cases in Turkish where this'd change behavior, so leave it as pg_tolower
in the back branches.
* seq_search insisted on knowing the case form (all-upper, all-lower,
or initcap) of the constant strings, so that it didn't have to case-fold
them to perform case-insensitive comparisons. This likewise seems like
excessive micro-optimization, given that pg_tolower is certainly very
cheap for ASCII data. It seems unsafe to assume that we know the case
form that will come out of pg_locale.c for localized month/day names, so
it's better just to define the comparison rule as "downcase all strings
before comparing". (The choice between downcasing and upcasing is
arbitrary so far as English is concerned, but it might not be in other
locales, so follow citext's lead here.)
* seq_search also had a parameter that'd cause it to report a match
after a maximum number of characters, even if the constant string were
longer than that. This was not actually used because no caller passed
a value small enough to cut off a comparison. Replicating that behavior
for localized month/day names seems expensive as well as useless, so
let's get rid of that too.
* from_char_seq_search used the maximum-length parameter to truncate
the input string in error reports about not finding a matching name.
This leads to rather confusing reports in many cases. Worse, it is
outright dangerous if the input string isn't all-ASCII, because we
risk truncating the string in the middle of a multibyte character.
That'd lead either to delivering an illegible error message to the
client, or to encoding-conversion failures that obscure the actual
data problem. Get rid of that in favor of truncating at whitespace
if any (a suggestion due to Alvaro Herrera).
In addition to fixing these things, I const-ified the input string
pointers of DCH_from_char and its subroutines, to make sure there
aren't any other scribbling-on-input problems.
The risk of generating a badly-encoded error message seems like
enough of a bug to justify back-patching, so patch all supported
branches.
Michael Paquier [Wed, 22 Jan 2020 00:49:24 +0000 (09:49 +0900)]
Fix concurrent indexing operations with temporary tables
Attempting to use CREATE INDEX, DROP INDEX or REINDEX with CONCURRENTLY
on a temporary relation with ON COMMIT actions triggered unexpected
errors because those operations use multiple transactions internally to
complete their work. Here is for example one confusing error when using
ON COMMIT DELETE ROWS:
ERROR: index "foo" already contains data
Issues related to temporary relations and concurrent indexing are fixed
in this commit by enforcing the non-concurrent path to be taken for
temporary relations even if using CONCURRENTLY, transparently to the
user. Using a non-concurrent path does not matter in practice as locks
cannot be taken on a temporary relation by a session different than the
one owning the relation, and the non-concurrent operation is more
effective.
The problem exists with REINDEX since v12 with the introduction of
CONCURRENTLY, and with CREATE/DROP INDEX since CONCURRENTLY exists for
those commands. In all supported versions, this caused only confusing
error messages to be generated. Note that with REINDEX, it was also
possible to issue a REINDEX CONCURRENTLY for a temporary relation owned
by a different session, leading to a server crash.
The idea to enforce transparently the non-concurrent code path for
temporary relations comes originally from Andres Freund.
Reported-by: Manuel Rigger
Author: Michael Paquier, Heikki Linnakangas Reviewed-by: Andres Freund, Álvaro Herrera, Heikki Linnakangas
Discussion: https://postgr.es/m/CA+u7OA6gP7YAeCguyseusYcc=uR8+ypjCcgDDCTzjQ+k6S9ksQ@mail.gmail.com
Backpatch-through: 9.4
Andres Freund [Tue, 21 Jan 2020 07:26:51 +0000 (23:26 -0800)]
Fix edge case leading to agg transitions skipping ExecAggTransReparent() calls.
The code checking whether an aggregate transition value needs to be
reparented into the current context has always only compared the
transition return value with the previous transition value by datum,
i.e. without regard for NULLness. This normally works, because when
the transition function returns NULL (via fcinfo->isnull), it'll
return a value that won't be the same as its input value.
But there's no hard requirement that that's the case. And it turns
out, it's possible to hit this case (see discussion or reproducers),
leading to a non-null transition value not being reparented, followed
by a crash caused by that.
Instead of adding another comparison of NULLness, instead have
ExecAggTransReparent() ensure that pergroup->transValue ends up as 0
when the new transition value is NULL. That avoids having to add an
additional branch to the much more common cases of the transition
function returning the old transition value (which is a pointer in
this case), and when the new value is different, but not NULL.
In branches since 69c3936a149, also deduplicate the reparenting code
between the expression evaluation based transitions, and the path for
ordered aggregates.
Reported-By: Teodor Sigaev, Nikita Glukhov
Author: Andres Freund
Discussion: https://postgr.es/m/bd34e930-cfec-ea9b-3827-a8bc50891393@sigaev.ru
Backpatch: 9.4-, this issue has existed since at least 7.4