Alexander Korotkov [Thu, 13 Dec 2018 03:12:31 +0000 (06:12 +0300)]
Prevent GIN deleted pages from being reclaimed too early
When GIN vacuum deletes a posting tree page, it assumes that no concurrent
searchers can access it, thanks to ginStepRight() locking two pages at once.
However, since 9.4 searches can skip parts of posting trees descending from the
root. That leads to the risk that page is deleted and reclaimed before
concurrent search can access it.
This commit prevents the risk of above by waiting for every transaction, which
might wait to reference this page, to finish. Due to binary compatibility
we can't change GinPageOpaqueData to store corresponding transaction id.
Instead we reuse page header pd_prune_xid field, which is unused in index pages.
Discussion: https://postgr.es/m/
31a702a.14dd.
166c1366ac1.Coremail.chjischj%40163.com
Author: Andrey Borodin, Alexander Korotkov
Reviewed-by: Alexander Korotkov
Backpatch-through: 9.4
Alexander Korotkov [Thu, 13 Dec 2018 03:12:25 +0000 (06:12 +0300)]
Prevent deadlock in ginRedoDeletePage()
On standby ginRedoDeletePage() can work concurrently with read-only queries.
Those queries can traverse posting tree in two ways.
1) Using rightlinks by ginStepRight(), which locks the next page before
unlocking its left sibling.
2) Using downlinks by ginFindLeafPage(), which locks at most one page at time.
Original lock order was: page, parent, left sibling. That lock order can
deadlock with ginStepRight(). In order to prevent deadlock this commit changes
lock order to: left sibling, page, parent. Note, that position of parent in
locking order seems insignificant, because we only lock one page at time while
traversing downlinks.
Reported-by: Chen Huajun
Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin
Discussion: https://postgr.es/m/
31a702a.14dd.
166c1366ac1.Coremail.chjischj%40163.com
Author: Alexander Korotkov
Backpatch-through: 9.4
Alexander Korotkov [Thu, 13 Dec 2018 03:12:11 +0000 (06:12 +0300)]
Fix deadlock in GIN vacuum introduced by
218f51584d5
Before
218f51584d5 if posting tree page is about to be deleted, then the whole
posting tree is locked by LockBufferForCleanup() on root preventing all the
concurrent inserts.
218f51584d5 reduced locking to the subtree containing
page to be deleted. However, due to concurrent parent split, inserter doesn't
always holds pins on all the pages constituting path from root to the target
leaf page. That could cause a deadlock between GIN vacuum process and GIN
inserter. And we didn't find non-invasive way to fix this.
This commit reverts VACUUM behavior to lock the whole posting tree before
delete any page. However, we keep another useful change by
218f51584d5: the
tree is locked only if there are pages to be deleted.
Reported-by: Chen Huajun
Diagnosed-by: Chen Huajun, Andrey Borodin, Peter Geoghegan
Discussion: https://postgr.es/m/
31a702a.14dd.
166c1366ac1.Coremail.chjischj%40163.com
Author: Alexander Korotkov, based on ideas from Andrey Borodin and Peter Geoghegan
Reviewed-by: Andrey Borodin
Backpatch-through: 10
Tom Lane [Wed, 12 Dec 2018 21:08:30 +0000 (16:08 -0500)]
Repair bogus EPQ plans generated for postgres_fdw foreign joins.
postgres_fdw's postgresGetForeignPlan() assumes without checking that the
outer_plan it's given for a join relation must have a NestLoop, MergeJoin,
or HashJoin node at the top. That's been wrong at least since commit
4bbf6edfb (which could cause insertion of a Sort node on top) and it seems
like a pretty unsafe thing to Just Assume even without that.
Through blind good fortune, this doesn't seem to have any worse
consequences today than strange EXPLAIN output, but it's clearly trouble
waiting to happen.
To fix, test the node type explicitly before touching Join-specific
fields, and avoid jamming the new tlist into a node type that can't
do projection. Export a new support function from createplan.c
to avoid building low-level knowledge about the latter into FDWs.
Back-patch to 9.6 where the faulty coding was added. Note that the
associated regression test cases don't show any changes before v11,
apparently because the tests back-patched with
4bbf6edfb don't actually
exercise the problem case before then (there's no top-level Sort
in those plans).
Discussion: https://postgr.es/m/8946.
1544644803@sss.pgh.pa.us
Tom Lane [Wed, 12 Dec 2018 18:49:41 +0000 (13:49 -0500)]
Repair bogus handling of multi-assignment Params in upper plan levels.
Our support for multiple-set-clauses in UPDATE assumes that the Params
referencing a MULTIEXPR_SUBLINK SubPlan will appear before that SubPlan
in the targetlist of the plan node that calculates the updated row.
(Yeah, it's a hack...) In some PG branches it's possible that a Result
node gets inserted between the primary calculation of the update tlist
and the ModifyTable node. setrefs.c did the wrong thing in this case
and left the upper-level Params as Params, causing a crash at runtime.
What it should do is replace them with "outer" Vars referencing the child
plan node's output. That's a result of careless ordering of operations
in fix_upper_expr_mutator, so we can fix it just by reordering the code.
Fix fix_join_expr_mutator similarly for consistency, even though join
nodes could never appear in such a context. (In general, it seems
likely to be a bit cheaper to use Vars than Params in such situations
anyway, so this patch might offer a tiny performance improvement.)
The hazard extends back to 9.5 where the MULTIEXPR_SUBLINK stuff
was introduced, so back-patch that far. However, this may be a live
bug only in 9.6.x and 10.x, as the other branches don't seem to want
to calculate the final tlist below the Result node. (That plan shape
change between branches might be a mini-bug in itself, but I'm not
really interested in digging into the reasons for that right now.
Still, add a regression test memorializing what we expect there,
so we'll notice if it changes again.)
Per bug report from Eduards Bezverhijs.
Discussion: https://postgr.es/m/
b6cd572a-3e44-8785-75e9-
c512a5a17a73@tieto.com
Tom Lane [Tue, 11 Dec 2018 16:48:00 +0000 (11:48 -0500)]
Fix test_rls_hooks to assign expression collations properly.
This module overlooked this necessary fixup step on the results of
transformWhereClause(). It accidentally worked anyway, because the
constructed expression involved type "name" which is not collatable,
but it fell over while I was experimenting with changing "name" to
be collatable.
Back-patch, not because there's any live bug here in back branches,
but because somebody might use this code as a model for some real
application and then not understand why it doesn't work.
Tom Lane [Tue, 11 Dec 2018 16:21:36 +0000 (11:21 -0500)]
Doc: improve documentation about ALTER LARGE OBJECT requirements.
Unlike other ALTER ref pages, this one neglected to mention that
ALTER OWNER requires being a member of the new owning role.
Per bug #15546 from Stefan Kadow.
Discussion: https://postgr.es/m/15546-
0558c75fd2025e7c@postgresql.org
Noah Misch [Tue, 11 Dec 2018 04:15:42 +0000 (20:15 -0800)]
Raise some timeouts to 180s, in test code.
Slow runs of buildfarm members chipmunk, hornet and mandrill saw the
shorter timeouts expire. The 180s timeout in poll_query_until has been
trouble-free since
2a0f89cd717ce6d49cdc47850577823682167e87 introduced
it two years ago, so use 180s more widely. Back-patch to 9.6, where the
first of these timeouts was introduced.
Reviewed by Michael Paquier.
Discussion: https://postgr.es/m/
20181209001601[email protected]
Tom Lane [Mon, 10 Dec 2018 16:12:43 +0000 (11:12 -0500)]
Add stack depth checks to key recursive functions in backend/nodes/*.c.
Although copyfuncs.c has a check_stack_depth call in its recursion,
equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems
unwise.
Likewise fix planstate_tree_walker(), in branches where that exists.
Discussion: https://postgr.es/m/30253.
1544286631@sss.pgh.pa.us
Tom Lane [Mon, 10 Dec 2018 15:38:49 +0000 (10:38 -0500)]
Make TupleDescInitBuiltinEntry throw error for unsupported types.
Previously, it would just pass back a partially-uninitialized tupdesc,
which doesn't seem like a safe or useful behavior.
Backpatch to v10 where this code came in.
Discussion: https://postgr.es/m/30830.
1544384975@sss.pgh.pa.us
Tom Lane [Fri, 7 Dec 2018 17:12:00 +0000 (12:12 -0500)]
Fix misapplication of pgstat_count_truncate to wrong relation.
The stanza of ExecuteTruncate[Guts] that truncates a target table's toast
relation re-used the loop local variable "rel" to reference the toast rel.
This was safe enough when written, but commit
d42358efb added code below
that that supposed "rel" still pointed to the parent table. Therefore,
the stats counter update was applied to the wrong relcache entry (the
toast rel not the user rel); and if we were unlucky and that relcache
entry had been flushed during reindex_relation, very bad things could
ensue.
(I'm surprised that CLOBBER_CACHE_ALWAYS testing hasn't found this.
I'm even more surprised that the problem wasn't detected during the
development of
d42358efb; it must not have been tested in any case
with a toast table, as the incorrect stats counts are very obvious.)
To fix, replace use of "rel" in that code branch with a more local
variable. Adjust test cases added by
d42358efb so that some of them
use tables with toast tables.
Per bug #15540 from Pan Bian. Back-patch to 9.5 where
d42358efb came in.
Discussion: https://postgr.es/m/15540-
01078812338195c0@postgresql.org
Tom Lane [Fri, 7 Dec 2018 16:02:39 +0000 (11:02 -0500)]
Clean up sloppy coding in publicationcmds.c's OpenTableList().
Remove dead code (which would be incorrect if it weren't dead),
per report from Pan Bian. Add a CHECK_FOR_INTERRUPTS in the
inner loop over child relations, because there's little point
in having one in the outer loop if there's not one here too.
Minor stylistic adjustments and comment improvements.
Seems to be aboriginal to this code (cf commit
665d1fad9).
Back-patch to v10 where that came in, not because any of this
is significant, but just to keep the branches looking similar.
Discussion: https://postgr.es/m/15539-
06d00ef6b1e2e1bb@postgresql.org
Tom Lane [Thu, 6 Dec 2018 20:08:44 +0000 (15:08 -0500)]
Improve our response to invalid format strings, and detect more cases.
Places that are testing for *printf failure ought to include the format
string in their error reports, since bad-format-string is one of the
more likely causes of such failure. This both makes it easier to find
and repair the mistake, and provides at least some useful info to the
user who stumbles across such a problem.
Also, tighten snprintf.c to report EINVAL for an invalid flag or
final character in a format %-spec (including the case where the
%-spec is missing a final character altogether). This seems like
better project policy, and it also allows removing an instruction
or two from the hot code path.
Back-patch the error reporting change in pvsnprintf, since it should be
harmless and may be helpful; but not the snprintf.c change.
Per discussion of bug #15511 from Ertuğrul Kahveci, which reported an
invalid translated format string. These changes don't fix that error,
but they should improve matters next time we make such a mistake.
Discussion: https://postgr.es/m/15511-
1d8b6a0bc874112f@postgresql.org
Stephen Frost [Thu, 6 Dec 2018 16:38:56 +0000 (11:38 -0500)]
Improve planner stats documentation
It was pointed out that in the planner stats documentation under
Extended Statistics, one of the sentences was a bit awkward. Improve
that by rewording it slightly.
Discussion: https://postgr.es/m/
154409976780.14137.
2785644488950047100@wrigleys.postgresql.org
Michael Paquier [Wed, 5 Dec 2018 01:03:01 +0000 (10:03 +0900)]
Fix invalid value of synchronous_commit in description of flush_lag
"remote_flush" has never been a valid user-facing value, but "on" is.
Author: Maksim Milyutin
Discussion: https://postgr.es/m/
27b3b80c-3615-2d76-02c5-
44566b53136c@gmail.com
Tom Lane [Thu, 29 Nov 2018 23:28:10 +0000 (18:28 -0500)]
Document handling of invalid/ambiguous timestamp input near DST boundaries.
The source code comments documented this, but the user-facing docs, not
so much. Add a section to Appendix B that discusses it.
In passing, improve a couple other things in Appendix B --- notably,
a long-obsolete claim that time zone abbreviations are looked up in
a fixed table.
Per bug #15527 from Michael Davidson.
Discussion: https://postgr.es/m/15527-
f1be0b4dc99ebbe7@postgresql.org
Tom Lane [Thu, 29 Nov 2018 20:53:44 +0000 (15:53 -0500)]
Ensure static libraries have correct mod time even if ranlib messes it up.
In at least Apple's version of ranlib, the output file is updated to have
a mod time equal to the max of the timestamps of its components, and that
data only has seconds precision. On a filesystem with sub-second file
timestamp precision --- say, APFS --- this can result in the finished
static library appearing older than its input files, which causes useless
rebuilds and possible outright failures in parallel makes.
We've only seen this reported in the field from people using Apple's
ranlib with a non-Apple make, because Apple's make doesn't know about
sub-second timestamps either so it doesn't decide rebuilds are needed.
But Apple's ranlib presumably shares code with at least some BSDen,
so it's not that unlikely that the same problem could arise elsewhere.
To fix, just "touch" the output file after ranlib finishes.
We seem to need this in only one place. There are other calls of
ranlib in our makefiles, but they are working on intermediate files
whose timestamps are not actually important, or else on an installed
static library for which sub-second timestamp precision is unlikely
to matter either. (Also, so far as I can tell, Apple's ranlib doesn't
mess up the file timestamp in the latter usage anyhow.)
In passing, change "ranlib" to "$(RANLIB)" in one place that was
bypassing the make macro for no good reason.
Per bug #15525 from Jack Kelly (via Alyssa Ross).
Back-patch to all supported branches.
Discussion: https://postgr.es/m/15525-
a30da084f17a1faa@postgresql.org
Thomas Munro [Thu, 29 Nov 2018 01:14:26 +0000 (14:14 +1300)]
Fix minor typo in dsa.c.
Author: Takeshi Ideriha
Discussion: https://postgr.es/m/
4E72940DA2BF16479384A86D54D0988A6F3BF22D%40G01JPEXMBKW04
Michael Paquier [Thu, 29 Nov 2018 00:12:45 +0000 (09:12 +0900)]
Fix handling of synchronous replication for stopping WAL senders
This fixes an oversight from
c6c3334 which forgot that if a subset of
WAL senders are stopping and in a sync state, other WAL senders could
still be waiting for a WAL position to be synced while committing a
transaction. However the subset of stopping senders would not release
waiters, potentially breaking synchronous replication guarantees. This
commit makes sure that even WAL senders stopping are able to release
waiters and are tracked properly.
On 9.4, this can also trigger an assertion failure when setting for
example max_wal_senders to 1 where a WAL sender is not able to find
itself as in synchronous state when the instance stops.
Reported-by: Paul Guo
Author: Paul Guo, Michael Paquier
Discussion: https://postgr.es/m/CAEET0ZEv8VFqT3C-cQm6byOB4r4VYWcef1J21dOX-gcVhCSpmA@mail.gmail.com
Backpatch-through: 9.4
Bruce Momjian [Wed, 28 Nov 2018 12:34:10 +0000 (07:34 -0500)]
C comment: remove extra '*'
Reported-by: Etsuro Fujita
Discussion: https://postgr.es/m/
5BFE34DE.
1080404@lab.ntt.co.jp
Author: Etsuro Fujita
Backpatch-through: 10
Thomas Munro [Wed, 28 Nov 2018 01:00:57 +0000 (14:00 +1300)]
Don't set PAM_RHOST for Unix sockets.
Since commit
2f1d2b7a we have set PAM_RHOST to "[local]" for Unix
sockets. This caused Linux PAM's libaudit integration to make DNS
requests for that name. It's not exactly clear what value PAM_RHOST
should have in that case, but it seems clear that we shouldn't set it
to an unresolvable name, so don't do that.
Back-patch to 9.6. Bug #15520.
Author: Thomas Munro
Reviewed-by: Peter Eisentraut
Reported-by: Albert Schabhuetl
Discussion: https://postgr.es/m/15520-
4c266f986998e1c5%40postgresql.org
Tomas Vondra [Wed, 28 Nov 2018 00:11:15 +0000 (01:11 +0100)]
Do not decode TOAST data for table rewrites
During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged
using XLOG / FPI records, and thus (correctly) ignored in decoding.
But the associated TOAST table is WAL-logged as plain INSERT records,
and so was logically decoded and passed to reorder buffer.
That has severe consequences with TOAST tables of non-trivial size.
Firstly, reorder buffer has to keep all those changes, possibly spilling
them to a file, incurring I/O costs and disk space.
Secondly, ReoderBufferCommit() was stashing all those TOAST chunks into
a hash table, which got discarded only after processing the row from the
main heap. But as the main heap is not decoded for rewrites, this never
happened, so all the TOAST data accumulated in memory, resulting either
in excessive memory consumption or OOM.
The fix is simple, as commit
e9edc1ba already introduced infrastructure
(namely HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST
tables, but it only applied it to system tables. So simply use it for
all TOAST data in raw_heap_insert().
That would however solve only the memory consumption issue - the TOAST
changes would still be decoded and added to the reorder buffer, and
spilled to disk (although without TOAST tuple data, so much smaller).
But we can solve that by tweaking DecodeInsert() to just ignore such
INSERT records altogether, using XLH_INSERT_CONTAINS_NEW_TUPLE flag,
instead of skipping them later in ReorderBufferCommit().
Review: Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/flat/
1a17c643-e9af-3dba-486b-
fbe31bc1823a%402ndquadrant.com
Backpatch: 9.4-, where logical decoding was introduced
Andres Freund [Tue, 27 Nov 2018 07:26:05 +0000 (23:26 -0800)]
Fix
ac218aa4f6 to work on versions before 9.5.
Unfortunately
ac218aa4f6 missed the fact that a reference to
'pg_catalog.regnamespace'::regclass wouldn't work before that type is
known. Fix that, by replacing the regtype usage with a join to
pg_type.
Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/8863.
1543297423@sss.pgh.pa.us
Backpatch: 9.5-, like
ac218aa4f6
Andres Freund [Tue, 27 Nov 2018 01:00:43 +0000 (17:00 -0800)]
Update pg_upgrade test for reg* to include regrole and regnamespace.
When the regrole (
0c90f6769) and regnamespace (
cb9fa802b) types were
added in 9.5, pg_upgrade's check for reg* types wasn't updated. While
regrole currently is safe, regnamespace is not.
It seems unlikely that anybody uses regnamespace inside catalog tables
across a pg_upgrade, but the tests should be correct nevertheless.
While at it, reorder the types checked in the query to be
alphabetical. Otherwise it's annoying to compare existing and tested
for types.
Author: Andres Freund
Discussion: https://postgr.es/m/
037e152a-cb25-3bcb-4f35-
bdc9988f8204@2ndQuadrant.com
Backpatch: 9.5-, as regrole/regnamespace
Bruce Momjian [Tue, 27 Nov 2018 00:41:19 +0000 (19:41 -0500)]
doc: fix wording for plpgsql, add "and"
Reported-by: Anthony Greene
Discussion: https://postgr.es/m/CAPRNmnsSZ4QL75FUjcS8ND_oV+WjgyPbZ4ch2RUwmW6PWzF38w@mail.gmail.com
Backpatch-through: 9.4
Tom Lane [Mon, 26 Nov 2018 22:32:51 +0000 (17:32 -0500)]
Fix translation of special characters in psql's LaTeX output modes.
latex_escaped_print() mistranslated \ and failed to provide any translation
for # ^ and ~, all of which would typically lead to LaTeX document syntax
errors. In addition it didn't translate < > and |, which would typically
render as unexpected characters.
To some extent this represents shortcomings in ancient versions of LaTeX,
which if memory serves had no easy way to render these control characters
as ASCII text. But that's been fixed for, um, decades. In any case there
is no value in emitting guaranteed-to-fail output for these characters.
Noted while fooling with test cases added by commit
9a98984f4. Back-patch
the code change to all supported versions.
Alvaro Herrera [Mon, 26 Nov 2018 19:58:02 +0000 (16:58 -0300)]
Fix sample output for hash_metapage_info query
One output column was duplicated. Couldn't resist fixing the version
number while at it.
Reported-by: Gianni Ciolli
Michael Paquier [Mon, 26 Nov 2018 07:43:19 +0000 (16:43 +0900)]
Revert "Fix typo in documentation of toast storage"
This reverts commit
058ef3a, per complains from Magnus Hagander and Vik
Fearing.
Michael Paquier [Mon, 26 Nov 2018 06:49:23 +0000 (15:49 +0900)]
Fix typo in documentation of toast storage
Author: Nawaz Ahmed
Discussion: https://postgr.es/m/
154319327168.1315.
1846953598601966513@wrigleys.postgresql.org
Andrew Gierth [Sat, 24 Nov 2018 09:59:49 +0000 (09:59 +0000)]
Fix hstore hash function for empty hstores upgraded from 8.4.
Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)
Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.
Fix by making hstoreUpgrade unconditionally convert all old values to
new format.
Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing - for
example, a hash join between old- and new-format empty hstores will be
failing to match, or a hash index on an hstore column containing an
old-format empty value will be failing to find the value since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)
Per concerns arising from discussion of commit
eb6f29141be. Original
bug is my fault.
Discussion: https://postgr.es/m/
60b1fd3b-7332-40f0-7e7f-
f2f04f777747%402ndquadrant.com
Tom Lane [Sat, 24 Nov 2018 18:53:12 +0000 (13:53 -0500)]
Update additional float4/8 expected-output files.
I forgot that the back branches have more variant files than HEAD :-(.
Per buildfarm.
Discussion: https://postgr.es/m/15519-
4fc785b483201ff1@postgresql.org
Tom Lane [Sat, 24 Nov 2018 17:45:49 +0000 (12:45 -0500)]
Fix float-to-integer coercions to handle edge cases correctly.
ftoi4 and its sibling coercion functions did their overflow checks in
a way that looked superficially plausible, but actually depended on an
assumption that the MIN and MAX comparison constants can be represented
exactly in the float4 or float8 domain. That fails in ftoi4, ftoi8,
and dtoi8, resulting in a possibility that values near the MAX limit will
be wrongly converted (to negative values) when they need to be rejected.
Also, because we compared before rounding off the fractional part,
the other three functions threw errors for values that really ought
to get rounded to the min or max integer value.
Fix by doing rint() first (requiring an assumption that it handles
NaN and Inf correctly; but dtoi8 and ftoi8 were assuming that already),
and by comparing to values that should coerce to float exactly, namely
INTxx_MIN and -INTxx_MIN. Also remove some random cosmetic discrepancies
between these six functions.
This back-patches commits
cbdb8b4c0 and
452b637d4. In the 9.4 branch,
also back-patch the portion of
62e2a8dc2 that added PG_INTnn_MIN and
related constants to c.h, so that these functions can rely on them.
Per bug #15519 from Victor Petrovykh.
Patch by me; thanks to Andrew Gierth for analysis and discussion.
Discussion: https://postgr.es/m/15519-
4fc785b483201ff1@postgresql.org
Andrew Gierth [Fri, 23 Nov 2018 23:56:39 +0000 (23:56 +0000)]
Avoid crashes in contrib/intarray gist__int_ops (bug #15518)
1. Integer overflow in internal_size could result in memory corruption
in decompression since a zero-length array would be allocated and then
written to. This leads to crashes or corruption when traversing an
index which has been populated with sufficiently sparse values. Fix by
using int64 for computations and checking for overflow.
2. Integer overflow in g_int_compress could cause pessimal merge
choices, resulting in unnecessarily large ranges (which would in turn
trigger issue 1 above). Fix by using int64 again.
3. Even without overflow, array sizes could become large enough to
cause unexplained memory allocation errors. Fix by capping the sizes
to a safe limit and report actual errors pointing at gist__intbig_ops
as needed.
4. Large inputs to the compression function always consist of large
runs of consecutive integers, and the compression loop was processing
these one at a time in an O(N^2) manner with a lot of overhead. The
expected runtime of this function could easily exceed 6 months for a
single call as a result. Fix by performing a linear-time first pass,
which reduces the worst case to something on the order of seconds.
Backpatch all the way, since this has been wrong forever.
Per bug #15518 from report from irc user "dymk", analysis and patch by
me.
Discussion: https://postgr.es/m/15518-
799e426c3b4f8358@postgresql.org
Peter Eisentraut [Fri, 23 Nov 2018 10:41:27 +0000 (11:41 +0100)]
doc: Fix typo
Bruce Momjian [Wed, 21 Nov 2018 22:20:15 +0000 (17:20 -0500)]
doc: adjust time zone names text, v2
Removed one too many words. Fix for
7906de847f229f391b9e6b5892b4b4a89f29edb4.
Reported-by: Thomas Munro
Backpatch-through: 9.4
Bruce Momjian [Wed, 21 Nov 2018 21:55:40 +0000 (16:55 -0500)]
doc: adjust time zone names text
Reported-by: Kevin
Discussion: https://postgr.es/m/
154082462281.30897.
14043119084654378035@wrigleys.postgresql.org
Backpatch-through: 9.4
Peter Eisentraut [Tue, 13 Nov 2018 09:42:43 +0000 (10:42 +0100)]
doc: Clarify CREATE TYPE ENUM documentation
The documentation claimed that an enum type requires "one or more"
labels, but since
1fd9883ff49, zero labels are also allowed.
Reported-by: Lukas Eder
Bug: #15356
Tom Lane [Mon, 19 Nov 2018 22:28:05 +0000 (17:28 -0500)]
Add needed #include.
Per POSIX, WIFSIGNALED and related macros are provided by
.
Apparently on Linux they're also pulled in by some other inclusions,
but BSD-ish systems are pickier. Fixes portability issue in ffa4cbd62.
Per buildfarm.
Tom Lane [Mon, 19 Nov 2018 22:02:25 +0000 (17:02 -0500)]
Handle EPIPE more sanely when we close a pipe reading from a program.
Previously, any program launched by COPY TO/FROM PROGRAM inherited the
server's setting of SIGPIPE handling, i.e. SIG_IGN. Hence, if we were
doing COPY FROM PROGRAM and closed the pipe early, the child process
would see EPIPE on its output file and typically would treat that as
a fatal error, in turn causing the COPY to report error. Similarly,
one could get a failure report from a query that didn't read all of
the output from a contrib/file_fdw foreign table that uses file_fdw's
PROGRAM option.
To fix, ensure that child programs inherit SIG_DFL not SIG_IGN processing
of SIGPIPE. This seems like an all-around better situation since if
the called program wants some non-default treatment of SIGPIPE, it would
expect to have to set that up for itself. Then in COPY, if it's COPY
FROM PROGRAM and we stop reading short of detecting EOF, treat a SIGPIPE
exit from the called program as a non-error condition. This still allows
us to report an error for any case where the called program gets SIGPIPE
on some other file descriptor.
As coded, we won't report a SIGPIPE if we stop reading as a result of
seeing an in-band EOF marker (e.g. COPY BINARY EOF marker). It's
somewhat debatable whether we should complain if the called program
continues to transmit data after an EOF marker. However, it seems like
we should avoid throwing error in any questionable cases, especially in a
back-patched fix, and anyway it would take additional code to make such
an error get reported consistently.
Back-patch to v10. We could go further back, since COPY FROM PROGRAM
has been around awhile, but AFAICS the only way to reach this situation
using core or contrib is via file_fdw, which has only supported PROGRAM
sources since v10. The COPY statement per se has no feature whereby
it'd stop reading without having hit EOF or an error already. Therefore,
I don't see any upside to back-patching further that'd outweigh the
risk of complaints about behavioral change.
Per bug #15449 from Eric Cyr.
Patch by me, review by Etsuro Fujita and Kyotaro Horiguchi
Discussion: https://postgr.es/m/15449-
1cf737dd5929450e@postgresql.org
Tom Lane [Mon, 19 Nov 2018 17:01:47 +0000 (12:01 -0500)]
Fix configure's AC_CHECK_DECLS tests to work correctly with clang.
The test case that Autoconf uses to discover whether a function has
been declared doesn't work reliably with clang, because clang reports
a warning not an error if the name is a known built-in function.
On some platforms, this results in a lot of compile-time warnings about
strlcpy and related functions not having been declared.
There is a fix for this (by Noah Misch) in the upstream Autoconf sources,
but since they've not made a release in years and show no indication of
doing so anytime soon, let's just absorb their fix directly. We can
revert this when and if we update to a newer Autoconf release.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/26819.
1542515567@sss.pgh.pa.us
Alvaro Herrera [Mon, 19 Nov 2018 14:16:28 +0000 (11:16 -0300)]
Disallow COPY FREEZE on partitioned tables
This didn't actually work: COPY would fail to flush the right files, and
instead would try to flush a non-existing file, causing the whole
transaction to fail.
Cope by raising an error as soon as the command is sent instead, to
avoid a nasty later surprise. Of course, it would be much better to
make it work, but we don't have a patch for that yet, and we don't know
if we'll want to backpatch one when we do.
Reported-by: Tomas Vondra
Author: David Rowley
Reviewed-by: Amit Langote, Steve Singer, Tomas Vondra
Thomas Munro [Mon, 19 Nov 2018 00:40:57 +0000 (13:40 +1300)]
PANIC on fsync() failure.
On some operating systems, it doesn't make sense to retry fsync(),
because dirty data cached by the kernel may have been dropped on
write-back failure. In that case the only remaining copy of the
data is in the WAL. A subsequent fsync() could appear to succeed,
but not have flushed the data. That means that a future checkpoint
could apparently complete successfully but have lost data.
Therefore, violently prevent any future checkpoint attempts by
panicking on the first fsync() failure. Note that we already
did the same for WAL data; this change extends that behavior to
non-temporary data files.
Provide a GUC data_sync_retry to control this new behavior, for
users of operating systems that don't eject dirty data, and possibly
forensic/testing uses. If it is set to on and the write-back error
was transient, a later checkpoint might genuinely succeed (on a
system that does not throw away buffers on failure); if the error is
permanent, later checkpoints will continue to fail. The GUC defaults
to off, meaning that we panic.
Back-patch to all supported releases.
There is still a narrow window for error-loss on some operating
systems: if the file is closed and later reopened and a write-back
error occurs in the intervening time, but the inode has the bad
luck to be evicted due to memory pressure before we reopen, we could
miss the error. A later patch will address that with a scheme
for keeping files with dirty data open at all times, but we judge
that to be too complicated to back-patch.
Author: Craig Ringer, with some adjustments by Thomas Munro
Reported-by: Craig Ringer
Reviewed-by: Robert Haas, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/
20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
Thomas Munro [Mon, 19 Nov 2018 00:40:50 +0000 (13:40 +1300)]
Don't forget about failed fsync() requests.
If fsync() fails, md.c must keep the request in its bitmap, so that
future attempts will try again.
Back-patch to all supported releases.
Author: Thomas Munro
Reviewed-by: Amit Kapila
Reported-by: Andrew Gierth
Discussion: https://postgr.es/m/87y3i1ia4w.fsf%40news-spur.riddles.org.uk
Tomas Vondra [Sat, 17 Nov 2018 22:50:21 +0000 (23:50 +0100)]
Add valgrind suppressions for wcsrtombs optimizations
wcsrtombs (called through wchar2char from common functions like lower,
upper, etc.) uses various optimizations that may look like access to
uninitialized data, triggering valgrind reports.
For example AVX2 instructions load data in 256-bit chunks, and gconv
does something similar with 32-bit chunks. This is faster than accessing
the bytes one by one, and the uninitialized part of the buffer is not
actually used. So suppress the bogus reports.
The exact stack depends on possible optimizations - it might be AVX, SSE
(as in the report by Aleksander Alekseev) or something else. Hence the
last frame is wildcarded, to deal with this.
Backpatch all the way back to 9.4.
Author: Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/
90ac0452-e907-e7a4-b3c8-
15bd33780e62%402ndquadrant.com
Discussion: https://www.postgresql.org/message-id/
20180220150838[email protected]
Tom Lane [Wed, 14 Nov 2018 21:29:57 +0000 (16:29 -0500)]
Doc: remove claim that all \pset format options are unique in 1 letter.
This hasn't been correct since 9.3 added "latex-longtable".
I left the phraseology "Unique abbreviations are allowed" alone.
It's correct as far as it goes, and we are studiously refraining
from specifying exactly what happens if you give a non-unique
abbreviation. (The answer in the back branches is "you get a
backwards-compatible choice", and the answer in HEAD will shortly
be "you get an error", but there seems no need to mention such
details here.)
Daniel Vérité
Discussion: https://postgr.es/m/
cb7e1caf-3ea6-450d-af28-
f524903a030c@manitou-mail.org
Tom Lane [Wed, 14 Nov 2018 16:27:30 +0000 (11:27 -0500)]
Second try at fixing numeric data passed through an ECPG SQLDA.
In commit
ecfd55795, I removed sqlda.c's checks for ndigits != 0 on the
grounds that we should duplicate the state of the numeric value's digit
buffer even when all the digits are zeroes. However, that still isn't
quite right, because another possible state of the digit buffer is
buf == digits == NULL (this occurs for a NaN). As the code now stands,
it'll invoke memcpy with a NULL source address and zero bytecount,
which we know a few platforms crash on. Hence, reinstate the no-copy
short-circuit, but make it test specifically for buf != NULL rather than
some other condition. In hindsight, the ndigits test (added by commit
f2ae9f9c3) was almost certainly meant to fix the NaN case not the
all-zeroes case as the associated thread alleged.
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/
1803D792815FC24D871C00D17AE95905C71161@g01jpexmbkw24
Michael Paquier [Wed, 14 Nov 2018 07:48:10 +0000 (16:48 +0900)]
Initialize TransactionState and user ID consistently at transaction start
If a failure happens when a transaction is starting between the moment
the transaction status is changed from TRANS_DEFAULT to TRANS_START and
the moment the current user ID and security context flags are fetched
via GetUserIdAndSecContext(), or before initializing its basic fields,
then those may get reset to incorrect values when the transaction
aborts, leaving the session in an inconsistent state.
One problem reported is that failing a starting transaction at the first
query of a session could cause several kinds of system crashes on the
follow-up queries.
In order to solve that, move the initialization of the transaction state
fields and the call of GetUserIdAndSecContext() in charge of fetching
the current user ID close to the point where the transaction status is
switched to TRANS_START, where there cannot be any error triggered
in-between, per an idea of Tom Lane. This properly ensures that the
current user ID, the security context flags and that the basic fields of
TransactionState remain consistent even if the transaction fails while
starting.
Reported-by: Richard Guo
Diagnosed-By: Richard Guo
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAN_9JTxECSb=pEPcb0a8d+6J+bDcOZ4=DgRo_B7Y5gRHJUM=Rw@mail.gmail.com
Backpatch-through: 9.4
Tom Lane [Tue, 13 Nov 2018 20:46:08 +0000 (15:46 -0500)]
Fix incorrect results for numeric data passed through an ECPG SQLDA.
Numeric values with leading zeroes were incorrectly copied into a
SQLDA (SQL Descriptor Area), leading to wrong results in ECPG programs.
Report and patch by Daisuke Higuchi. Back-patch to all supported
versions.
Discussion: https://postgr.es/m/
1803D792815FC24D871C00D17AE95905C71161@g01jpexmbkw24
Amit Kapila [Tue, 13 Nov 2018 05:06:59 +0000 (10:36 +0530)]
Fix the initialization of atomic variable introduced by the
group clearing mechanism.
Commit
0e141c0fbb introduced initialization of atomic variable in
InitProcess which means that it's not safe to look at it for backends that
aren't currently in use. Fix that by initializing the same during postmaster
startup.
Reported-by: Andres Freund
Author: Amit Kapila
Backpatch-through: 9.6
Discussion:https://postgr.es/m/
20181027104138[email protected]
Thomas Munro [Tue, 13 Nov 2018 03:27:13 +0000 (16:27 +1300)]
Fix possible buffer overrun in hba.c.
Coverty reports a possible buffer overrun in the code that populates the
pg_hba_file_rules view. It may not be a live bug due to restrictions
on options that can be used together, but let's increase MAX_HBA_OPTIONS
and correct a nearby misleading comment.
Back-patch to 10 where this code arrived.
Reported-by: Julian Hsiao
Discussion: https://postgr.es/m/CADnGQpzbkWdKS2YHNifwAvX5VEsJ5gW49U4o-7UL5pzyTv4vTg%40mail.gmail.com
Tom Lane [Mon, 12 Nov 2018 16:19:04 +0000 (11:19 -0500)]
Limit the number of index clauses considered in choose_bitmap_and().
classify_index_clause_usage() is O(N^2) in the number of distinct index
qual clauses it considers, because of its use of a simple search list to
store them. For nearly all queries, that's fine because only a few clauses
will be considered. But Alexander Kuzmenkov reported a machine-generated
query with 80000 (!) index qual clauses, which caused this code to take
forever. Somewhat remarkably, this is the only O(N^2) behavior we now
have for such a query, so let's fix it.
We can get rid of the O(N^2) runtime for cases like this without much
damage to the functionality of choose_bitmap_and() by separating out
paths with "too many" qual or pred clauses, and deeming them to always
be nonredundant with other paths. Then their clauses needn't go into
the search list, so it doesn't get too long, but we don't lose the
ability to consider bitmap AND plans altogether. I set the threshold
for "too many" to be 100 clauses per path, which should be plenty to
ensure no change in planning behavior for normal queries.
There are other things we could do to make this go faster, but it's not
clear that it's worth any additional effort. 80000 qual clauses require
a whole lot of work in many other places, too.
The code's been like this for a long time, so back-patch to all supported
branches. The troublesome query only works back to 9.5 (in 9.4 it fails
with stack overflow in the parser); so I'm not sure that fixing this in
9.4 has any real-world benefit, but perhaps it does.
Discussion: https://postgr.es/m/
90c5bdfa-d633-dabe-9889-
3cf3e1acd443@postgrespro.ru
Michael Paquier [Mon, 12 Nov 2018 14:00:55 +0000 (23:00 +0900)]
Fix incorrect author name in release notes
Author: Alexander Lakhin
Discussion: https://postgr.es/m/
2f55f6d2-3fb0-d4f6-5c47-
18da3a1117e0@gmail.com
Tom Lane [Sat, 10 Nov 2018 01:42:03 +0000 (20:42 -0500)]
Fix missing role dependencies for some schema and type ACLs.
This patch fixes several related cases in which pg_shdepend entries were
never made, or were lost, for references to roles appearing in the ACLs of
schemas and/or types. While that did no immediate harm, if a referenced
role were later dropped, the drop would be allowed and would leave a
dangling reference in the object's ACL. That still wasn't a big problem
for normal database usage, but it would cause obscure failures in
subsequent dump/reload or pg_upgrade attempts, taking the form of
attempts to grant privileges to all-numeric role names. (I think I've
seen field reports matching that symptom, but can't find any right now.)
Several cases are fixed here:
1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any
existing ACL entries for the domain. This case is ancient, dating
back as far as we've had pg_shdepend tracking at all.
2. If a default type privilege applies, CREATE TYPE recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for types in 9.2.
3. If a default schema privilege applies, CREATE SCHEMA recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for schemas in v10
(commit
ab89e465c).
Another somewhat-related problem is that when creating a relation
rowtype or implicit array type, TypeCreate would apply any available
default type privileges to that type, which we don't really want
since such an object isn't supposed to have privileges of its own.
(You can't, for example, drop such privileges once they've been added
to an array type.)
ab89e465c is also to blame for a race condition in the regression tests:
privileges.sql transiently installed globally-applicable default
privileges on schemas, which sometimes got absorbed into the ACLs of
schemas created by concurrent test scripts. This should have resulted
in failures when privileges.sql tried to drop the role holding such
privileges; but thanks to the bug fixed here, it instead led to dangling
ACLs in the final state of the regression database. We'd managed not to
notice that, but it became obvious in the wake of commit
da906766c, which
allowed the race condition to occur in pg_upgrade tests.
To fix, add a function recordDependencyOnNewAcl to encapsulate what
callers of get_user_default_acl need to do; while the original call
sites got that right via ad-hoc code, none of the later-added ones
have. Also change GenerateTypeDependencies to generate these
dependencies, which requires adding the typacl to its parameter list.
(That might be annoying if there are any extensions calling that
function directly; but if there are, they're most likely buggy in the
same way as the core callers were, so they need work anyway.) While
I was at it, I changed GenerateTypeDependencies to accept most of its
parameters in the form of a Form_pg_type pointer, making its parameter
list a bit less unwieldy and mistake-prone.
The test race condition is fixed just by wrapping the addition and
removal of default privileges into a single transaction, so that that
state is never visible externally. We might eventually prefer to
separate out tests of default privileges into a script that runs by
itself, but that would be a bigger change and would make the tests
run slower overall.
Back-patch relevant parts to all supported branches.
Discussion: https://postgr.es/m/15719.
1541725287@sss.pgh.pa.us
Michael Paquier [Fri, 9 Nov 2018 01:03:39 +0000 (10:03 +0900)]
Fix dependency handling of partitions and inheritance for ON COMMIT
This commit fixes a set of issues with ON COMMIT actions when used on
partitioned tables and tables with inheritance children:
- Applying ON COMMIT DROP on a partitioned table with partitions or on a
table with inheritance children caused a failure at commit time, with
complains about the children being already dropped as all relations are
dropped one at the same time.
- Applying ON COMMIT DELETE on a partition relying on a partitioned
table which uses ON COMMIT DROP would cause the partition truncation to
fail as the parent is removed first.
The solution to the first problem is to handle the removal of all the
dependencies in one go instead of dropping relations one-by-one, based
on a suggestion from Álvaro Herrera. So instead all the relation OIDs
to remove are gathered and then processed in one round of multiple
deletions.
The solution to the second problem is to reorder the actions, with
truncation happening first and relation drop done after. Even if it
means that a partition could be first truncated, then immediately
dropped if its partitioned table is dropped, this has the merit to keep
the code simple as there is no need to do existence checks on the
relations to drop.
Contrary to a manual TRUNCATE on a partitioned table, ON COMMIT DELETE
does not cascade to its partitions. The ON COMMIT action defined on
each partition gets the priority.
Author: Michael Paquier
Reviewed-by: Amit Langote, Álvaro Herrera, Robert Haas
Discussion: https://postgr.es/m/
68f17907-ec98-1192-f99f-
8011400517f5@lab.ntt.co.jp
Backpatch-through: 10
Tom Lane [Thu, 8 Nov 2018 22:33:26 +0000 (17:33 -0500)]
Disallow setting client_min_messages higher than ERROR.
Previously it was possible to set client_min_messages to FATAL or PANIC,
which had the effect of suppressing transmission of regular ERROR messages
to the client. Perhaps that seemed like a useful option in the past, but
the trouble with it is that it breaks guarantees that are explicitly made
in our FE/BE protocol spec about how a query cycle can end. While libpq
and psql manage to cope with the omission, that's mostly because they
are not very bright; client libraries that have more semantic knowledge
are likely to get confused. Notably, pgODBC doesn't behave very sanely.
Let's fix this by getting rid of the ability to set client_min_messages
above ERROR.
In HEAD, just remove the FATAL and PANIC options from the set of allowed
enum values for client_min_messages. (This change also affects
trace_recovery_messages, but that's OK since these aren't useful values
for that variable either.)
In the back branches, there was concern that rejecting these values might
break applications that are explicitly setting things that way. I'm
pretty skeptical of that argument, but accommodate it by accepting these
values and then internally setting the variable to ERROR anyway.
In all branches, this allows a couple of tiny simplifications in the
logic in elog.c, so do that.
Also respond to the point that was made that client_min_messages has
exactly nothing to do with the server's logging behavior, and therefore
does not belong in the "When To Log" subsection of the documentation.
The "Statement Behavior" subsection is a better match, so move it there.
Jonah Harris and Tom Lane
Discussion: https://postgr.es/m/7809.
1541521180@sss.pgh.pa.us
Discussion: https://postgr.es/m/15479-
ef0f4cc2fd995ca2@postgresql.org
Alvaro Herrera [Thu, 8 Nov 2018 19:22:09 +0000 (16:22 -0300)]
Revise attribute handling code on partition creation
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.
This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition. One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent. That would amount to a behavioral
change that's better not back-patched.
This rewrite makes the code simpler:
1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;
2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly. This works
because we know the list in the parent is already correct and there can
only be one parent.
This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.
This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched. No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.
Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code. Ashutosh Bapat reviewed Amit's fix. Amit
reviewed mine.
Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/
152746742177.1291.
9847032632907407358@wrigleys.postgresql.org
Bruce Momjian [Tue, 6 Nov 2018 18:40:02 +0000 (13:40 -0500)]
GUC: adjust effective_cache_size SQL descriptions
Follow on patch for commit
3e0f1a4741f564c1a2fa6e944729d6967355d8c7.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/
369ec766-b947-51bd-4dad-
6fb9e026439f@2ndquadrant.com
Backpatch-through: 9.4
Tom Lane [Tue, 6 Nov 2018 18:25:24 +0000 (13:25 -0500)]
Rename rbtree.c functions to use "rbt" prefix not "rb" prefix.
The "rb" prefix is used by Ruby, so that our existing code results
in name collisions that break plruby. We discussed ways to prevent
that by adjusting dynamic linker options, but it seems that at best
we'd move the pain to other cases. Renaming to avoid the collision
is the only portable fix anyway. Fortunately, our rbtree code is
not (yet?) widely used --- in core, there's only a single usage
in GIN --- so it seems likely that we can get away with a rename.
I chose to do this basically as s/rb/rbt/g, except for places where
there already was a "t" after "rb". The patch could have been made
smaller by only touching linker-visible symbols, but it would have
resulted in oddly inconsistent-looking code. Better to make it look
like "rbt" was the plan all along.
Back-patch to v10. The rbtree.c code exists back to 9.5, but
rb_iterate() which is the actual immediate source of pain was added
in v10, so it seems like changing the names before that would have
more risk than benefit.
Per report from Pavel Raiskup.
Discussion: https://postgr.es/m/
4738198[email protected]
Tom Lane [Mon, 5 Nov 2018 21:45:50 +0000 (16:45 -0500)]
Stamp 10.6.
Tom Lane [Mon, 5 Nov 2018 21:07:06 +0000 (16:07 -0500)]
Last-minute updates for release notes.
I removed the item about the pg_stat_statements change from
release-11.sgml, as part of a sweep to delete items already committed
in 11.0; but actually we'd best keep it to ensure that people who've
pg_upgraded their databases will take the requisite action. Also make
said action more visible by making it into its own para. Noted by
Jonathan Katz.
Andres Freund [Mon, 5 Nov 2018 20:02:25 +0000 (12:02 -0800)]
Fix copy-paste error in errhint() introduced in
691d79a07933.
Reported-By: Petr Jelinek
Discussion: https://postgr.es/m/
c95a620b-34f0-7930-aeb5-
f7ab804f26cb@2ndquadrant.com
Backpatch: 9.4-, like the previous commit
Tom Lane [Mon, 5 Nov 2018 15:48:23 +0000 (10:48 -0500)]
Last-minute updates for release notes.
Security: CVE-2018-16850
Peter Eisentraut [Mon, 5 Nov 2018 13:46:40 +0000 (14:46 +0100)]
Translation updates
Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git
Source-Git-Hash:
4aac9391521d21fdecc378db4750a59795350b33
Michael Paquier [Mon, 5 Nov 2018 02:04:20 +0000 (11:04 +0900)]
Block creation of partitions with open references to its parent
When a partition is created as part of a trigger processing, it is
possible that the partition which just gets created changes the
properties of the table the executor of the ongoing command relies on,
causing a subsequent crash. This has been found possible when for
example using a BEFORE INSERT which creates a new partition for a
partitioned table being inserted to.
Any attempt to do so is blocked when working on a partition, with
regression tests added for both CREATE TABLE PARTITION OF and ALTER
TABLE ATTACH PARTITION.
Reported-by: Dmitry Shalashov
Author: Amit Langote
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/15437-
3fe01ee66bd1bae1@postgresql.org
Backpatch-through: 10
Michael Paquier [Mon, 5 Nov 2018 00:15:25 +0000 (09:15 +0900)]
Ignore partitioned tables when processing ON COMMIT DELETE ROWS
Those tables have no physical storage, making this option unusable with
partition trees as at commit time an actual truncation was attempted.
There are still issues with the way ON COMMIT actions are done when
mixing several action types, however this impacts as well inheritance
trees, so this issue will be dealt with later.
Reported-by: Rajkumar Raghuwanshi
Author: Amit Langote
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAKcux6mhgcjSiB_egqEAEFgX462QZtncU8QCAJ2HZwM-wWGVew@mail.gmail.com
Tom Lane [Sun, 4 Nov 2018 21:57:14 +0000 (16:57 -0500)]
Release notes for 11.1, 10.6, 9.6.11, 9.5.15, 9.4.20, 9.3.25.
Tom Lane [Sat, 3 Nov 2018 17:56:10 +0000 (13:56 -0400)]
Make ts_locale.c's character-type functions cope with UTF-16.
On Windows, in UTF8 database encoding, what char2wchar() produces is
UTF16 not UTF32, ie, characters above U+FFFF will be represented by
surrogate pairs. t_isdigit() and siblings did not account for this
and failed to provide a large enough result buffer. That in turn
led to bogus "invalid multibyte character for locale" errors, because
contrary to what you might think from char2wchar()'s documentation,
its Windows code path doesn't cope sanely with buffer overflow.
The solution for t_isdigit() and siblings is pretty clear: provide
a 3-wchar_t result buffer not 2.
char2wchar() also needs some work to provide more consistent, and more
accurately documented, buffer overrun behavior. But that's a bigger job
and it doesn't actually have any immediate payoff, so leave it for later.
Per bug #15476 from Kenji Uno, who deserves credit for identifying the
cause of the problem. Back-patch to all active branches.
Discussion: https://postgr.es/m/15476-
4314f480acf0f114@postgresql.org
Stephen Frost [Sat, 3 Nov 2018 16:22:09 +0000 (12:22 -0400)]
Remove extra word from create sub docs
Improve the documentation in the CREATE SUBSCRIPTION command a bit by
removing an extraneous word and spelling out 'information'.
Tom Lane [Fri, 2 Nov 2018 22:54:00 +0000 (18:54 -0400)]
Yet further rethinking of build changes for macOS Mojave.
The solution arrived at in commit
e74dd00f5 presumes that the compiler
has a suitable default -isysroot setting ... but further experience
shows that in many combinations of macOS version, XCode version, Xcode
command line tools version, and phase of the moon, Apple's compiler
will *not* supply a default -isysroot value.
We could potentially go back to the approach used in commit
68fc227dd,
but I don't have a lot of faith in the reliability or life expectancy of
that either. Let's just revert to the approach already shipped in 11.0,
namely specifying an -isysroot switch globally. As a partial response to
the concerns raised by Jakob Egger, adjust the contents of Makefile.global
to look like
CPPFLAGS = -isysroot $(PG_SYSROOT) ...
PG_SYSROOT = /path/to/sysroot
This allows overriding the sysroot path at build time in a relatively
painless way.
Add documentation to installation.sgml about how to use the PG_SYSROOT
option. I also took the opportunity to document how to work around
macOS's "System Integrity Protection" feature.
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/20840.
1537850987@sss.pgh.pa.us
Bruce Momjian [Fri, 2 Nov 2018 17:05:30 +0000 (13:05 -0400)]
docs: adjust simpler language for NULL return from ANY/ALL
Adjustment to commit
8610c973ddf1cbf0befc1369d2cf0d56c0efcd0a.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/17406.
1541168421@sss.pgh.pa.us
Backpatch-through: 9.3
Bruce Momjian [Fri, 2 Nov 2018 13:11:00 +0000 (09:11 -0400)]
GUC: adjust effective_cache_size docs and SQL description
Clarify that effective_cache_size is both kernel buffers and shared
buffers.
Reported-by: [email protected]
Discussion: https://postgr.es/m/
153685164808.22334.
15432535018443165207@wrigleys.postgresql.org
Backpatch-through: 9.3
Magnus Hagander [Fri, 2 Nov 2018 12:55:57 +0000 (13:55 +0100)]
Fix some spelling errors in the documentation
Author: Daniel Gustafsson
Bruce Momjian [Fri, 2 Nov 2018 12:54:34 +0000 (08:54 -0400)]
doc: use simpler language for NULL return from ANY/ALL
Previously the combination of "does not return" and "any row" caused
ambiguity.
Reported-by: KES
Discussion: https://postgr.es/m/
153701242703.22334.
1476830122267077397@wrigleys.postgresql.org
Reviewed-by: David G. Johnston
Backpatch-through: 9.3
Andres Freund [Thu, 1 Nov 2018 17:44:29 +0000 (10:44 -0700)]
Fix error message typo introduced
691d79a07933.
Reported-By: Michael Paquier
Discussion: https://postgr.es/m/
20181101003405[email protected]
Backpatch: 9.4-, like the previous commit
Andres Freund [Wed, 31 Oct 2018 21:47:41 +0000 (14:47 -0700)]
Disallow starting server with insufficient wal_level for existing slot.
Previously it was possible to create a slot, change wal_level, and
restart, even if the new wal_level was insufficient for the
slot. That's a problem for both logical and physical slots, because
the necessary WAL records are not generated.
This removes a few tests in newer versions that, somewhat
inexplicably, whether restarting with a too low wal_level worked (a
buggy behaviour!).
Reported-By: Joshua D. Drake
Author: Andres Freund
Discussion: https://postgr.es/m/
20181029191304[email protected]
Backpatch: 9.4-, where replication slots where introduced
Tom Lane [Wed, 31 Oct 2018 21:04:43 +0000 (17:04 -0400)]
Fix memory leak in repeated SPGIST index scans.
spgendscan neglected to pfree all the memory allocated by spgbeginscan.
It's possible to get away with that in most normal queries, since the
memory is allocated in the executor's per-query context which is about
to get deleted anyway; but it causes severe memory leakage during
creation or filling of large exclusion-constraint indexes.
Also, document that amendscan is supposed to free what ambeginscan
allocates. The docs' lack of clarity on that point probably caused this
bug to begin with. (There is discussion of changing that API spec going
forward, but I don't think it'd be appropriate for the back branches.)
Per report from Bruno Wolff. It's been like this since the beginning,
so back-patch to all active branches.
In HEAD, also fix an independent leak caused by commit
2a6368343
(allocating memory during spgrescan instead of spgbeginscan, which
might be all right if it got cleaned up, but it didn't). And do a bit
of code beautification on that commit, too.
Discussion: https://postgr.es/m/
20181024012314[email protected]
Tom Lane [Wed, 31 Oct 2018 13:47:53 +0000 (09:47 -0400)]
Sync our copy of the timezone library with IANA release tzcode2018g.
This patch absorbs an upstream fix to "zic" for a recently-introduced
bug that made it output data that some 32-bit clients couldn't read.
Given the current source data, the bug only manifests in zones with
leap seconds, which we don't generate, so that there's no actual
change in our installed timezone data files from this. Still, in
case somebody uses our copy of "zic" to do something else, it seems
best to apply the fix promptly.
Also, update the README's notes about converting upstream code to
our conventions.
Tom Lane [Wed, 31 Oct 2018 12:35:50 +0000 (08:35 -0400)]
Update time zone data files to tzdata release 2018g.
DST law changes in Morocco (with, effectively, zero notice).
Historical corrections for Hawaii.
Magnus Hagander [Mon, 29 Oct 2018 11:34:49 +0000 (12:34 +0100)]
Fix missing whitespace in pg_dump ref page
Author: Daniel Gustafsson
Peter Eisentraut [Mon, 29 Oct 2018 10:31:43 +0000 (11:31 +0100)]
pg_restore: Augment documentation for -N option
This was forgotten when the option was added.
Author: Michael Banck
Andrew Dunstan [Sun, 28 Oct 2018 16:22:32 +0000 (12:22 -0400)]
Fix perl searchpath for modern perl for MSVC tools
Modern versions of perl no longer include the current directory in the
perl searchpath, as it's insecure. Instead of adding the current
directory, we get around the problem by adding the directory where the
script lives.
Problem noted by Victor Wagner.
Solution adapted from buildfarm client code.
Backpatch to all live versions.
Michael Paquier [Wed, 24 Oct 2018 08:02:51 +0000 (17:02 +0900)]
List wait events in alphabetical order in documentation
Keeping all those entries in order helps the user looking at the
documentation in finding them.
Author: Michael Paquier, Kuntal Ghosh
Discussion: https://postgr.es/m/
20181024002539[email protected]
Alexander Korotkov [Sun, 21 Oct 2018 21:23:26 +0000 (00:23 +0300)]
Fix some grammar errors in bloom.sgml
Discussion: https://postgr.es/m/CAEepm%3D3sijpGr8tXdyz-7EJJZfhQHABPKEQ29gpnb7-XSy%2B%3D5A%40mail.gmail.com
Reported-by: Thomas Munro
Backpatch-through: 9.6
Andrew Dunstan [Sat, 20 Oct 2018 13:02:36 +0000 (09:02 -0400)]
Lower privilege level of programs calling regression_main
On Windows this mean that the regression tests can now safely and
successfully run as Administrator, which is useful in situations like
Appveyor. Elsewhere it's a no-op.
Backpatch to 9.5 - this is harder in earlier branches and not worth the
trouble.
Discussion: https://postgr.es/m/
650b0c29-9578-8571-b1d2-
550d7f89f307@2ndQuadrant.com
Tom Lane [Sat, 20 Oct 2018 02:22:57 +0000 (22:22 -0400)]
Client-side fixes for delayed NOTIFY receipt.
PQnotifies() is defined to just process already-read data, not try to read
any more from the socket. (This is a debatable decision, perhaps, but I'm
hesitant to change longstanding library behavior.) The documentation has
long recommended calling PQconsumeInput() before PQnotifies() to ensure
that any already-arrived message would get absorbed and processed.
However, psql did not get that memo, which explains why it's not very
reliable about reporting notifications promptly.
Also, most (not quite all) callers called PQconsumeInput() just once before
a PQnotifies() loop. Taking this recommendation seriously implies that we
should do PQconsumeInput() before each call. This is more important now
that we have "payload" strings in notification messages than it was before;
that increases the probability of having more than one packet's worth
of notify messages. Hence, adjust code as well as documentation examples
to do it like that.
Back-patch to 9.5 to match related server fixes. In principle we could
probably go back further with these changes, but given lack of field
complaints I doubt it's worthwhile.
Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com
Tom Lane [Sat, 20 Oct 2018 01:39:21 +0000 (21:39 -0400)]
Server-side fix for delayed NOTIFY and SIGTERM processing.
Commit
4f85fde8e introduced some code that was meant to ensure that we'd
process cancel, die, sinval catchup, and notify interrupts while waiting
for client input. But there was a flaw: it supposed that the process
latch would be set upon arrival at secure_read() if any such interrupt
was pending. In reality, we might well have cleared the process latch
at some earlier point while those flags remained set -- particularly
notifyInterruptPending, which can't be handled as long as we're within
a transaction.
To fix the NOTIFY case, also attempt to process signals (except
ProcDiePending) before trying to read.
Also, if we see that ProcDiePending is set before we read, forcibly set the
process latch to ensure that we will handle that signal promptly if no data
is available. I also made it set the process latch on the way out, in case
there is similar logic elsewhere. (It remains true that we won't service
ProcDiePending here unless we need to wait for input.)
The code for handling ProcDiePending during a write needs those changes,
too.
Also be a little more careful about when to reset whereToSendOutput,
and improve related comments.
Back-patch to 9.5 where this code was added. I'm not entirely convinced
that older branches don't have similar issues, but the complaint at hand
is just about the >= 9.5 code.
Jeff Janes and Tom Lane
Discussion: https://postgr.es/m/CAOYf6ec-TmRYjKBXLLaGaB-jrd=mjG1Hzn1a1wufUAR39PQYhw@mail.gmail.com
Tom Lane [Fri, 19 Oct 2018 23:36:34 +0000 (19:36 -0400)]
Sync our copy of the timezone library with IANA release tzcode2018f.
About half of this is purely cosmetic changes to reduce the diff between
our code and theirs, like inserting "const" markers where they have them.
The other half is tracking actual code changes in zic.c and localtime.c.
I don't think any of these represent near-term compatibility hazards, but
it seems best to stay up to date.
I also fixed longstanding bugs in our code for producing the
known_abbrevs.txt list, which by chance hadn't been exposed before,
but which resulted in some garbage output after applying the upstream
changes in zic.c. Notably, because upstream removed their old phony
transitions at the Big Bang, it's now necessary to cope with TZif files
containing no DST transition times at all.
Tom Lane [Fri, 19 Oct 2018 21:01:34 +0000 (17:01 -0400)]
Update time zone data files to tzdata release 2018f.
DST law changes in Chile, Fiji, and Russia (Volgograd).
Historical corrections for China, Japan, Macau, and North Korea.
Note: like the previous tzdata update, this involves a depressingly
large amount of semantically-meaningless churn in tzdata.zi. That
is a consequence of upstream's data compression method assigning
unstable abbreviations to DST rulesets. I complained about that
to them last time, and this version now uses an assignment method
that pays some heed to not changing abbreviations unnecessarily.
So hopefully, that'll be better going forward.
Tom Lane [Fri, 19 Oct 2018 04:50:17 +0000 (00:50 -0400)]
Add missing quote_identifier calls for CREATE TRIGGER ... REFERENCING.
Mixed-case names for transition tables weren't dumped correctly.
Oversight in commit
8c48375e5, per bug #15440 from Karl Czajkowski.
In passing, I couldn't resist a bit of code beautification.
Back-patch to v10 where this was introduced.
Discussion: https://postgr.es/m/15440-
02d1468e94d63d76@postgresql.org
Tom Lane [Thu, 18 Oct 2018 18:55:23 +0000 (14:55 -0400)]
Still further rethinking of build changes for macOS Mojave.
To avoid the sorts of problems complained of by Jakob Egger, it'd be
best if configure didn't emit any references to the sysroot path at all.
In the case of PL/Tcl, we can do that just by keeping our hands off the
TCL_INCLUDE_SPEC string altogether. In the case of PL/Perl, we need to
substitute -iwithsysroot for -I in the compile commands, which is easily
handled if we change to using a configure output variable that includes
the switch not only the directory name. Since PL/Tcl and PL/Python
already do it like that, this seems like good consistency cleanup anyway.
Hence, this replaces the advice given to Perl-related extensions in commit
5e2217131; instead of writing "-I$(perl_archlibexp)/CORE", they should
just write "$(perl_includespec)". (The old way continues to work, but not
on recent macOS.)
It's still the case that configure needs to be aware of the sysroot
path internally, but that's cleaner than what we had before.
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/20840.
1537850987@sss.pgh.pa.us
Tom Lane [Wed, 17 Oct 2018 19:06:38 +0000 (15:06 -0400)]
Fix minor bug in isolationtester.
If the lock wait query failed, isolationtester would report the
PQerrorMessage from some other connection, meaning there would be
no message or an unrelated one. This seems like a pretty unlikely
occurrence, but if it did happen, this bug could make it really
difficult/confusing to figure out what happened. That seems to
justify patching all the way back.
In passing, clean up another place where the "wrong" conn was used
for an error report. That one's not actually buggy because it's
a different alias for the same connection, but it's still confusing
to the reader.
Tom Lane [Wed, 17 Oct 2018 16:26:48 +0000 (12:26 -0400)]
Improve tzparse's handling of TZDEFRULES ("posixrules") zone data.
In the IANA timezone code, tzparse() always tries to load the zone
file named by TZDEFRULES ("posixrules"). Previously, we'd hacked
that logic to skip the load in the "lastditch" code path, which we use
only to initialize the default "GMT" zone during GUC initialization.
That's critical for a couple of reasons: since we do not support leap
seconds, we *must not* allow "GMT" to have leap seconds, and since this
case runs before the GUC subsystem is fully alive, we'd really rather
not take the risk of pg_open_tzfile throwing any errors.
However, that still left the code reading TZDEFRULES on every other
call, something we'd noticed to the extent of having added code to cache
the result so it was only done once per process not a lot of times.
Andres Freund complained about the static data space used up for the
cache; but as long as the logic was like this, there was no point in
trying to get rid of that space.
We can improve matters by looking a bit more closely at what the IANA
code actually needs the TZDEFRULES data for. One thing it does is
that if "posixrules" is a leap-second-aware zone, the leap-second
behavior will be absorbed into every POSIX-style zone specification.
However, that's a behavior we'd really prefer to do without, since
for our purposes the end effect is to render every POSIX-style zone
name unsupported. Otherwise, the TZDEFRULES data is used only if
the POSIX zone name specifies DST but doesn't include a transition
date rule (e.g., "EST5EDT" rather than "EST5EDT,M3.2.0,M11.1.0").
That is a minority case for our purposes --- in particular, it
never happens when tzload() invokes tzparse() to interpret a
transition date rule string found in a tzdata zone file.
Hence, if we legislate that we're going to ignore leap-second data
from "posixrules", we can postpone the TZDEFRULES load into the path
where we actually need to substitute for a missing date rule string.
That means it will never happen at all in common scenarios, making it
reasonable to dynamically allocate the cache space when it does happen.
Even when the data is already loaded, this saves some cycles in the
common code path since we avoid a memcpy of 23KB or so. And, IMO at
least, this is a less ugly hack on the IANA logic than what we had
before, since it's not messing with the lastditch-vs-regular code paths.
Back-patch to all supported branches, not so much because this is a
critical change as that I want to keep all our copies of the IANA
timezone code in sync.
Discussion: https://postgr.es/m/
20181015200754[email protected]
Tom Lane [Tue, 16 Oct 2018 20:27:15 +0000 (16:27 -0400)]
Back off using -isysroot on Darwin.
Rethink the solution applied in commit
5e2217131 to get PL/Tcl to
build on macOS Mojave. I feared that adding -isysroot globally might
have undesirable consequences, and sure enough Jakob Egger reported
one: it complicates building extensions with a different Xcode version
than was used for the core server. (I find that a risky proposition
in general, but apparently it works most of the time, so we shouldn't
break it if we don't have to.)
We'd already adopted the solution for PL/Perl of inserting the sysroot
path directly into the -I switches used to find Perl's headers, and we
can do the same thing for PL/Tcl by changing the -iwithsysroot switch
that Apple's tclConfig.sh reports. This restricts the risks to PL/Perl
and PL/Tcl themselves and directly-dependent extensions, which is a lot
more pleasing in general than a global -isysroot switch.
Along the way, tighten the test to see if we need to inject the sysroot
path into $perl_includedir, as I'd speculated about upthread but not
gotten round to doing.
As before, back-patch to all supported versions.
Discussion: https://postgr.es/m/20840.
1537850987@sss.pgh.pa.us
Tom Lane [Tue, 16 Oct 2018 17:56:58 +0000 (13:56 -0400)]
Avoid rare race condition in privileges.sql regression test.
We created a temp table, then switched to a new session, leaving
the old session to clean up its temp objects in background. If that
took long enough, the eventual attempt to drop the user that owns
the temp table could fail, as exhibited today by sidewinder.
Fix by dropping the temp table explicitly when we're done with it.
It's been like this for quite some time, so back-patch to all
supported branches.
Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2018-10-16%2014%3A45%3A00
Tom Lane [Tue, 16 Oct 2018 16:27:14 +0000 (12:27 -0400)]
Make PostgresNode.pm's poll_query_until() more chatty about failures.
Reporting only the stderr is unhelpful when the problem is that the
server output we're getting doesn't match what was expected. So we
should report the query output too; and just for good measure, let's
print the query we used and the output we expected.
Back-patch to 9.5 where poll_query_until was introduced.
Discussion: https://postgr.es/m/17913.
1539634756@sss.pgh.pa.us
Tom Lane [Tue, 16 Oct 2018 16:01:19 +0000 (12:01 -0400)]
Improve stability of recently-added regression test case.
Commit
b5febc1d1 added a contrib/btree_gist test case that has been
observed to fail in the buildfarm as a result of background auto-analyze
updating stats and changing the selected plan. Forestall that by
forcibly analyzing in foreground, instead. The new plan choice is
just as good for our purposes, since we really only care that an
index-only plan does not get selected.
Back-patch to 9.5, like the previous patch.
Discussion: https://postgr.es/m/14643.
1539629304@sss.pgh.pa.us
Tom Lane [Tue, 16 Oct 2018 15:50:18 +0000 (11:50 -0400)]
Avoid statically allocating gmtsub()'s timezone workspace.
localtime.c's "struct state" is a rather large object, ~23KB. We were
statically allocating one for gmtsub() to use to represent the GMT
timezone, even though that function is not at all heavily used and is
never reached in most backends. Let's malloc it on-demand, instead.
This does pose the question of how to handle a malloc failure, but
there's already a well-defined error report convention here, ie
set errno and return NULL.
We have but one caller of pg_gmtime in HEAD, and two in back branches,
neither of which were troubling to check for error. Make them do so.
The possible errors are sufficiently unlikely (out-of-range timestamp,
and now malloc failure) that I think elog() is adequate.
Back-patch to all supported branches to keep our copies of the IANA
timezone code in sync. This particular change is in a stanza that
already differs from upstream, so it's a wash for maintenance purposes
--- but only as long as we keep the branches the same.
Discussion: https://postgr.es/m/
20181015200754[email protected]
Tom Lane [Mon, 15 Oct 2018 18:01:38 +0000 (14:01 -0400)]
Check for stack overrun in standard_ProcessUtility().
ProcessUtility can recurse, and indeed can be driven to infinite
recursion, so it ought to have a check_stack_depth() call. This
covers the reported bug (portal trying to execute itself) and a bunch
of other cases that could perhaps arise somewhere.
Per bug #15428 from Malthe Borch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/15428-
b3c2915ec470b033@postgresql.org
Alexander Korotkov [Sun, 14 Oct 2018 21:40:17 +0000 (00:40 +0300)]
contrib/bloom documentation improvement
This commit documents rounding of "length" parameter and absence of support
for unique indexes and NULLs searching. Backpatch to 9.6 where contrib/bloom
was introduced.
Discussion: https://postgr.es/m/CAF4Au4wPQQ7EHVSnzcLjsbY3oLSzVk6UemZLD1Sbmwysy3R61g%40mail.gmail.com
Author: Oleg Bartunov with minor editorialization by me
Backpatch-through: 9.6
Michael Paquier [Sun, 14 Oct 2018 13:23:35 +0000 (22:23 +0900)]
Avoid duplicate XIDs at recovery when building initial snapshot
On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
periodic basis to allow recovery to build the initial state of
transactions for a hot standby. The set of transaction IDs is created
by scanning all the entries in ProcArray. However it happens that its
logic never counted on the fact that two-phase transactions finishing to
prepare can put ProcArray in a state where there are two entries with
the same transaction ID, one for the initial transaction which gets
cleared when prepare finishes, and a second, dummy, entry to track that
the transaction is still running after prepare finishes. This way
ensures a continuous presence of the transaction so as callers of for
example TransactionIdIsInProgress() are always able to see it as alive.
So, if a XLOG_RUNNING_XACTS takes a standby snapshot while a two-phase
transaction finishes to prepare, the record can finish with duplicated
XIDs, which is a state expected by design. If this record gets applied
on a standby to initial its recovery state, then it would simply fail,
so the odds of facing this failure are very low in practice. It would
be tempting to change the generation of XLOG_RUNNING_XACTS so as
duplicates are removed on the source, but this requires to hold on
ProcArrayLock for longer and this would impact all workloads,
particularly those using heavily two-phase transactions.
XLOG_RUNNING_XACTS is also actually used only to initialize the standby
state at recovery, so instead the solution is taken to discard
duplicates when applying the initial snapshot.
Diagnosed-by: Konstantin Knizhnik
Author: Michael Paquier
Discussion: https://postgr.es/m/
0c96b653-4696-d4b4-6b5d-
78143175d113@postgrespro.ru
Backpatch-through: 9.3