From 449e14a5618432f01066c33055229b96666bd925 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Thu, 21 May 2020 13:36:45 -0700 Subject: [PATCH] Doc: Describe CREATE INDEX deduplication strategy. The B-Tree index deduplication strategy used during CREATE INDEX and REINDEX differs from the lazy strategy used by retail inserts. Make that clear by adding a new paragraph to the B-Tree implementation section of the documentation. In passing, do some copy-editing of nearby deduplication documentation. --- doc/src/sgml/btree.sgml | 54 ++++++++++++++++++++++++++++------------- 1 file changed, 37 insertions(+), 17 deletions(-) diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml index e9cab4a55dc..73947db55cb 100644 --- a/doc/src/sgml/btree.sgml +++ b/doc/src/sgml/btree.sgml @@ -622,12 +622,13 @@ equalimage(opcintype oid) returns bool - While NULL is generally not considered to be equal to any other - value, including NULL, NULL is nevertheless treated as just - another value from the domain of indexed values by the B-Tree - implementation (except when enforcing uniqueness in a unique - index). B-Tree deduplication is therefore just as effective with - duplicates that contain a NULL value. + B-Tree deduplication is just as effective with + duplicates that contain a NULL value, even though + NULL values are never equal to each other according to the + = member of any B-Tree operator class. As far + as any part of the implementation that understands the on-disk + B-Tree structure is concerned, NULL is just another value from the + domain of indexed values. @@ -642,6 +643,20 @@ equalimage(opcintype oid) returns bool see a moderate performance benefit from using deduplication. Deduplication is enabled by default. + + CREATE INDEX and REINDEX + apply deduplication to create posting list tuples, though the + strategy they use is slightly different. Each group of duplicate + ordinary tuples encountered in the sorted input taken from the + table is merged into a posting list tuple + before being added to the current pending leaf + page. Individual posting list tuples are packed with as many + TIDs as possible. Leaf pages are written out in + the usual way, without any separate deduplication pass. This + strategy is well-suited to CREATE INDEX and + REINDEX because they are once-off batch + operations. + Write-heavy workloads that don't benefit from deduplication due to having few or no duplicate values in indexes will incur a small, @@ -657,17 +672,22 @@ equalimage(opcintype oid) returns bool B-Tree indexes are not directly aware that under MVCC, there might be multiple extant versions of the same logical table row; to an index, each tuple is an independent object that needs its own index - entry. Thus, an update of a row always creates all-new index - entries for the row, even if the key values did not change. Some - workloads suffer from index bloat caused by these - implementation-level version duplicates (this is typically a - problem for UPDATE-heavy workloads that cannot - apply the HOT optimization due to modifying at - least one indexed column). B-Tree deduplication does not - distinguish between these implementation-level version duplicates - and conventional duplicates. Deduplication can nevertheless help - with controlling index bloat caused by implementation-level version - churn. + entry. Version duplicates may sometimes accumulate + and adversely affect query latency and throughput. This typically + occurs with UPDATE-heavy workloads where most + individual updates cannot apply the HOT + optimization (often because at least one indexed column gets + modified, necessitating a new set of index tuple versions — + one new tuple for each and every index). In + effect, B-Tree deduplication ameliorates index bloat caused by + version churn. Note that even the tuples from a unique index are + not necessarily physically unique when stored + on disk due to version churn. The deduplication optimization is + selectively applied within unique indexes. It targets those pages + that appear to have version duplicates. The high level goal is to + give VACUUM more time to run before an + unnecessary page split caused by version churn can + take place. -- 2.39.5