- VACUUM> normally skips pages that don't have any dead row
- versions, but those pages might still have row versions with old XID
- values. To ensure all old row versions have been frozen, a
- scan of the whole table is needed.
- controls when
- VACUUM> does that: a whole table sweep is forced if
- the table hasn't been fully scanned for vacuum_freeze_table_age>
- minus vacuum_freeze_min_age> transactions. Setting it to 0
- forces VACUUM> to always scan all pages, effectively ignoring
- the visibility map.
+ VACUUM> uses the visibility map>
+ to determine which pages of a relation must be scanned. Normally, it
+ will skip pages that don't have any dead row versions even if those pages
+ might still have row versions with old XID values. Therefore, normal
+ scans won't succeed in freezing every row version in the table.
+ Periodically, VACUUM> will perform an aggressive
+ vacuum>, skipping only those pages which contain neither dead rows nor
+ any unfrozen XID or MXID values.
+
+ controls when VACUUM> does that: all-visible but not all-frozen
+ pages are scanned if the number of transactions that have passed since the
+ last such scan is greater than vacuum_freeze_table_age> minus
+ vacuum_freeze_min_age>. Setting
+ vacuum_freeze_table_age> to 0 forces VACUUM> to
+ use this more aggressive strategy for all scans.
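The skipping rule described in the new paragraphs can be sketched in isolation. This is a minimal illustration, not the patch's code: the `SKETCH_`-prefixed constants and the function name are hypothetical stand-ins for the visibility map bits, and the sketch assumes the map carries only those two bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two visibility map bits. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/*
 * Return true if a page with the given visibility map bits is a candidate
 * for skipping.  A normal vacuum may skip any all-visible page; an
 * aggressive vacuum may skip only pages that are also all-frozen.
 */
bool
sketch_page_is_skippable(uint8_t vmstatus, bool aggressive)
{
	/* This sketch only understands the two bits defined above. */
	assert((vmstatus & ~(SKETCH_VM_ALL_VISIBLE | SKETCH_VM_ALL_FROZEN)) == 0);

	if (aggressive)
		return (vmstatus & SKETCH_VM_ALL_FROZEN) != 0;
	return (vmstatus & SKETCH_VM_ALL_VISIBLE) != 0;
}
```

A page marked all-visible but not all-frozen is therefore skippable in a normal vacuum and unskippable in an aggressive one, which is the case the documentation change is describing.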
The maximum time that a table can go unvacuumed is two billion
transactions minus the vacuum_freeze_min_age> value at
- the time VACUUM> last scanned the whole table. If it were to go
+ the time of the last aggressive vacuum. If it were to go
unvacuumed for longer than
that, data loss could result. To ensure that this does not happen,
autovacuum is invoked on any table that might contain unfrozen rows with
normal delete and update activity is run in that window. Setting it too
close could lead to anti-wraparound autovacuums, even though the table
was recently vacuumed to reclaim space, whereas lower values lead to more
- frequent whole-table scans.
+ frequent aggressive vacuuming.
pg_database>. In particular,
the relfrozenxid> column of a table's
pg_class> row contains the freeze cutoff XID that was used
- by the last whole-table VACUUM> for that table. All rows
+ by the last aggressive VACUUM> for that table. All rows
inserted by transactions with XIDs older than this cutoff XID are
guaranteed to have been frozen. Similarly,
the datfrozenxid> column of a database's
- VACUUM> normally
- only scans pages that have been modified since the last vacuum, but
- relfrozenxid> can only be advanced when the whole table is
- scanned. The whole table is scanned when relfrozenxid> is
- more than vacuum_freeze_table_age> transactions old, when
- VACUUM>'s FREEZE> option is used, or when all pages
- happen to
+ VACUUM> normally only scans pages that have been modified
+ since the last vacuum, but relfrozenxid> can only be
+ advanced when every page of the table
+ that might contain unfrozen XIDs is scanned. This happens when
+ relfrozenxid> is more than
+ vacuum_freeze_table_age> transactions old, when
+ VACUUM>'s FREEZE> option is used, or when all
+ pages that are not already all-frozen happen to
require vacuuming to remove dead row versions. When VACUUM>
- scans the whole table, after it's finished age(relfrozenxid)>
- should be a little more than the vacuum_freeze_min_age> setting
- that was used (more by the number of transactions started since the
- VACUUM> started). If no whole-table-scanning VACUUM>
- is issued on the table until autovacuum_freeze_max_age> is
- reached, an autovacuum will soon be forced for the table.
+ scans every page in the table that is not already all-frozen, it should
+ set age(relfrozenxid)> to a value just a little more than the
+ vacuum_freeze_min_age> setting
+ that was used (more by the number of transactions started since the
+ VACUUM> started). If no relfrozenxid>-advancing
+ VACUUM> is issued on the table until
+ autovacuum_freeze_max_age> is reached, an autovacuum will soon
+ be forced for the table.
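The condition under which `relfrozenxid` may be advanced can be read as simple page accounting: every page must have been either scanned or skipped because it was all-frozen. The sketch below mirrors the comparison the patch makes on `scanned_pages`, `frozenskipped_pages`, and `rel_pages`; the function name and the `SketchBlockNumber` type are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a block counter type. */
typedef uint32_t SketchBlockNumber;

/*
 * Return true if every page of the table was either scanned or skipped
 * as all-frozen, so that no unfrozen XID can have been missed.
 */
bool
sketch_scanned_all_unfrozen(SketchBlockNumber rel_pages,
							SketchBlockNumber scanned_pages,
							SketchBlockNumber frozenskipped_pages)
{
	/* Sum in 64 bits so the addition cannot wrap around. */
	uint64_t	accounted = (uint64_t) scanned_pages + frozenskipped_pages;

	/* Precondition of this sketch: pages are never counted twice. */
	assert(accounted <= rel_pages);

	return accounted >= rel_pages;
}
```

With 100 pages, 60 scanned and 40 skipped as all-frozen accounts for the whole table; 60 scanned and 39 skipped leaves one page unaccounted for, so the cutoff cannot be advanced.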
- During a VACUUM> table scan, either partial or of the whole
- table, any multixact ID older than
+ Whenever VACUUM> scans any part of a table, it will replace
+ any multixact ID it encounters which is older than
- is replaced by a different value, which can be the zero value, a single
+ by a different value, which can be the zero value, a single
transaction ID, or a newer multixact ID. For each table,
pg_class>.relminmxid> stores the oldest
possible multixact ID still appearing in any tuple of that table.
If this value is older than
- , a whole-table
- scan is forced. mxid_age()> can be used on
+ , an aggressive
+ vacuum is forced. As discussed in the previous section, an aggressive
+ vacuum means that only those pages which are known to be all-frozen will
+ be skipped. mxid_age()> can be used on
pg_class>.relminmxid> to find its age.
- Whole-table VACUUM> scans, regardless of
+ Aggressive VACUUM> scans, regardless of
what causes them, enable advancing the value for that table.
Eventually, as all tables in all databases are scanned and their
oldest multixact values are advanced, on-disk storage for older
- As a safety device, a whole-table vacuum scan will occur for any table
+ As a safety device, an aggressive vacuum scan will occur for any table
whose multixact-age is greater than
- . Whole-table
+ . Aggressive
vacuum scans will also occur progressively for all tables, starting with
those that have the oldest multixact-age, if the amount of used member
storage space exceeds 50% of the addressable storage space.
- Both of these kinds of whole-table scans will occur even if autovacuum is
+ Both of these kinds of aggressive scans will occur even if autovacuum is
nominally disabled.
UPDATE and DELETE operation. (It
is only semi-accurate because some information might be lost under heavy
load.) If the relfrozenxid> value of the table is more
- than vacuum_freeze_table_age> transactions old, the whole
- table is scanned to freeze old tuples and advance
- relfrozenxid>, otherwise only pages that have been modified
+ than vacuum_freeze_table_age> transactions old, an aggressive
+ vacuum is performed to freeze old tuples and advance
+ relfrozenxid>; otherwise, only pages that have been modified
since the last vacuum are scanned.
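The age test that triggers an aggressive vacuum can be sketched with wraparound-aware XID arithmetic. This is an illustration under stated assumptions: the scan limit is taken to be the next XID minus `vacuum_freeze_table_age`, special (non-normal) XIDs are ignored, and all names prefixed `sketch_` or `Sketch` are hypothetical. The "precedes or equals" comparison follows the one used in the code hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t SketchXid;

/*
 * Circular comparison: a precedes or equals b if the signed 32-bit
 * difference is not positive.  Ignores special XIDs (assumption).
 */
bool
sketch_xid_precedes_or_equals(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) <= 0;
}

/*
 * Decide whether a vacuum should be aggressive, assuming the scan limit
 * is next_xid minus freeze_table_age (computed modulo 2^32).
 */
bool
sketch_needs_aggressive_vacuum(SketchXid relfrozenxid,
							   SketchXid next_xid,
							   uint32_t freeze_table_age)
{
	SketchXid	limit;

	/* The circular comparison only works for ages below 2^31. */
	assert(freeze_table_age <= (uint32_t) INT32_MAX);

	limit = next_xid - freeze_table_age;
	return sketch_xid_precedes_or_equals(relfrozenxid, limit);
}
```

Because the subtraction is done modulo 2^32, the test keeps working when the XID counter has wrapped: a `relfrozenxid` just below 2^32 is still treated as older than a `next_xid` of 100.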
BlockNumber rel_pages; /* total number of pages */
BlockNumber scanned_pages; /* number of pages we examined */
BlockNumber pinskipped_pages; /* # of pages we skipped due to a pin */
+ BlockNumber frozenskipped_pages; /* # of frozen pages we skipped */
double scanned_tuples; /* counts only tuples on scanned pages */
double old_rel_tuples; /* previous value of pg_class.reltuples */
double new_rel_tuples; /* new estimated total # of tuples */
/* non-export function prototypes */
static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
- Relation *Irel, int nindexes, bool scan_all);
+ Relation *Irel, int nindexes, bool aggressive);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
static void lazy_vacuum_index(Relation indrel,
int usecs;
double read_rate,
write_rate;
- bool scan_all; /* should we scan all pages? */
- bool scanned_all; /* did we actually scan all pages? */
+ bool aggressive; /* should we scan all unfrozen pages? */
+ bool scanned_all_unfrozen; /* actually scanned all such pages? */
TransactionId xidFullScanLimit;
MultiXactId mxactFullScanLimit;
BlockNumber new_rel_pages;
&MultiXactCutoff, &mxactFullScanLimit);
/*
- * We request a full scan if either the table's frozen Xid is now older
- * than or equal to the requested Xid full-table scan limit; or if the
- * table's minimum MultiXactId is older than or equal to the requested
+ * We request an aggressive scan if either the table's frozen Xid is now
+ * older than or equal to the requested Xid full-table scan limit; or if
+ * the table's minimum MultiXactId is older than or equal to the requested
* mxid full-table scan limit.
*/
- scan_all = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
- xidFullScanLimit);
- scan_all |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
- mxactFullScanLimit);
+ aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
+ xidFullScanLimit);
+ aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
+ mxactFullScanLimit);
vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
vacrelstats->hasindex = (nindexes > 0);
/* Do the vacuuming */
- lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
+ lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, aggressive);
/* Done with indexes */
vac_close_indexes(nindexes, Irel, NoLock);
* NB: We need to check this before truncating the relation, because that
* will change ->rel_pages.
*/
- if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
+ if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+ < vacrelstats->rel_pages)
{
- Assert(!scan_all);
- scanned_all = false;
+ Assert(!aggressive);
+ scanned_all_unfrozen = false;
}
else
- scanned_all = true;
+ scanned_all_unfrozen = true;
/*
* Optionally truncate the relation.
if (new_rel_allvisible > new_rel_pages)
new_rel_allvisible = new_rel_pages;
- new_frozen_xid = scanned_all ? FreezeLimit : InvalidTransactionId;
- new_min_multi = scanned_all ? MultiXactCutoff : InvalidMultiXactId;
+ new_frozen_xid = scanned_all_unfrozen ? FreezeLimit : InvalidTransactionId;
+ new_min_multi = scanned_all_unfrozen ? MultiXactCutoff : InvalidMultiXactId;
vac_update_relstats(onerel,
new_rel_pages,
get_namespace_name(RelationGetNamespace(onerel)),
RelationGetRelationName(onerel),
vacrelstats->num_index_scans);
- appendStringInfo(&buf, _("pages: %u removed, %u remain, %u skipped due to pins\n"),
+ appendStringInfo(&buf, _("pages: %u removed, %u remain, %u skipped due to pins, %u skipped frozen\n"),
vacrelstats->pages_removed,
vacrelstats->rel_pages,
- vacrelstats->pinskipped_pages);
+ vacrelstats->pinskipped_pages,
+ vacrelstats->frozenskipped_pages);
appendStringInfo(&buf,
_("tuples: %.0f removed, %.0f remain, %.0f are dead but not yet removable\n"),
vacrelstats->tuples_deleted,
*/
static void
lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
- Relation *Irel, int nindexes, bool scan_all)
+ Relation *Irel, int nindexes, bool aggressive)
{
BlockNumber nblocks,
blkno;
int i;
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
- BlockNumber next_not_all_visible_block;
- bool skipping_all_visible_blocks;
+ BlockNumber next_unskippable_block;
+ bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
/*
- * We want to skip pages that don't require vacuuming according to the
- * visibility map, but only when we can skip at least SKIP_PAGES_THRESHOLD
- * consecutive pages. Since we're reading sequentially, the OS should be
- * doing readahead for us, so there's no gain in skipping a page now and
- * then; that's likely to disable readahead and so be counterproductive.
- * Also, skipping even a single page means that we can't update
- * relfrozenxid, so we only want to do it if we can skip a goodly number
- * of pages.
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages. Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
*
- * Before entering the main loop, establish the invariant that
- * next_not_all_visible_block is the next block number >= blkno that's not
- * all-visible according to the visibility map, or nblocks if there's no
- * such block. Also, we set up the skipping_all_visible_blocks flag,
- * which is needed because we need hysteresis in the decision: once we've
- * started skipping blocks, we may as well skip everything up to the next
- * not-all-visible block.
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
*
- * Note: if scan_all is true, we won't actually skip any pages; but we
- * maintain next_not_all_visible_block anyway, so as to set up the
- * all_visible_according_to_vm flag correctly for each page.
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we
+ * can't skip based on the visibility map: one that is not all-visible for
+ * a regular scan, or not all-frozen for an aggressive scan. We set it to
+ * nblocks if there's no such block. We also set up the skipping_blocks
+ * flag correctly at this stage.
*
* Note: The value returned by visibilitymap_get_status could be slightly
* out-of-date, since we make this test before reading the corresponding
* heap page or locking the buffer. This is OK. If we mistakenly think
- * that the page is all-visible when in fact the flag's just been cleared,
- * we might fail to vacuum the page. But it's OK to skip pages when
- * scan_all is not set, so no great harm done; the next vacuum will find
- * them. If we make the reverse mistake and vacuum a page unnecessarily,
- * it'll just be a no-op.
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page. It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them. But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared. Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid. A similar argument applies for MXIDs and relminmxid.
*
* We will scan the table's last page, at least to the extent of
* determining whether it has tuples or not, even if it should be skipped
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- for (next_not_all_visible_block = 0;
- next_not_all_visible_block < nblocks;
- next_not_all_visible_block++)
+ for (next_unskippable_block = 0;
+ next_unskippable_block < nblocks;
+ next_unskippable_block++)
{
- if (!VM_ALL_VISIBLE(onerel, next_not_all_visible_block, &vmbuffer))
- break;
+ uint8 vmstatus;
+
+ vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
+ &vmbuffer);
+ if (aggressive)
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+ break;
+ }
+ else
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ break;
+ }
vacuum_delay_point();
}
- if (next_not_all_visible_block >= SKIP_PAGES_THRESHOLD)
- skipping_all_visible_blocks = true;
+
+ if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+ skipping_blocks = true;
else
- skipping_all_visible_blocks = false;
+ skipping_blocks = false;
for (blkno = 0; blkno < nblocks; blkno++)
{
int prev_dead_count;
int nfrozen;
Size freespace;
- bool all_visible_according_to_vm;
+ bool all_visible_according_to_vm = false;
bool all_visible;
bool all_frozen = true; /* provided all_visible is also true */
bool has_dead_tuples;
#define FORCE_CHECK_PAGE() \
(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
- if (blkno == next_not_all_visible_block)
+ if (blkno == next_unskippable_block)
{
- /* Time to advance next_not_all_visible_block */
- for (next_not_all_visible_block++;
- next_not_all_visible_block < nblocks;
- next_not_all_visible_block++)
+ /* Time to advance next_unskippable_block */
+ for (next_unskippable_block++;
+ next_unskippable_block < nblocks;
+ next_unskippable_block++)
{
- if (!VM_ALL_VISIBLE(onerel, next_not_all_visible_block, &vmbuffer))
- break;
+ uint8 vmskipflags;
+
+ vmskipflags = visibilitymap_get_status(onerel,
+ next_unskippable_block,
+ &vmbuffer);
+ if (aggressive)
+ {
+ if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
+ break;
+ }
+ else
+ {
+ if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ break;
+ }
vacuum_delay_point();
}
* skipping_all_visible_blocks to do the right thing at the
* following blocks.
*/
- if (next_not_all_visible_block - blkno > SKIP_PAGES_THRESHOLD)
- skipping_all_visible_blocks = true;
+ if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+ skipping_blocks = true;
else
- skipping_all_visible_blocks = false;
- all_visible_according_to_vm = false;
+ skipping_blocks = false;
+
+ /*
+ * Normally, the fact that we can't skip this block must mean that
+ * it's not all-visible. But in an aggressive vacuum we know only
+ * that it's not all-frozen, so it might still be all-visible.
+ */
+ if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
+ all_visible_according_to_vm = true;
}
else
{
- /* Current block is all-visible */
- if (skipping_all_visible_blocks && !scan_all && !FORCE_CHECK_PAGE())
+ /*
+ * The current block is potentially skippable; if we've seen a
+ * long enough run of skippable blocks to justify skipping it, and
+ * we're not forced to check it, then go ahead and skip.
+ * Otherwise, the page must be at least all-visible if not
+ * all-frozen, so we can set all_visible_according_to_vm = true.
+ */
+ if (skipping_blocks && !FORCE_CHECK_PAGE())
+ {
+ /*
+ * Tricky, tricky. If this is an aggressive vacuum, the page
+ * must have been all-frozen at the time we checked whether it
+ * was skippable, but it might not be any more. We must be
+ * careful to count it as a skipped all-frozen page in that
+ * case, or else we'll think we can't update relfrozenxid and
+ * relminmxid. If it's not an aggressive vacuum, we don't
+ * know whether it was all-frozen, so we have to recheck; but
+ * in this case an approximate answer is OK.
+ */
+ if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
+ vacrelstats->frozenskipped_pages++;
continue;
+ }
all_visible_according_to_vm = true;
}
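The run-length hysteresis in the loop above can be modelled on a plain array to see which pages end up scanned. This is a simplified sketch, not the patch's code: it takes a precomputed `skippable` flag per block instead of consulting the visibility map, it omits the forced check of the last page, and the threshold is a parameter rather than `SKIP_PAGES_THRESHOLD`. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count how many blocks would be scanned.  skippable[i] says whether
 * block i may be skipped (all-visible for a normal vacuum, all-frozen
 * for an aggressive one).  Blocks are skipped only inside runs of
 * skippable blocks that are long enough relative to the threshold.
 */
uint32_t
sketch_count_scanned(const bool *skippable, uint32_t nblocks,
					 uint32_t threshold)
{
	uint32_t	next_unskippable = 0;
	uint32_t	scanned = 0;
	bool		skipping;

	assert(skippable != NULL || nblocks == 0);

	/* Establish the invariant before the main loop. */
	while (next_unskippable < nblocks && skippable[next_unskippable])
		next_unskippable++;
	skipping = (next_unskippable >= threshold);

	for (uint32_t blkno = 0; blkno < nblocks; blkno++)
	{
		if (blkno == next_unskippable)
		{
			/* Advance to the next unskippable block after this one. */
			next_unskippable++;
			while (next_unskippable < nblocks && skippable[next_unskippable])
				next_unskippable++;
			skipping = (next_unskippable - blkno > threshold);
		}
		else if (skipping)
			continue;			/* inside a long enough skippable run */

		scanned++;
	}
	return scanned;
}
```

Note that a short run of skippable blocks is still scanned: with a threshold of 4, two skippable blocks followed by an unskippable one are all read, and only the long run after it is skipped.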
* Pin the visibility map page in case we need to mark the page
* all-visible. In most cases this will be very cheap, because we'll
* already have the correct page pinned anyway. However, it's
- * possible that (a) next_not_all_visible_block is covered by a
- * different VM page than the current block or (b) we released our pin
- * and did a cycle of index vacuuming.
+ * possible that (a) next_unskippable_block is covered by a different
+ * VM page than the current block or (b) we released our pin and did a
+ * cycle of index vacuuming.
+ *
*/
visibilitymap_pin(onerel, blkno, &vmbuffer);
if (!ConditionalLockBufferForCleanup(buf))
{
/*
- * If we're not scanning the whole relation to guard against XID
+ * If we're not performing an aggressive scan to guard against XID
* wraparound, and we don't want to forcibly check the page, then
* it's OK to skip vacuuming pages we get a lock conflict on. They
* will be dealt with in some future vacuum.
*/
- if (!scan_all && !FORCE_CHECK_PAGE())
+ if (!aggressive && !FORCE_CHECK_PAGE())
{
ReleaseBuffer(buf);
vacrelstats->pinskipped_pages++;
* ourselves for multiple buffers and then service whichever one
* is received first. For now, this seems good enough.
*
- * If we get here with scan_all false, then we're just forcibly
+ * If we get here with aggressive false, then we're just forcibly
* checking the page, and so we don't want to insist on getting
* the lock; we only need to know if the page contains tuples, so
* that we can update nonempty_pages correctly. It's convenient
vacrelstats->nonempty_pages = blkno + 1;
continue;
}
- if (!scan_all)
+ if (!aggressive)
{
/*
* Here, we must not advance scanned_pages; that would amount