- <productname>PostgreSQL</productname>'s <command>VACUUM</command> command must be
- run on a regular basis for several reasons:
+ <productname>PostgreSQL</productname>'s
+ <command>VACUUM</command> command
+ must be run on a regular basis for several reasons:
- In normal <productname>PostgreSQL</productname> operation, an <command>UPDATE</command> or
- <command>DELETE</command> of a row does not immediately remove the old <firstterm>tuple</firstterm>
- (version of the row). This approach is necessary to gain the benefits
- of multiversion concurrency control (see the <citetitle>User's Guide</citetitle>):
- the tuple must not be deleted while
- it is still potentially visible to other transactions. But eventually,
- an outdated or deleted tuple is no longer of interest to any transaction.
- The space it occupies must be reclaimed for reuse by new tuples, to avoid
- infinite growth of disk space requirements. This is done by running
- <command>VACUUM</command>.
+ In normal <productname>PostgreSQL</productname> operation, an
+ <command>UPDATE</command> or <command>DELETE</command> of a row does not
+ immediately remove the old <firstterm>tuple</firstterm> (version of the row).
+ This approach is necessary to gain the benefits of multiversion
+ concurrency control (see the <citetitle>User's Guide</citetitle>): the tuple
+ must not be deleted while it is still potentially visible to other
+ transactions. But eventually, an outdated or deleted tuple is no
+ longer of interest to any transaction. The space it occupies must be
+ reclaimed for reuse by new tuples, to avoid infinite growth of disk
+ space requirements. This is done by running <command>VACUUM</command>.
Clearly, a table that receives frequent updates or deletes will need
- to be vacuumed more often than tables that are seldom updated. It may
- be useful to set up periodic <application>cron</application> tasks that vacuum only selected tables,
- skipping tables that are known not to change often. This is only likely
- to be helpful if you have both large heavily-updated tables and large
- seldom-updated tables --- the extra cost of vacuuming a small table
- isn't enough to be worth worrying about.
+ to be vacuumed more often than tables that are seldom updated. It
+ may be useful to set up periodic <application>cron</application> tasks that
+ vacuum only selected tables, skipping tables that are known not to
+ change often. This is only likely to be helpful if you have both
+ large heavily-updated tables and large seldom-updated tables --- the
+ extra cost of vacuuming a small table isn't enough to be worth
+ worrying about.
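A per-table schedule of the kind described above can be sketched as a crontab fragment. The database and table names here are placeholders, and the exact timings are only an illustration; `psql` and `vacuumdb` are the standard client programs shipped with PostgreSQL.

```shell
# crontab fragment (hypothetical names): vacuum the heavily-updated
# table hourly, and the whole database once a night.
15 * * * *  psql -c 'VACUUM hot_table' mydb
30 3 * * *  vacuumdb mydb
```

Small, seldom-updated tables are deliberately left to the nightly database-wide pass, per the note above about the cost of vacuuming small tables.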
As with vacuuming for space recovery, frequent updates of statistics
- are more useful for heavily-updated tables than for seldom-updated ones.
- But even for a heavily-updated table, there may be no need for
- statistics updates if the statistical distribution of the data is not
- changing much. A simple rule of thumb is to think about how much
+ are more useful for heavily-updated tables than for seldom-updated
+ ones. But even for a heavily-updated table, there may be no need for
+ statistics updates if the statistical distribution of the data is
+ not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
- For example, a timestamp column that contains the time of row update
- will have a constantly-increasing maximum value as rows are added and
- updated; such a column will probably need more frequent statistics
- updates than, say, a column containing URLs for pages accessed on a
- website. The URL column may receive changes just as often, but the
- statistical distribution of its values probably changes relatively
- slowly.
+ For example, a timestamp column that contains the time
+ of row update will have a constantly-increasing maximum value as
+ rows are added and updated; such a column will probably need more
+ frequent statistics updates than, say, a column containing URLs for
+ pages accessed on a website. The URL column may receive changes just
+ as often, but the statistical distribution of its values probably
+ changes relatively slowly.
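Statistics are refreshed with the <command>ANALYZE</command> command (optionally restricted to named columns). Continuing the timestamp-column example, a schedule under the same assumptions as the previous fragment might analyze the volatile column frequently and the rest of the database rarely; all names below are hypothetical.

```shell
# crontab fragment (hypothetical names): refresh statistics for the
# constantly-advancing timestamp column every hour, and analyze the
# whole database weekly with vacuumdb's --analyze option.
0 * * * *  psql -c 'ANALYZE orders (updated_at)' mydb
0 4 * * 0  vacuumdb --analyze mydb
```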
Prior to <productname>PostgreSQL</productname> 7.2, the only defense
- against XID wraparound was to re-<command>initdb</command> at least every 4 billion
- transactions. This of course was not very satisfactory for high-traffic
- sites, so a better solution has been devised. The new approach allows an
- installation to remain up indefinitely, without <command>initdb</command> or any sort of
- restart. The price is this maintenance requirement:
- every table in the database must be vacuumed at least once every
- billion transactions.
+ against XID wraparound was to re-<command>initdb</command> at least every 4
+ billion transactions. This of course was not very satisfactory for
+ high-traffic sites, so a better solution has been devised. The new
+ approach allows an installation to remain up indefinitely, without
+ <command>initdb</command> or any sort of restart. The price is this
+ maintenance requirement: every table in the database must
+ be vacuumed at least once every billion transactions.
- In practice this isn't an onerous requirement, but since the consequences
- of failing to meet it can be complete data loss (not just wasted disk
- space or slow performance), some special provisions have been made to help
- database administrators keep track of the time since the last
- <command>VACUUM</command>. The remainder of this section gives the details.
+ In practice this isn't an onerous requirement, but since the
+ consequences of failing to meet it can be complete data loss (not
+ just wasted disk space or slow performance), some special provisions
+ have been made to help database administrators keep track of the
+ time since the last <command>VACUUM</command>. The remainder of this
+ section gives the details.
- The new approach to XID comparison distinguishes two special XIDs, numbers
- 1 and 2 (<literal>BootstrapXID</literal> and <literal>FrozenXID</literal>). These two
- XIDs are always considered older than every normal XID. Normal XIDs (those
- greater than 2) are compared using modulo-2<superscript>31</superscript> arithmetic. This means
+ The new approach to XID comparison distinguishes two special XIDs,
+ numbers 1 and 2 (<literal>BootstrapXID</literal> and
+ <literal>FrozenXID</literal>). These two XIDs are always considered older
+ than every normal XID. Normal XIDs (those greater than 2) are
+ compared using modulo-2<superscript>31</superscript> arithmetic. This means
that for every normal XID, there are two billion XIDs that are
- <quote>older</quote> and two billion that are <quote>newer</quote>; another way to
- say it is that the normal XID space is circular with no endpoint.
- Therefore, once a tuple has been created with a particular normal XID, the
- tuple will appear to be <quote>in the past</quote> for the next two billion
- transactions, no matter which normal XID we are talking about. If the
- tuple still exists after more than two billion transactions, it will
- suddenly appear to be in the future. To prevent data loss, old tuples
- must be reassigned the XID <literal>FrozenXID</literal> sometime before they reach
- the two-billion-transactions-old mark. Once they are assigned this
- special XID, they will appear to be <quote>in the past</quote> to all normal
- transactions regardless of wraparound issues, and so such tuples will be
- good until deleted, no matter how long that is. This reassignment of
- XID is handled by <command>VACUUM</command>.
+ <quote>older</quote> and two billion that are <quote>newer</quote>; another
+ way to say it is that the normal XID space is circular with no
+ endpoint. Therefore, once a tuple has been created with a particular
+ normal XID, the tuple will appear to be <quote>in the past</quote> for
+ the next two billion transactions, no matter which normal XID we are
+ talking about. If the tuple still exists after more than two billion
+ transactions, it will suddenly appear to be in the future. To
+ prevent data loss, old tuples must be reassigned the XID
+ <literal>FrozenXID</literal> sometime before they reach the
+ two-billion-transactions-old mark. Once they are assigned this
+ special XID, they will appear to be <quote>in the past</quote> to all
+ normal transactions regardless of wraparound issues, and so such
+ tuples will be good until deleted, no matter how long that is. This
+ reassignment of XID is handled by <command>VACUUM</command>.
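The circular, modulo-2^31 ordering described above can be sketched in shell arithmetic. This is an illustration of the comparison rule, not PostgreSQL's actual source code: an XID `a` counts as older than `b` exactly when the 32-bit difference `a - b`, viewed as a signed value, is negative.

```shell
# Sketch of circular XID comparison: "a precedes b" when the 32-bit
# wraparound difference (a - b) falls in the upper half of the 32-bit
# range, i.e. would be negative as a signed 32-bit number.
xid_precedes() {
    d=$(( ($1 - $2) & 0xFFFFFFFF ))   # difference modulo 2^32
    [ "$d" -ge 2147483648 ]           # >= 2^31: a lies in b's past half
}

xid_precedes 100 200 && echo "100 is older than 200"
xid_precedes 4294967000 100 && echo "a wrapped-around XID is still older"
```

Note how the second call works even though 4294967000 is numerically larger than 100: the XID space has no endpoint, only a two-billion-transaction past and future relative to each XID.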
<command>VACUUM</command> with the <option>FREEZE</option> option uses a more
aggressive freezing policy: tuples are frozen if they are old enough
- to be considered good by all open transactions. In particular, if
- a <command>VACUUM FREEZE</command> is performed in an otherwise-idle database,
- it is guaranteed that <emphasis>all</emphasis> tuples in that database will be
- frozen. Hence, as long as the database is not modified in any way, it
- will not need subsequent vacuuming to avoid transaction ID wraparound
- problems. This technique is used by <command>initdb</command> to prepare the
- <literal>template0</literal> database. It should also be used to prepare any
- user-created databases that are to be marked <literal>datallowconn</literal> =
- <literal>false</literal> in <structname>pg_database</structname>, since there isn't any
- convenient way to vacuum a database that you can't connect to. Note
- that <command>VACUUM</command>'s automatic warning message about unvacuumed databases will
- ignore <structname>pg_database</structname> entries with <literal>datallowconn</literal> =
- <literal>false</literal>, so as to avoid giving false warnings about these
- databases; therefore it's up to you to ensure that such databases are
- frozen correctly.
+ to be considered good by all open transactions. In particular, if a
+ <command>VACUUM FREEZE</command> is performed in an otherwise-idle
+ database, it is guaranteed that <emphasis>all</emphasis> tuples in that
+ database will be frozen. Hence, as long as the database is not
+ modified in any way, it will not need subsequent vacuuming to avoid
+ transaction ID wraparound problems. This technique is used by
+ <command>initdb</command> to prepare the <literal>template0</literal> database.
+ It should also be used to prepare any user-created databases that
+ are to be marked <literal>datallowconn</literal> = <literal>false</literal> in
+ <structname>pg_database</structname>, since there isn't any convenient way to
+ vacuum a database that you can't connect to. Note that
+ <command>VACUUM</command>'s automatic warning message about
+ unvacuumed databases will ignore <structname>pg_database</structname> entries
+ with <literal>datallowconn</literal> = <literal>false</literal>, so as to avoid
+ giving false warnings about these databases; therefore it's up to
+ you to ensure that such databases are frozen correctly.
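Preparing such a database might look like the following sketch. The database name `refdata` is a placeholder; the ordering matters, since the freeze must happen while connections are still allowed, and in this era connections are disabled by updating `pg_database` directly.

```shell
# Freeze every tuple while the database is still connectable...
psql refdata -c 'VACUUM FREEZE'
# ...then disallow connections ("refdata" is a hypothetical name).
psql template1 -c \
  "UPDATE pg_database SET datallowconn = false WHERE datname = 'refdata'"
```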
- <productname>PostgreSQL</productname> is unable to reuse index pages
- in some cases. The problem is that if indexed rows are deleted, those
- index pages can only be reused by rows with similar values. In
- cases where low indexed rows are deleted and newly inserted rows have
- high values, disk space used by the index will grow indefinitely, even
- if <command>VACUUM</command> is run frequently.
- TO BE COMPLETED 2002-06-22 bjm
+ <productname>PostgreSQL</productname> is unable to reuse btree index
+ pages in certain cases. The problem is that if indexed rows are
+ deleted, those index pages can only be reused by rows with similar
+ values. For example, if indexed rows are deleted and newly
+ inserted/updated rows have much higher values, the new rows can't use
+ the index space made available by the deleted rows. Instead, such
+ new rows must be placed on new index pages. In such cases, disk
+ space used by the index will grow indefinitely, even if
+ <command>VACUUM</command> is run frequently.
+
+ As a solution, you can use the <command>REINDEX</command> command
+ periodically to discard pages used by deleted rows. There is also
+ <filename>contrib/reindex</filename> which can reindex an entire database.
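A periodic reindex can be driven from cron in the same spirit as the vacuum schedules above; the table and database names in this fragment are placeholders.

```shell
# crontab fragment (hypothetical names): weekly rebuild of the indexes
# on a table whose indexed values steadily increase, reclaiming btree
# pages left behind by deleted rows.
0 5 * * 0  psql -c 'REINDEX TABLE hot_table' mydb
```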
- If you simply direct the postmaster's <filename>stderr</filename> into a file, the only way
- to truncate the log file is to stop and restart the postmaster. This
- may be OK for development setups but you won't want to run a production
- server that way.
+ If you simply direct the postmaster's <filename>stderr</filename> into a
+ file, the only way to truncate the log file is to stop and restart
+ the postmaster. This may be OK for development setups but you won't
+ want to run a production server that way.
- The simplest production-grade approach to managing log output is to send it
- all to <application>syslog</application> and let <application>syslog</application> deal with file
- rotation. To do this, make sure <productname>PostgreSQL</productname> was built with
- the <option>--enable-syslog</option> configure option, and set
- <literal>syslog</literal> to 2
- (log to syslog only) in <filename>postgresql.conf</filename>.
- Then you can send a SIGHUP signal to the
- <application>syslog</application> daemon whenever you want to force it to start
- writing a new log file.
+ The simplest production-grade approach to managing log output is to
+ send it all to <application>syslog</application> and let
+ <application>syslog</application> deal with file rotation. To do this, make sure
+ <productname>PostgreSQL</productname> was built with the
+ <option>--enable-syslog</option> configure option, and set
+ <literal>syslog</literal> to 2 (log to syslog only) in
+ <filename>postgresql.conf</filename>. Then you can send a
+ <literal>SIGHUP</literal> signal to the <application>syslog</application> daemon
+ whenever you want to force it to start writing a new log file.
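Under those assumptions, the two pieces look roughly like this; the location of the syslog daemon's PID file varies by platform, so the path below is only an example.

```shell
# In postgresql.conf (shown here as a comment):
#   syslog = 2    # 2 = log to syslog only

# Ask the syslog daemon to start a new log file; the PID-file path
# is system-dependent (this one is a common BSD/Linux location).
kill -HUP "$(cat /var/run/syslogd.pid)"
```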
On many systems, however, syslog is not very reliable, particularly
with large log messages; it may truncate or drop messages just when
- you need them the most. You may find it more useful to pipe the
- <application>postmaster</application>'s <filename>stderr</filename> to some type of log rotation script.
- If you start the postmaster with <application>pg_ctl</application>, then the
- postmaster's <filename>stderr</filename> is already redirected to <filename>stdout</filename>, so you just need a
+ you need them the most. You may find it more useful to pipe the
+ <application>postmaster</application>'s <filename>stderr</filename> to some type of
+ log rotation script. If you start the postmaster with
+ <application>pg_ctl</application>, then the postmaster's <filename>stderr</filename>
+ is already redirected to <filename>stdout</filename>, so you just need a
+ pipe command:
pipe command:
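One common form of such a pipe, using the <application>rotatelogs</application> program distributed with the Apache web server (an external tool, named here as one possibility rather than the only choice):

```shell
# Start the postmaster and rotate its log daily (86400 seconds);
# rotatelogs ships with the Apache httpd distribution.
pg_ctl start | rotatelogs /var/log/pgsql_log 86400
```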