+ linkend="guc-max-standby-streaming-delay">, that define the maximum
+ allowed delay in WAL application. Conflicting queries will be canceled
+ once it has taken longer than the relevant delay setting to apply any
+ newly-received WAL data. There are two parameters so that different delay
+ values can be specified for the case of reading WAL data from an archive
+ (i.e., initial recovery from a base backup or catching up a
+ standby server that has fallen far behind) versus reading WAL data via
+ streaming replication.
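For illustration only, a standby whose catch-up from archive should finish
quickly but whose streaming-time queries deserve more leeway might use
settings along these lines (the values are examples, not recommendations):

    max_standby_archive_delay = 10s     # keep catch-up from archive fast
    max_standby_streaming_delay = 5min  # give standby queries more room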
- Experienced users should note that both row version cleanup and row version
- freezing will potentially conflict with recovery queries. Running a
- manual VACUUM FREEZE> is likely to cause conflicts even on tables
- with no updated or deleted rows.
+ In a standby server that exists primarily for high availability, it's
+ best to set the delay parameters relatively short, so that the server
+ cannot fall far behind the primary due to delays caused by standby
+ queries. However, if the standby server is meant for executing
+ long-running queries, then a high or even infinite delay value may be
+ preferable. Keep in mind, though, that a long-running query that
+ delays application of WAL records can prevent other sessions on the
+ standby server from seeing recent changes made on the primary.
- There are a number of choices for resolving query conflicts. The default
- is to wait and hope the query finishes. The server will wait
- automatically until the lag between primary and standby is at most
- (30 seconds by default).
- Once that grace period expires,
- one of the following actions is taken:
-
-
-
- If the conflict is caused by a lock, the conflicting standby
- transaction is cancelled immediately. If the transaction is
- idle-in-transaction, then the session is aborted instead.
- This behavior might change in the future.
-
-
-
-
- If the conflict is caused by cleanup records, the standby query is informed
- a conflict has occurred and that it must cancel itself to avoid the
- risk that it silently fails to read relevant data because
- that data has been removed. Some cleanup
- records only conflict with older queries, while others
- can affect all queries.
-
-
- Cancelled queries may be retried immediately (after beginning a new
- transaction, of course). Since query cancellation depends on
- the nature of the WAL records being replayed, a query that was
- cancelled may succeed if it is executed again.
-
-
-
+ The most common reason for conflict between standby queries and WAL replay
+ is
early cleanup>. Normally, PostgreSQL> allows
+ cleanup of old row versions when there are no transactions that need to
+ see them to ensure correct visibility of data according to MVCC rules.
+ However, this rule can only be applied to transactions executing on the
+ master. So it is possible that cleanup on the master will remove row
+ versions that are still visible to a transaction on the standby.
- Keep in mind that max_standby_delay> is compared to the
- difference between the standby server's clock and the transaction
- commit timestamps read from the WAL log. Thus, the grace period
- allowed to any one query on the standby is never more than
- max_standby_delay>, and could be considerably less if the
- standby has already fallen behind as a result of waiting for previous
- queries to complete, or as a result of being unable to keep up with a
- heavy update load.
+ Experienced users should note that both row version cleanup and row version
+ freezing will potentially conflict with standby queries. Running a manual
+ VACUUM FREEZE> is likely to cause conflicts even on tables with
+ no updated or deleted rows.
-
- Be sure that the primary and standby servers' clocks are kept in sync;
- otherwise the values compared to max_standby_delay> will be
- erroneous, possibly leading to additional query cancellations.
- If the clocks are intentionally not in sync, or if there is a large
- propagation delay from primary to standby, it is advisable to set
- max_standby_delay> to -1. In any case the value should be
- larger than the largest expected clock skew between primary and standby.
-
-
+ Once the delay specified by max_standby_archive_delay> or
+ max_standby_streaming_delay> has been exceeded, conflicting
+ queries will be cancelled. This usually results just in a cancellation
+ error, although in the case of replaying a DROP DATABASE>
+ the entire conflicting session will be terminated. Also, if the conflict
+ is over a lock held by an idle transaction, the conflicting session is
+ terminated (this behavior might change in the future).
+
- Users should be clear that tables that are regularly and heavily updated on the
- primary server will quickly cause cancellation of longer running queries on
- the standby. In those cases max_standby_delay> can be
- considered similar to setting
- statement_timeout>.
-
+ Cancelled queries may be retried immediately (after beginning a new
+ transaction, of course). Since query cancellation depends on
+ the nature of the WAL records being replayed, a query that was
+ cancelled may well succeed if it is executed again.
+
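As a sketch of that sequence (the table name and exact error text here are
illustrative), a hot-standby session might see:

    -- On the standby, inside a hot-standby session
    BEGIN;
    SELECT count(*) FROM reporting_data;   -- long-running query
    -- If WAL replay has waited longer than the configured delay, the query
    -- is cancelled with an error along the lines of:
    --   ERROR:  canceling statement due to conflict with recovery
    ROLLBACK;
    -- Retrying in a new transaction frequently succeeds:
    BEGIN;
    SELECT count(*) FROM reporting_data;
    COMMIT;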
- Other remedial actions exist if the number of cancellations is unacceptable.
- The first option is to connect to the primary server and keep a query active
- for as long as needed to run queries on the standby. This guarantees that
- a WAL cleanup record is never generated and query conflicts do not occur,
- as described above. This could be done using contrib/dblink>
- and pg_sleep()>, or via other mechanisms. If you do this, you
- should note that this will delay cleanup of dead rows on the primary by
- vacuum or HOT, which may be undesirable. However, remember
- that the primary and standby nodes are linked via the WAL, so the cleanup
- situation is no different from the case where the query ran on the primary
- node itself, and you are still getting the benefit of off-loading the
- execution onto the standby. max_standby_delay> should
- not be used in this case because delayed WAL files might already
- contain entries that invalidate the current snapshot.
+ Keep in mind that the delay parameters are compared to the elapsed time
+ since the WAL data was received by the standby server. Thus, the grace
+ period allowed to any one query on the standby is never more than the
+ delay parameter, and could be considerably less if the standby has already
+ fallen behind as a result of waiting for previous queries to complete, or
+ as a result of being unable to keep up with a heavy update load.
- It is also possible to set vacuum_defer_cleanup_age> on the primary
- to defer the cleanup of records by autovacuum, VACUUM>
- and HOT. This might allow
- more time for queries to execute before they are cancelled on the standby,
- without the need for setting a high max_standby_delay>.
+ Users should be clear that tables that are regularly and heavily updated
+ on the primary server will quickly cause cancellation of longer running
+ queries on the standby. In such cases the setting of a finite value for
+ max_standby_archive_delay> or
+ max_standby_streaming_delay> can be considered similar to
+ setting statement_timeout>.
- Three-way deadlocks are possible between AccessExclusiveLocks> arriving from
- the primary, cleanup WAL records that require buffer cleanup locks, and
- user requests that are waiting behind replayed AccessExclusiveLocks>.
- Deadlocks are resolved automatically after deadlock_timeout>
- seconds, though they are thought to be rare in practice.
+ Remedial possibilities exist if the number of standby-query cancellations
+ is found to be unacceptable. The first option is to connect to the
+ primary server and keep a query active for as long as needed to
+ run queries on the standby. This prevents VACUUM> from removing
+ recently-dead rows and so cleanup conflicts do not occur.
+ This could be done using contrib/dblink> and
+ pg_sleep()>, or via other mechanisms. If you do this, you
+ should note that this will delay cleanup of dead rows on the primary,
+ which may result in undesirable table bloat. However, the cleanup
+ situation will be no worse than if the standby queries were running
+ directly on the primary server, and you are still getting the benefit of
+ off-loading execution onto the standby.
+ max_standby_archive_delay> must be kept large in this case,
+ because delayed WAL files might already contain entries that conflict with
+ the desired standby queries.
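For example, assuming contrib/dblink is installed and using an illustrative
connection string, a session could hold a query open on the primary for
roughly the duration of the intended standby workload:

    -- Keep a query (and therefore a snapshot) active on the primary for an
    -- hour, so that VACUUM there cannot yet remove row versions the standby
    -- queries may still need to see.
    SELECT *
      FROM dblink('host=primary.example.com dbname=app',
                  'SELECT pg_sleep(3600)') AS t(dummy text);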
- Dropping tablespaces or databases is discussed in the administrator's
- section since they are not typical user situations.
+ Another option is to increase vacuum_defer_cleanup_age>
+ on the primary server, so that dead rows will not be cleaned up as quickly
+ as they normally would be. This will allow more time for queries to
+ execute before they are cancelled on the standby, without having to set
+ a high max_standby_streaming_delay>. However, it is
+ difficult to guarantee any specific execution-time window with this
+ approach, since vacuum_defer_cleanup_age> is measured in
+ transactions executed on the primary server.
- It is important that the administrator consider the appropriate setting
- of max_standby_delay>,
- which can be set in postgresql.conf>.
- There is no optimal setting, so it should be set according to business
- priorities. For example if the server is primarily tasked as a High
- Availability server, then you may wish to lower
- max_standby_delay> or even set it to zero, though that is a
- very aggressive setting. If the standby server is tasked as an additional
- server for decision support queries then it might be acceptable to set this
- to a value of many hours. It is also possible to set
- max_standby_delay> to -1 which means wait forever for queries
- to complete; this will be useful when performing
- an archive recovery from a backup.
+ It is important that the administrator select appropriate settings for
+ linkend="guc-max-standby-archive-delay"> and
+ linkend="guc-max-standby-streaming-delay">. The best choices vary
+ depending on business priorities. For example, if the server is primarily
+ tasked as a high availability server, then you will want low delay
+ settings, perhaps even zero, though that is a very aggressive setting. If
+ the standby server is tasked as an additional server for decision support
+ queries then it might be acceptable to set the maximum delay values to
+ many hours, or even -1 which means wait forever for queries to complete.
- Running DROP DATABASE>, ALTER DATABASE ... SET TABLESPACE>,
- or ALTER DATABASE ... RENAME> on primary will generate a log message
- that will cause all users connected to that database on the standby to be
- forcibly disconnected. This action occurs immediately, whatever the setting of
- max_standby_delay>.
+ Running DROP DATABASE>, ALTER DATABASE ... SET
+ TABLESPACE>, or ALTER DATABASE ... RENAME> on the primary
+ will generate a WAL entry that will cause all users connected to that
+ database on the standby to be forcibly disconnected. This action occurs
+ immediately, whatever the setting of
+ max_standby_streaming_delay>.
- Autovacuum is not active during recovery, it will start normally at the
+ Autovacuum is not active during recovery. It will start normally at the
end of recovery.
Various parameters have been mentioned above in
- admin">
- and ">.
+ conflict"> and
+ ">.
On the primary, parameters and
can be used.
- has no effect if set on the primary.
+ and
+ have no effect if set on
+ the primary.
- On the standby, parameters and
- can be used.
- has no effect during
- recovery.
+ On the standby, parameters ,
+ and
+ can be used.
+ has no effect
+ as long as the server remains in standby mode, though it will
+ become relevant if the standby becomes primary.
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.427 2010/06/28 19:46:19 rhaas Exp $
+ * $PostgreSQL: pgsql/src/backend/access/transam/xlog.c,v 1.428 2010/07/03 20:43:57 tgl Exp $
*
*-------------------------------------------------------------------------
*/
bool XLogArchiveMode = false;
char *XLogArchiveCommand = NULL;
bool EnableHotStandby = false;
-int MaxStandbyDelay = 30 * 1000;
bool fullPageWrites = true;
bool log_checkpoints = false;
int sync_method = DEFAULT_SYNC_METHOD;
*/
static XLogwrtResult LogwrtResult = {{0, 0}, {0, 0}};
+/*
+ * Codes indicating where we got a WAL file from during recovery, or where
+ * to attempt to get one. These are chosen so that they can be OR'd together
+ * in a bitmask state variable.
+ */
+#define XLOG_FROM_ARCHIVE (1<<0) /* Restored using restore_command */
+#define XLOG_FROM_PG_XLOG (1<<1) /* Existing file in pg_xlog */
+#define XLOG_FROM_STREAM (1<<2) /* Streamed from master */
+
/*
* openLogFile is -1 or a kernel FD for an open log file segment.
* When it's open, openLogOff is the current seek offset in the file.
static uint32 openLogSeg = 0;
static uint32 openLogOff = 0;
-/*
- * Codes indicating where we got a WAL file from during recovery, or where
- * to attempt to get one.
- */
-#define XLOG_FROM_ARCHIVE (1<<0) /* Restored using restore_command */
-#define XLOG_FROM_PG_XLOG (1<<1) /* Existing file in pg_xlog */
-#define XLOG_FROM_STREAM (1<<2) /* Streamed from master */
-
/*
* These variables are used similarly to the ones above, but for reading
* the XLOG. Note, however, that readOff generally represents the offset
* Keeps track of which sources we've tried to read the current WAL
* record from and failed.
*/
-static int failedSources = 0;
+static int failedSources = 0; /* OR of XLOG_FROM_* codes */
+
+/*
+ * These variables track when we last obtained some WAL data to process,
+ * and where we got it from. (XLogReceiptSource is initially the same as
+ * readSource, but readSource gets reset to zero when we don't have data
+ * to process right now.)
+ */
+static TimestampTz XLogReceiptTime = 0;
+static int XLogReceiptSource = 0; /* XLOG_FROM_* code */
/* Buffer for currently read page (XLOG_BLCKSZ bytes) */
static char *readBuf = NULL;
* Open a logfile segment for reading (during recovery).
*
* If source = XLOG_FROM_ARCHIVE, the segment is retrieved from archive.
- * If source = XLOG_FROM_PG_XLOG, it's read from pg_xlog.
+ * Otherwise, it's assumed to be already available in pg_xlog.
*/
static int
XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
break;
case XLOG_FROM_PG_XLOG:
+ case XLOG_FROM_STREAM:
XLogFilePath(path, tli, log, seg);
restoredFromArchive = false;
break;
xlogfname);
set_ps_display(activitymsg, false);
+ /* Track source of data in assorted state variables */
readSource = source;
+ XLogReceiptSource = source;
+ /* In FROM_STREAM case, caller tracks receipt time, not me */
+ if (source != XLOG_FROM_STREAM)
+ XLogReceiptTime = GetCurrentTimestamp();
+
return fd;
}
if (errno != ENOENT || !notfoundOk) /* unexpected failure? */
/*
* Returns timestamp of last recovered commit/abort record.
*/
-TimestampTz
+static TimestampTz
GetLatestXLogTime(void)
{
/* use volatile pointer to prevent code rearrangement */
return recoveryLastXTime;
}
+/*
+ * Returns time of receipt of current chunk of XLOG data, as well as
+ * whether it was received from streaming replication or from archives.
+ */
+void
+GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream)
+{
+ /*
+ * This must be executed in the startup process, since we don't export
+ * the relevant state to shared memory.
+ */
+ Assert(InRecovery);
+
+ *rtime = XLogReceiptTime;
+ *fromStream = (XLogReceiptSource == XLOG_FROM_STREAM);
+}
+
/*
* Note that text field supplied is a parameter name and does not require
* translation
xlogctl->recoveryLastRecPtr = ReadRecPtr;
SpinLockRelease(&xlogctl->info_lck);
+ /* Also ensure XLogReceiptTime has a sane value */
+ XLogReceiptTime = GetCurrentTimestamp();
+
/*
* Let postmaster know we've started redo now, so that it can
* launch bgwriter to perform restartpoints. We don't bother
XLogRecPtr endptr;
/* Get the current (or recent) end of xlog */
- endptr = GetWalRcvWriteRecPtr();
+ endptr = GetWalRcvWriteRecPtr(NULL);
PrevLogSeg(_logId, _logSeg);
RemoveOldXlogFiles(_logId, _logSeg, endptr);
XLogRecPtr recptr;
char location[MAXFNAMELEN];
- recptr = GetWalRcvWriteRecPtr();
+ recptr = GetWalRcvWriteRecPtr(NULL);
if (recptr.xlogid == 0 && recptr.xrecoff == 0)
PG_RETURN_NULL();
{
if (WalRcvInProgress())
{
+ bool havedata;
+
/*
* If we find an invalid record in the WAL streamed from
* master, something is seriously wrong. There's little
}
/*
- * While walreceiver is active, wait for new WAL to arrive
- * from primary.
+ * Walreceiver is active, so see if new data has arrived.
+ *
+ * We only advance XLogReceiptTime when we obtain fresh
+ * WAL from walreceiver and observe that we had already
+ * processed everything before the most recent "chunk"
+ * that it flushed to disk. In steady state where we are
+ * keeping up with the incoming data, XLogReceiptTime
+ * will be updated on each cycle. When we are behind,
+ * XLogReceiptTime will not advance, so the grace time
+ * allotted to conflicting queries will decrease.
*/
- receivedUpto = GetWalRcvWriteRecPtr();
if (XLByteLT(*RecPtr, receivedUpto))
+ havedata = true;
+ else
+ {
+ XLogRecPtr latestChunkStart;
+
+ receivedUpto = GetWalRcvWriteRecPtr(&latestChunkStart);
+ if (XLByteLT(*RecPtr, receivedUpto))
+ {
+ havedata = true;
+ if (!XLByteLT(*RecPtr, latestChunkStart))
+ XLogReceiptTime = GetCurrentTimestamp();
+ }
+ else
+ havedata = false;
+ }
+ if (havedata)
{
/*
* Great, streamed far enough. Open the file if it's
- * not open already.
+ * not open already. Use XLOG_FROM_STREAM so that
+ * source info is set correctly and XLogReceiptTime
+ * isn't changed.
*/
if (readFile < 0)
{
readFile =
XLogFileRead(readId, readSeg, PANIC,
recoveryTargetTLI,
- XLOG_FROM_PG_XLOG, false);
+ XLOG_FROM_STREAM, false);
+ Assert(readFile >= 0);
switched_segment = true;
+ }
+ else
+ {
+ /* just make sure source info is correct... */
readSource = XLOG_FROM_STREAM;
+ XLogReceiptSource = XLOG_FROM_STREAM;
}
break;
}
+ /*
+ * Data not here yet, so check for trigger then sleep.
+ */
if (CheckForStandbyTrigger())
goto triggered;
readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
sources);
switched_segment = true;
- if (readFile != -1)
+ if (readFile >= 0)
break;
/*
*
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.14 2010/06/09 15:04:07 heikki Exp $
+ * $PostgreSQL: pgsql/src/backend/replication/walreceiver.c,v 1.15 2010/07/03 20:43:57 tgl Exp $
*
*-------------------------------------------------------------------------
*/
/* Update shared-memory status */
SpinLockAcquire(&walrcv->mutex);
+ walrcv->latestChunkStart = walrcv->receivedUpto;
walrcv->receivedUpto = LogstreamResult.Flush;
SpinLockRelease(&walrcv->mutex);
*
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/replication/walreceiverfuncs.c,v 1.5 2010/04/28 16:54:15 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/replication/walreceiverfuncs.c,v 1.6 2010/07/03 20:43:57 tgl Exp $
*
*-------------------------------------------------------------------------
*/
if (recptr.xrecoff % XLogSegSize != 0)
recptr.xrecoff -= recptr.xrecoff % XLogSegSize;
+ SpinLockAcquire(&walrcv->mutex);
+
/* It better be stopped before we try to restart it */
Assert(walrcv->walRcvState == WALRCV_STOPPED);
- SpinLockAcquire(&walrcv->mutex);
if (conninfo != NULL)
strlcpy((char *) walrcv->conninfo, conninfo, MAXCONNINFO);
else
walrcv->startTime = now;
walrcv->receivedUpto = recptr;
+ walrcv->latestChunkStart = recptr;
+
SpinLockRelease(&walrcv->mutex);
SendPostmasterSignal(PMSIGNAL_START_WALRECEIVER);
}
/*
- * Returns the byte position that walreceiver has written
+ * Returns the last+1 byte position that walreceiver has written.
+ *
+ * Optionally, returns the previous chunk start, that is the first byte
+ * written in the most recent walreceiver flush cycle. Callers not
+ * interested in that value may pass NULL for latestChunkStart.
*/
XLogRecPtr
-GetWalRcvWriteRecPtr(void)
+GetWalRcvWriteRecPtr(XLogRecPtr *latestChunkStart)
{
/* use volatile pointer to prevent code rearrangement */
volatile WalRcvData *walrcv = WalRcv;
SpinLockAcquire(&walrcv->mutex);
recptr = walrcv->receivedUpto;
+ if (latestChunkStart)
+ *latestChunkStart = walrcv->latestChunkStart;
SpinLockRelease(&walrcv->mutex);
return recptr;
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/storage/ipc/standby.c,v 1.25 2010/06/14 00:49:24 itagaki Exp $
+ * $PostgreSQL: pgsql/src/backend/storage/ipc/standby.c,v 1.26 2010/07/03 20:43:58 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "storage/standby.h"
#include "utils/ps_status.h"
+/* User-settable GUC parameters */
int vacuum_defer_cleanup_age;
+int max_standby_archive_delay = 30 * 1000;
+int max_standby_streaming_delay = 30 * 1000;
static List *RecoveryLockList;
static void LogCurrentRunningXacts(RunningTransactions CurrRunningXacts);
static void LogAccessExclusiveLocks(int nlocks, xl_standby_lock *locks);
+
/*
* InitRecoveryTransactionEnvironment
- * Initiallize tracking of in-progress transactions in master
+ * Initialize tracking of in-progress transactions in master
*
* We need to issue shared invalidations and hold locks. Holding locks
- * means others may want to wait on us, so we need to make lock table
- * inserts to appear like a transaction. We could create and delete
+ * means others may want to wait on us, so we need to make a lock table
+ * vxact entry like a real transaction. We could create and delete
* lock table entries for each transaction but its simpler just to create
* one permanent entry and leave it there all the time. Locks are then
* acquired and released as needed. Yes, this means you can see the
VirtualTransactionId vxid;
/*
- * Initialise shared invalidation management for Startup process, being
+ * Initialize shared invalidation management for Startup process, being
* careful to register ourselves as a sendOnly process so we don't need to
* read messages, nor will we get signalled when the queue starts filling
* up.
* -----------------------------------------------------
*/
+/*
+ * Determine the cutoff time at which we want to start canceling conflicting
+ * transactions. Returns zero (a time safely in the past) if we are willing
+ * to wait forever.
+ */
+static TimestampTz
+GetStandbyLimitTime(void)
+{
+ TimestampTz rtime;
+ bool fromStream;
+
+ /*
+ * The cutoff time is the last WAL data receipt time plus the appropriate
+ * delay variable. Delay of -1 means wait forever.
+ */
+ GetXLogReceiptTime(&rtime, &fromStream);
+ if (fromStream)
+ {
+ if (max_standby_streaming_delay < 0)
+ return 0; /* wait forever */
+ return TimestampTzPlusMilliseconds(rtime, max_standby_streaming_delay);
+ }
+ else
+ {
+ if (max_standby_archive_delay < 0)
+ return 0; /* wait forever */
+ return TimestampTzPlusMilliseconds(rtime, max_standby_archive_delay);
+ }
+}
+
#define STANDBY_INITIAL_WAIT_US 1000
static int standbyWait_us = STANDBY_INITIAL_WAIT_US;
static bool
WaitExceedsMaxStandbyDelay(void)
{
- /* Are we past max_standby_delay? */
- if (MaxStandbyDelay >= 0 &&
- TimestampDifferenceExceeds(GetLatestXLogTime(), GetCurrentTimestamp(),
- MaxStandbyDelay))
+ TimestampTz ltime;
+
+ /* Are we past the limit time? */
+ ltime = GetStandbyLimitTime();
+ if (ltime && GetCurrentTimestamp() >= ltime)
return true;
/*
pid = CancelVirtualTransaction(*waitlist, reason);
/*
- * Wait awhile for it to die so that we avoid flooding an
- * unresponsive backend when system is heavily loaded.
+ * Wait a little bit for it to die so that we avoid flooding
+ * an unresponsive backend when system is heavily loaded.
*/
if (pid != 0)
pg_usleep(5000L);
ResolveRecoveryConflictWithDatabase(Oid dbid)
{
/*
- * We don't do ResolveRecoveryConflictWithVirutalXIDs() here since that
+ * We don't do ResolveRecoveryConflictWithVirtualXIDs() here since that
* only waits for transactions and completely idle sessions would block
* us. This is rare enough that we do this as simply as possible: no wait,
* just force them off immediately.
* the limit of our patience. The sleep in LockBufferForCleanup() is
* performed here, for code clarity.
*
- * Resolve conflict by sending a SIGUSR1 reason to all backends to check if
+ * Resolve conflicts by sending a PROCSIG signal to all backends to check if
* they hold one of the buffer pins that is blocking Startup process. If so,
* backends will take an appropriate error action, ERROR or FATAL.
*
- * We also check for deadlocks before we wait, though applications that cause
- * these will be extremely rare. Deadlocks occur because if queries
+ * We also must check for deadlocks. Deadlocks occur because if queries
* wait on a lock, that must be behind an AccessExclusiveLock, which can only
* be cleared if the Startup process replays a transaction completion record.
* If Startup process is also waiting then that is a deadlock. The deadlock
* Startup is sleeping and the query waits on a lock. We protect against
* only the former sequence here, the latter sequence is checked prior to
* the query sleeping, in CheckRecoveryConflictDeadlock().
+ *
+ * Deadlocks are extremely rare, and relatively expensive to check for,
+ * so we don't do a deadlock check right away ... only if we have had to wait
+ * at least deadlock_timeout. Most of the logic about that is in proc.c.
*/
void
ResolveRecoveryConflictWithBufferPin(void)
{
bool sig_alarm_enabled = false;
+ TimestampTz ltime;
+ TimestampTz now;
Assert(InHotStandby);
- if (MaxStandbyDelay == 0)
- {
- /*
- * We don't want to wait, so just tell everybody holding the pin to
- * get out of town.
- */
- SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
- }
- else if (MaxStandbyDelay < 0)
- {
- TimestampTz now = GetCurrentTimestamp();
+ ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
+ if (!ltime)
+ {
/*
- * Set timeout for deadlock check (only)
+ * We're willing to wait forever for conflicts, so set timeout for
+ * deadlock check (only)
*/
if (enable_standby_sig_alarm(now, now, true))
sig_alarm_enabled = true;
else
elog(FATAL, "could not set timer for process wakeup");
}
+ else if (now >= ltime)
+ {
+ /*
+ * We're already behind, so clear a path as quickly as possible.
+ */
+ SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
+ }
else
{
- TimestampTz then = GetLatestXLogTime();
- TimestampTz now = GetCurrentTimestamp();
-
- /* Are we past max_standby_delay? */
- if (TimestampDifferenceExceeds(then, now, MaxStandbyDelay))
- {
- /*
- * We're already behind, so clear a path as quickly as possible.
- */
- SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
- }
+ /*
+ * Wake up at ltime, and check for deadlocks as well if we will be
+ * waiting longer than deadlock_timeout
+ */
+ if (enable_standby_sig_alarm(now, ltime, false))
+ sig_alarm_enabled = true;
else
- {
- TimestampTz max_standby_time;
-
- /*
- * At what point in the future do we hit MaxStandbyDelay?
- */
- max_standby_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
- Assert(max_standby_time > now);
-
- /*
- * Wake up at MaxStandby delay, and check for deadlocks as well
- * if we will be waiting longer than deadlock_timeout
- */
- if (enable_standby_sig_alarm(now, max_standby_time, false))
- sig_alarm_enabled = true;
- else
- elog(FATAL, "could not set timer for process wakeup");
- }
+ elog(FATAL, "could not set timer for process wakeup");
}
/* Wait to be signaled by UnpinBuffer() */
*
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/storage/lmgr/proc.c,v 1.219 2010/05/26 19:52:52 sriggs Exp $
+ * $PostgreSQL: pgsql/src/backend/storage/lmgr/proc.c,v 1.220 2010/07/03 20:43:58 tgl Exp $
*
*-------------------------------------------------------------------------
*/
bool
enable_standby_sig_alarm(TimestampTz now, TimestampTz fin_time, bool deadlock_only)
{
- TimestampTz deadlock_time = TimestampTzPlusMilliseconds(now, DeadlockTimeout);
+ TimestampTz deadlock_time = TimestampTzPlusMilliseconds(now,
+ DeadlockTimeout);
if (deadlock_only)
{
/*
- * Wake up at DeadlockTimeout only, then wait forever
+ * Wake up at deadlock_time only, then wait forever
*/
statement_fin_time = deadlock_time;
deadlock_timeout_active = true;
else if (fin_time > deadlock_time)
{
/*
- * Wake up at DeadlockTimeout, then again at MaxStandbyDelay
+ * Wake up at deadlock_time, then again at fin_time
*/
statement_fin_time = deadlock_time;
statement_fin_time2 = fin_time;
else
{
/*
- * Wake only at MaxStandbyDelay because its fairly soon
+ * Wake only at fin_time because it's fairly soon
*/
statement_fin_time = fin_time;
deadlock_timeout_active = false;
if (deadlock_timeout_active)
{
/*
- * We're still waiting when we reach DeadlockTimeout, so send out a request
- * to have other backends check themselves for deadlock. Then continue
- * waiting until MaxStandbyDelay.
+ * We're still waiting when we reach deadlock timeout, so send out
+ * a request to have other backends check themselves for
+ * deadlock. Then continue waiting until statement_fin_time,
+ * if that's set.
*/
SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
deadlock_timeout_active = false;
/*
- * Begin second waiting period to MaxStandbyDelay if required.
+ * Begin second waiting period if required.
*/
if (statement_timeout_active)
{
else
{
/*
- * We've now reached MaxStandbyDelay, so ask all conflicts to leave, cos
- * its time for us to press ahead with applying changes in recovery.
+ * We've now reached statement_fin_time, so ask all conflicts to
+ * leave, so we can press ahead with applying changes in recovery.
*/
SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
}
* Written by Peter Eisentraut
.
*
* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.557 2010/06/25 13:11:25 sriggs Exp $
+ * $PostgreSQL: pgsql/src/backend/utils/misc/guc.c,v 1.558 2010/07/03 20:43:58 tgl Exp $
*
*--------------------------------------------------------------------
*/
#include "postmaster/walwriter.h"
#include "replication/walsender.h"
#include "storage/bufmgr.h"
+#include "storage/standby.h"
#include "storage/fd.h"
#include "tcop/tcopprot.h"
#include "tsearch/ts_cache.h"
extern char *temp_tablespaces;
extern bool synchronize_seqscans;
extern bool fullPageWrites;
-extern int vacuum_defer_cleanup_age;
extern int ssl_renegotiation_limit;
#ifdef TRACE_SORT
1000, 1, INT_MAX / 1000, NULL, NULL
},
+ {
+ {"max_standby_archive_delay", PGC_SIGHUP, WAL_STANDBY_SERVERS,
+ gettext_noop("Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &max_standby_archive_delay,
+ 30 * 1000, -1, INT_MAX / 1000, NULL, NULL
+ },
+
+ {
+ {"max_standby_streaming_delay", PGC_SIGHUP, WAL_STANDBY_SERVERS,
+ gettext_noop("Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &max_standby_streaming_delay,
+ 30 * 1000, -1, INT_MAX / 1000, NULL, NULL
+ },
+
/*
* Note: MaxBackends is limited to INT_MAX/4 because some places compute
* 4*MaxBackends without any overflow check. This check is made in
100, 1, INT_MAX / 4, assign_maxconnections, NULL
},
- {
- {"max_standby_delay", PGC_SIGHUP, WAL_STANDBY_SERVERS,
- gettext_noop("Sets the maximum delay to avoid conflict processing on hot standby servers."),
- NULL,
- GUC_UNIT_MS
- },
- &MaxStandbyDelay,
- 30 * 1000, -1, INT_MAX / 1000, NULL, NULL
- },
-
{
{"superuser_reserved_connections", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
gettext_noop("Sets the number of connection slots reserved for superusers."),
# - Streaming Replication -
#max_wal_senders = 0 # max number of walsender processes
-#wal_sender_delay = 200ms # 1-10000 milliseconds
+#wal_sender_delay = 200ms # walsender cycle time, 1-10000 milliseconds
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
# - Standby Servers -
-#hot_standby = off # allows queries during recovery
-#max_standby_delay = 30s # max acceptable lag to allow queries to
- # complete without conflict; -1 means forever
-#vacuum_defer_cleanup_age = 0 # num transactions by which cleanup is deferred
+#hot_standby = off # "on" allows queries during recovery
+#max_standby_archive_delay = 30s # max delay before canceling queries
+ # when reading WAL from archive;
+ # -1 allows indefinite delay
+#max_standby_streaming_delay = 30s # max delay before canceling queries
+ # when reading streaming WAL;
+ # -1 allows indefinite delay
+#vacuum_defer_cleanup_age = 0 # number of transactions by which cleanup is deferred
#------------------------------------------------------------------------------
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * $PostgreSQL: pgsql/src/include/access/xlog.h,v 1.113 2010/06/17 16:41:25 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/access/xlog.h,v 1.114 2010/07/03 20:43:58 tgl Exp $
*/
#ifndef XLOG_H
#define XLOG_H
extern PGDLLIMPORT TimeLineID ThisTimeLineID; /* current TLI */
/*
- * Prior to 8.4, all activity during recovery was carried out by Startup
+ * Prior to 8.4, all activity during recovery was carried out by the startup
* process. This local variable continues to be used in many parts of the
- * code to indicate actions taken by RecoveryManagers. Other processes who
- * potentially perform work during recovery should check RecoveryInProgress()
- * see XLogCtl notes in xlog.c
+ * code to indicate actions taken by RecoveryManagers. Other processes that
+ * potentially perform work during recovery should check RecoveryInProgress().
+ * See XLogCtl notes in xlog.c.
*/
extern bool InRecovery;
/*
* Like InRecovery, standbyState is only valid in the startup process.
+ * In all other processes it will have the value STANDBY_DISABLED (so
+ * InHotStandby will read as FALSE).
*
* In DISABLED state, we're performing crash recovery or hot standby was
* disabled in recovery.conf.
*
- * In INITIALIZED state, we haven't yet received a RUNNING_XACTS or shutdown
- * checkpoint record to initialize our master transaction tracking system.
+ * In INITIALIZED state, we've run InitRecoveryTransactionEnvironment, but
+ * we haven't yet processed a RUNNING_XACTS or shutdown-checkpoint WAL record
+ * to initialize our master-transaction tracking system.
*
* When the transaction tracking is initialized, we enter the SNAPSHOT_PENDING
* state. The tracked information might still be incomplete, so we can't allow
STANDBY_SNAPSHOT_PENDING,
STANDBY_SNAPSHOT_READY
} HotStandbyState;
+
extern HotStandbyState standbyState;
#define InHotStandby (standbyState >= STANDBY_SNAPSHOT_PENDING)
extern bool XLogArchiveMode;
extern char *XLogArchiveCommand;
extern bool EnableHotStandby;
-extern int MaxStandbyDelay;
extern bool log_checkpoints;
/* WAL levels */
extern bool RecoveryInProgress(void);
extern bool XLogInsertAllowed(void);
-extern TimestampTz GetLatestXLogTime(void);
+extern void GetXLogReceiptTime(TimestampTz *rtime, bool *fromStream);
extern void UpdateControlFile(void);
extern uint64 GetSystemIdentifier(void);
*
* Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group
*
- * $PostgreSQL: pgsql/src/include/replication/walreceiver.h,v 1.9 2010/06/03 22:17:32 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/replication/walreceiver.h,v 1.10 2010/07/03 20:43:58 tgl Exp $
*
*-------------------------------------------------------------------------
*/
typedef struct
{
/*
- * connection string; is used for walreceiver to connect with the primary.
- */
- char conninfo[MAXCONNINFO];
-
- /*
- * PID of currently active walreceiver process, and the current state.
+ * PID of currently active walreceiver process, its current state and
+ * start time (actually, the time at which it was requested to be started).
*/
pid_t pid;
WalRcvState walRcvState;
pg_time_t startTime;
/*
- * receivedUpto-1 is the last byte position that has been already
- * received. When startup process starts the walreceiver, it sets this to
- * the point where it wants the streaming to begin. After that,
- * walreceiver updates this whenever it flushes the received WAL.
+ * receivedUpto-1 is the last byte position that has already been
+ * received. When startup process starts the walreceiver, it sets
+ * receivedUpto to the point where it wants the streaming to begin.
+ * After that, walreceiver updates this whenever it flushes the received
+ * WAL to disk.
*/
XLogRecPtr receivedUpto;
+ /*
+ * latestChunkStart is the starting byte position of the current "batch"
+ * of received WAL. It's actually the same as the previous value of
+ * receivedUpto before the last flush to disk. Startup process can use
+ * this to detect whether it's keeping up or not.
+ */
+ XLogRecPtr latestChunkStart;
+
+ /*
+ * connection string; is used for walreceiver to connect with the primary.
+ */
+ char conninfo[MAXCONNINFO];
+
slock_t mutex; /* locks shared variables shown above */
} WalRcvData;
extern bool WalRcvInProgress(void);
extern XLogRecPtr WaitNextXLogAvailable(XLogRecPtr recptr, bool *finished);
extern void RequestXLogStreaming(XLogRecPtr recptr, const char *conninfo);
-extern XLogRecPtr GetWalRcvWriteRecPtr(void);
+extern XLogRecPtr GetWalRcvWriteRecPtr(XLogRecPtr *latestChunkStart);
#endif /* _WALRECEIVER_H */
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * $PostgreSQL: pgsql/src/include/storage/standby.h,v 1.10 2010/05/13 11:15:38 sriggs Exp $
+ * $PostgreSQL: pgsql/src/include/storage/standby.h,v 1.11 2010/07/03 20:43:58 tgl Exp $
*
*-------------------------------------------------------------------------
*/
#include "storage/procsignal.h"
#include "storage/relfilenode.h"
+/* User-settable GUC parameters */
extern int vacuum_defer_cleanup_age;
+extern int max_standby_archive_delay;
+extern int max_standby_streaming_delay;
extern void InitRecoveryTransactionEnvironment(void);
extern void ShutdownRecoveryTransactionEnvironment(void);
/*
* Declarations for GetRunningTransactionData(). Similar to Snapshots, but
* not quite. This has nothing at all to do with visibility on this server,
- * so this is completely separate from snapmgr.c and snapmgr.h
+ * so this is completely separate from snapmgr.c and snapmgr.h.
* This data is important for creating the initial snapshot state on a
* standby server. We need lots more information than a normal snapshot,
* hence we use a specific data structure for our needs. This data