--- /dev/null
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05273
+ for
; Fri, 3 Nov 2000 17:50:39 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA3Mm1s26018;
+ Fri, 3 Nov 2000 17:48:01 -0500 (EST)
+Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA3Mles25919
+ for
; Fri, 3 Nov 2000 17:47:40 -0500 (EST)
+Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
+ by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eA3Mle508385
+ for
; Fri, 3 Nov 2000 17:47:40 -0500 (EST)
+Subject: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Date: Fri, 03 Nov 2000 17:47:40 -0500
+From: Tom Lane
+Precedence: bulk
+Status: ORr
+
+We've expended a lot of worry and discussion in the past about what
+happens if the OID generator wraps around. However, there is another
+4-byte counter in the system: the transaction ID (XID) generator.
+While OID wraparound is survivable, if XIDs wrap around then we really
+do have a Ragnarok scenario. The tuple validity checks do ordered
+comparisons on XIDs, and will consider tuples with xmin > current xact
+to be invalid. Result: after wraparound, your whole database would
+instantly vanish from view.
+
+The first thought that comes to mind is that XIDs should be promoted to
+eight bytes. However there are several practical problems with this:
+* portability --- I don't believe long long int exists on all the
+platforms we support.
+* performance --- except on true 64-bit platforms, widening Datum to
+eight bytes would be a system-wide performance hit, which is a tad
+unpleasant to fix a scenario that's not yet been reported from the
+field.
+* disk space --- letting pg_log grow without bound isn't a pleasant
+prospect either.
+
+I believe it is possible to fix these problems without widening XID,
+by redefining XIDs in a way that allows for wraparound. Here's my
+plan:
+
+1. Allow XIDs to range from 0 to WRAPLIMIT-1 (WRAPLIMIT is not
+necessarily 4G, see discussion below). Ordered comparisons on XIDs
+are no longer simply "x < y", but need to be expressed as a macro.
+We consider x < y if (y - x) % WRAPLIMIT < WRAPLIMIT/2.
+This comparison will work as long as the range of interesting XIDs
+never exceeds WRAPLIMIT/2. Essentially, we envision the actual value
+of XID as being the low-order bits of a logical XID that always
+increases, and we assume that no extant XID is more than WRAPLIMIT/2
+transactions old, so we needn't keep track of the high-order bits.
+
+2. To keep the system from having to deal with XIDs that are more than
+WRAPLIMIT/2 transactions old, VACUUM should "freeze" known-good old
+tuples. To do this, we'll reserve a special XID, say 1, that is always
+considered committed and is always less than any ordinary XID. (So the
+ordered-comparison macro is really a little more complicated than I said
+above. Note that there is already a reserved XID just like this in the
+system, the "bootstrap" XID. We could simply use the bootstrap XID, but
+it seems better to make another one.) When VACUUM finds a tuple that
+is committed good and has xmin < XmaxRecent (the oldest XID that might
+be considered uncommitted by any open transaction), it will replace that
+tuple's xmin by the special always-good XID. Therefore, as long as
+VACUUM is run on all tables in the installation more often than once per
+WRAPLIMIT/2 transactions, there will be no tuples with ordinary XIDs
+older than WRAPLIMIT/2.
+
+3. At wraparound, the XID counter has to be advanced to skip over the
+InvalidXID value (zero) and the reserved XIDs, so that no real transaction
+is generated with those XIDs. No biggie here.
+
+4. With the wraparound behavior, pg_log will have a bounded size: it
+will never exceed WRAPLIMIT*2 bits = WRAPLIMIT/4 bytes. Since we will
+recycle pg_log entries every WRAPLIMIT xacts, during transaction start
+the xact manager will have to take care to actively clear its pg_log
+entry to zeroes (I'm not sure if it does that already, or just assumes
+that new pg_log entries will start out zero). As long as that happens
+before the xact makes any data changes, it's OK to recycle the entry.
+Note we are assuming that no tuples will remain in the database with
+xmin or xmax equal to that XID from a prior cycle of the universe.
+
+This scheme allows us to survive XID wraparound at the cost of slight
+additional complexity in ordered comparisons of XIDs (which is not a
+really performance-critical task AFAIK), and at the cost that the
+original insertion XIDs of all but recent tuples will be lost by
+VACUUM. The system doesn't particularly care about that, but old XIDs
+do sometimes come in handy for debugging purposes. A possible
+compromise is to overwrite only XIDs that are older than, say,
+WRAPLIMIT/4 instead of doing so as soon as possible. This would mean
+the required VACUUM frequency is every WRAPLIMIT/4 xacts instead of
+every WRAPLIMIT/2 xacts.
+
+We have a straightforward tradeoff between the maximum size of pg_log
+(WRAPLIMIT/4 bytes) and the required frequency of VACUUM (at least
+every WRAPLIMIT/2 or WRAPLIMIT/4 transactions). This could be made
+configurable in config.h for those who're intent on customization,
+but I'd be inclined to set the default value at WRAPLIMIT = 1G.
+
+Comments? Vadim, is any of this about to be superseded by WAL?
+If not, I'd like to fix it for 7.1.
+
+ regards, tom lane
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA08863
+ for
; Fri, 3 Nov 2000 20:20:31 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA41Jgs31567;
+ Fri, 3 Nov 2000 20:19:42 -0500 (EST)
+Received: from thor.tht.net (thor.tht.net [209.47.145.4])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA41CMs31023
+ for
; Fri, 3 Nov 2000 20:12:22 -0500 (EST)
+Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
+ by thor.tht.net (8.9.3/8.9.3) with ESMTP id VAA14928
+ for
; Fri, 3 Nov 2000 21:13:08 GMT
+Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
+ by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eA41CK508777;
+ Fri, 3 Nov 2000 20:12:21 -0500 (EST)
+To: "Mikheev, Vadim"
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+In-reply-to: <8F4C99C66D04D4118F580090272A7A234D3146@sectorbase1.sectorbase.com>
+References: <8F4C99C66D04D4118F580090272A7A234D3146@sectorbase1.sectorbase.com>
+Comments: In-reply-to "Mikheev, Vadim"
+ message dated "Fri, 03 Nov 2000 16:24:38 -0800"
+Date: Fri, 03 Nov 2000 20:12:20 -0500
+From: Tom Lane
+Precedence: bulk
+Status: OR
+
+"Mikheev, Vadim" writes:
+> So, we'll have to abort some long running transaction.
+
+Well, yes, some transaction that continues running while ~ 500 million
+other transactions come and go might give us trouble. I wasn't really
+planning to worry about that case ;-)
+
+> Required frequency of *successful* vacuum over *all* tables.
+> We would have to remember something in pg_class/pg_database
+> and somehow force vacuum over "too-long-unvacuumed-tables"
+> *automatically*.
+
+I don't think this is a problem now; in practice you couldn't possibly
+go for half a billion transactions without vacuuming, I'd think.
+
+If your plans to eliminate regular vacuuming become reality, then this
+scheme might become less reliable, but at present I think there's plenty
+of safety margin.
+
+> If undo would be implemented then we could delete pg_log between
+> postmaster startups - startup counter is remembered in pages, so
+> seeing old startup id in a page we would know that there are only
+> long ago committed xactions (ie only visible changes) there
+> and avoid xid comparison. But ... there will be no undo in 7.1.
+> And I foresee problems with WAL based BAR implementation if we'll
+> follow proposed solution: redo restores original xmin/xmax - how
+> to "freeze" xids while restoring DB?
+
+So, we might eventually have a better answer from WAL, but not for 7.1.
+
+I think my idea is reasonably non-invasive and could be removed without
+much trouble once WAL offers a better way. I'd really like to have some
+answer for 7.1, though. The sort of numbers John Scott was quoting to
+me for Verizon's paging network throughput make it clear that we aren't
+going to survive at that level with a limit of 4G transactions per
+database reload. Having to vacuum everything on at least a
+1G-transaction cycle is salable, dump/initdb/reload is not ...
+
+ regards, tom lane
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA12038
+ for
; Fri, 3 Nov 2000 21:30:13 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA42TQs33780;
+ Fri, 3 Nov 2000 21:29:26 -0500 (EST)
+Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA42TCs33632
+ for
; Fri, 3 Nov 2000 21:29:12 -0500 (EST)
+Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
+ by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eA42T5509042;
+ Fri, 3 Nov 2000 21:29:05 -0500 (EST)
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Comments: In-reply-to Philip Warner
+ message dated "Sat, 04 Nov 2000 13:09:22 +1100"
+Date: Fri, 03 Nov 2000 21:29:04 -0500
+From: Tom Lane
+Precedence: bulk
+Status: OR
+
+>> * disk space --- letting pg_log grow without bound isn't a pleasant
+>> prospect either.
+
+> Maybe this can be achieved by wrapping XID for the log file only.
+
+How's that going to improve matters? pg_log is ground truth for XIDs;
+if you can't distinguish two XIDs in pg_log, there's no point in
+distinguishing them elsewhere.
+
+> Maybe I'm really missing the amount of XID manipulation, but I'd be
+> surprised if 16-byte XIDs would slow things down much.
+
+It's not so much XIDs themselves, as that I think we'd need to widen
+typedef Datum too, and that affects manipulations of *all* data types.
+
+In any case, the prospect of a multi-gigabyte, ever-growing pg_log file,
+with no way to recover the space short of dump/initdb/reload, is
+awfully unappetizing for a high-traffic installation...
+
+ regards, tom lane
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA13035
+ for
; Fri, 3 Nov 2000 21:42:29 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA42fjs40619;
+ Fri, 3 Nov 2000 21:41:45 -0500 (EST)
+Received: from hse-toronto-ppp119263.sympatico.ca (HSE-Toronto-ppp85465.sympatico.ca [216.209.18.18])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA42fXs40530
+ for
; Fri, 3 Nov 2000 21:41:33 -0500 (EST)
+Received: (qmail 66996 invoked by uid 0); 4 Nov 2000 02:46:34 -0000
+ by hse-toronto-ppp85465.sympatico.ca with SMTP; 4 Nov 2000 02:46:34 -0000
+Date: Fri, 03 Nov 2000 21:41:29 -0500
+From: Rod Taylor
+Organization: Zort
+X-Mailer: Mozilla 4.75 [en] (X11; U; FreeBSD 4.1.1-STABLE i386)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Tom Lane
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+
+Tom Lane wrote:
+>
+> >> * disk space --- letting pg_log grow without bound isn't a pleasant
+> >> prospect either.
+>
+> > Maybe this can be achieved by wrapping XID for the log file only.
+>
+> How's that going to improve matters? pg_log is ground truth for XIDs;
+> if you can't distinguish two XIDs in pg_log, there's no point in
+> distinguishing them elsewhere.
+>
+> > Maybe I'm really missing the amount of XID manipulation, but I'd be
+> > surprised if 16-byte XIDs would slow things down much.
+>
+> It's not so much XIDs themselves, as that I think we'd need to widen
+> typedef Datum too, and that affects manipulations of *all* data types.
+>
+> In any case, the prospect of a multi-gigabyte, ever-growing pg_log file,
+> with no way to recover the space short of dump/initdb/reload, is
+> awfully unappetizing for a high-traffic installation...
+
+Agreed completely. I'd like to think I could have such an installation
+in the next year or so :)
+
+To prevent a performance hit to those who don't want, is there a
+possibility of either a compile time option or 'auto-expanding' the
+width of the XID's and other items when it becomes appropriate? Start
+with int4, when that limit is hit goto int8, and should -- quite
+unbelievibly so but there are multi-TB databases -- it be necessary jump
+to int12 or int16? Be the first to support Exa-objects in an RDBMS.
+Testing not necessary ;)
+
+Compiletime option would be appropriate however if theres a significant
+performance hit.
+
+I'm not much of a c coder (obviously), so I don't know of the
+limitations. plpgsql is my friend that can do nearly anything :)
+
+Hmm... After reading the above I should have stuck with lurking.
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id BAA29566
+ for
; Sun, 5 Nov 2000 01:07:07 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA564Ks60463;
+ Sun, 5 Nov 2000 01:04:20 -0500 (EST)
+Received: from gate1.sectorbase.com ([208.48.122.134])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA55sas57106
+ for
; Sun, 5 Nov 2000 00:54:36 -0500 (EST)
+Received: from dune (unknown [208.48.122.182])
+ by gate1.sectorbase.com (Postfix) with SMTP
+ id 170DB2E806; Sat, 4 Nov 2000 21:53:56 -0800 (PST)
+From: "Vadim Mikheev"
+To: "Tom Lane"
+References: <8F4C99C66D04D4118F580090272A7A234D3146@sectorbase1.sectorbase.com> <
[email protected]>
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Date: Sat, 4 Nov 2000 21:59:00 -0800
+MIME-Version: 1.0
+Content-Type: text/plain;
+ charset="windows-1251"
+Content-Transfer-Encoding: 7bit
+X-Priority: 3
+X-MSMail-Priority: Normal
+X-Mailer: Microsoft Outlook Express 5.50.4133.2400
+X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
+Precedence: bulk
+Status: OR
+
+> > So, we'll have to abort some long running transaction.
+>
+> Well, yes, some transaction that continues running while ~ 500 million
+> other transactions come and go might give us trouble. I wasn't really
+> planning to worry about that case ;-)
+
+Agreed, I just don't like to rely on assumptions -:)
+
+> > Required frequency of *successful* vacuum over *all* tables.
+> > We would have to remember something in pg_class/pg_database
+> > and somehow force vacuum over "too-long-unvacuumed-tables"
+> > *automatically*.
+>
+> I don't think this is a problem now; in practice you couldn't possibly
+> go for half a billion transactions without vacuuming, I'd think.
+
+Why not?
+And once again - assumptions are not good for transaction area.
+
+> If your plans to eliminate regular vacuuming become reality, then this
+> scheme might become less reliable, but at present I think there's plenty
+> of safety margin.
+>
+> > If undo would be implemented then we could delete pg_log between
+> > postmaster startups - startup counter is remembered in pages, so
+> > seeing old startup id in a page we would know that there are only
+> > long ago committed xactions (ie only visible changes) there
+> > and avoid xid comparison. But ... there will be no undo in 7.1.
+> > And I foresee problems with WAL based BAR implementation if we'll
+> > follow proposed solution: redo restores original xmin/xmax - how
+> > to "freeze" xids while restoring DB?
+>
+> So, we might eventually have a better answer from WAL, but not for 7.1.
+> I think my idea is reasonably non-invasive and could be removed without
+> much trouble once WAL offers a better way. I'd really like to have some
+> answer for 7.1, though. The sort of numbers John Scott was quoting to
+> me for Verizon's paging network throughput make it clear that we aren't
+> going to survive at that level with a limit of 4G transactions per
+> database reload. Having to vacuum everything on at least a
+> 1G-transaction cycle is salable, dump/initdb/reload is not ...
+
+Understandable. And probably we can get BAR too but require full
+backup every WRAPLIMIT/2 (or better /4) transactions.
+
+Vadim
+
+
+
+Received: from gate1.sectorbase.com ([208.48.122.134])
+ by candle.pha.pa.us (8.9.0/8.9.0) with SMTP id DAA10570
+ for
; Sun, 5 Nov 2000 03:55:30 -0500 (EST)
+Received: from dune (unknown [208.48.122.185])
+ by gate1.sectorbase.com (Postfix) with SMTP
+ id 5033D2E806; Sun, 5 Nov 2000 00:54:22 -0800 (PST)
+From: "Vadim Mikheev"
+To: "Bruce Momjian"
, "Tom Lane"
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Date: Sun, 5 Nov 2000 01:02:01 -0800
+MIME-Version: 1.0
+Content-Type: text/plain;
+ charset="iso-8859-1"
+Content-Transfer-Encoding: 7bit
+X-Priority: 3
+X-MSMail-Priority: Normal
+X-Mailer: Microsoft Outlook Express 5.50.4133.2400
+X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
+Status: OR
+
+> One idea I had from this is actually truncating pg_log at some point if
+> we know all the tuples have the special committed xid. It would prevent
+> the file from growing without bounds.
+
+Not truncating, but implementing pg_log as set of files - we could remove
+files for old xids.
+
+> Vadim, can you explain how WAL will make pg_log unnecessary someday?
+
+First, I mentioned only that having undo we could remove old pg_log after
+postmaster startup because of only committed changes would be in data
+files and they would be visible to new transactions (small changes in tqual
+will be required to take page' startup id into account) which would reuse xids.
+While changing a page first time in current startup, server would do exactly
+what Tom is going to do at vacuuming - just update xmin/xmax to "1" in all items
+(or setting some flag in t_infomask), - and change page' startup id to current.
+
+I understand that this is not complete solution for xids problem, I just wasn't
+going to solve it that time. Now after Tom' proposal I see how to reuse xids
+without vacuuming (but having undo): we will add XidWrapId (XWI) - xid wrap
+counter - to pages and set it when we change page. First time we do this for
+page with old XWI we'll mark old items (to know later that they were changed
+by xids with old XWI). Each time we change page we can mark old xmin/xmax
+with xid <= current xid as committed long ago (basing on xact TTL restrinctions).
+
+All above assumes that there will be no xids from aborted transactions in pages,
+so we need not lookup in pg_log to know is a xid committed/aborted, - there will
+be only xids from running or committed xactions there.
+
+And we need in undo for this.
+
+Vadim
+
+
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA17110
+ for
; Tue, 7 Nov 2000 20:57:16 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA81vcs17073;
+ Tue, 7 Nov 2000 20:57:38 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA81kos15436
+ for
; Tue, 7 Nov 2000 20:46:50 -0500 (EST)
+Received: from me.tm.ee (adsl895.estpak.ee [213.168.23.133])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA5Esds15479
+ for
; Sun, 5 Nov 2000 09:54:40 -0500 (EST)
+ by me.tm.ee (8.9.3/8.9.3) with ESMTP id PAA01401;
+ Sun, 5 Nov 2000 15:48:14 +0200
+Date: Sun, 05 Nov 2000 15:48:13 +0200
+From: Hannu Krosing
+X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+To: Tom Lane
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+
+Tom Lane wrote:
+>
+> >> * disk space --- letting pg_log grow without bound isn't a pleasant
+> >> prospect either.
+>
+> > Maybe this can be achieved by wrapping XID for the log file only.
+>
+> How's that going to improve matters? pg_log is ground truth for XIDs;
+> if you can't distinguish two XIDs in pg_log, there's no point in
+> distinguishing them elsewhere.
+
+One simple way - start a new pg_log file at each wraparound and encode
+the high 4 bytes in the filename (or in first four bytes of file)
+
+> > Maybe I'm really missing the amount of XID manipulation, but I'd be
+> > surprised if 16-byte XIDs would slow things down much.
+>
+> It's not so much XIDs themselves, as that I think we'd need to widen
+> typedef Datum too, and that affects manipulations of *all* data types.
+
+Do you mean that each _field_ will take more space, not each _record_ ?
+
+> In any case, the prospect of a multi-gigabyte, ever-growing pg_log file,
+> with no way to recover the space short of dump/initdb/reload, is
+> awfully unappetizing for a high-traffic installation...
+
+The pg_log should be rotated anyway either with long xids or long-long
+xids.
+
+-----------
+Hannu
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id QAA03570
+ for
; Sun, 5 Nov 2000 16:19:46 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA5LKbs64176;
+ Sun, 5 Nov 2000 16:20:37 -0500 (EST)
+Received: from me.tm.ee (adsl895.estpak.ee [213.168.23.133])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA5LKCs64044
+ for
; Sun, 5 Nov 2000 16:20:12 -0500 (EST)
+ by me.tm.ee (8.9.3/8.9.3) with ESMTP id WAA00997;
+ Sun, 5 Nov 2000 22:14:24 +0200
+Date: Sun, 05 Nov 2000 22:14:24 +0200
+From: Hannu Krosing
+X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.17 i686)
+X-Accept-Language: en
+MIME-Version: 1.0
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Precedence: bulk
+Status: OR
+
+Peter Eisentraut wrote:
+>
+> Hannu Krosing writes:
+>
+> > > The first thought that comes to mind is that XIDs should be promoted to
+> > > eight bytes. However there are several practical problems with this:
+> > > * portability --- I don't believe long long int exists on all the
+> > > platforms we support.
+> >
+> > I suspect that gcc at least supports long long on all OS-s we support
+>
+> Uh, we don't want to depend on gcc, do we?
+
+I suspect that we do on many platforms (like *BSD, Linux and Win32).
+
+What platforms we currently support don't have functional gcc ?
+
+> But we could make the XID a struct of two 4-byte integers, at the obvious
+> increase in storage size.
+
+And a (hopefully) small performance hit on operations when defined as
+macros,
+and some more for less data fitting in cache.
+
+what operations do we need to be defined ?
+
+will >, <, ==, !=, >=, <== and ++ be enough ?
+
+-------------
+Hannu
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id MAA24746
+ for
; Mon, 6 Nov 2000 12:36:49 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA6HWqs14206;
+ Mon, 6 Nov 2000 12:32:52 -0500 (EST)
+Received: from granger.mail.mindspring.net (granger.mail.mindspring.net [207.69.200.148])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA6HT2s13718
+ for
; Mon, 6 Nov 2000 12:29:02 -0500 (EST)
+Received: from jupiter (user-2inikn4.dialup.mindspring.com [165.121.82.228])
+ by granger.mail.mindspring.net (8.9.3/8.8.5) with SMTP id MAA07826;
+ Mon, 6 Nov 2000 12:28:37 -0500 (EST)
+From: Mark Hollomon
+Date: Mon, 6 Nov 2000 13:09:19 -0500
+X-Mailer: KMail [version 1.1.99]
+Content-Type: text/plain;
+ charset="iso-8859-1"
+To: Tom Lane
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+MIME-Version: 1.0
+Message-Id: <00110613091900.00324@jupiter>
+Content-Transfer-Encoding: 8bit
+Precedence: bulk
+Status: OR
+
+On Sunday 05 November 2000 13:02, Tom Lane wrote:
+> OK, 2^64 isn't mathematically unbounded, but let's see you buy a disk
+> that will hold it ;-). My point is that if we want to think about
+> allowing >4G transactions, part of the answer has to be a way to recycle
+> pg_log space. Otherwise it's still not really practical.
+
+I kind of like vadim's idea of segmenting pg_log.
+
+Segments in which all the xacts have been commited could be deleted.
+
+--
+Mark Hollomon
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA23678
+ for
; Fri, 10 Nov 2000 15:06:06 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAAK5fs44672;
+ Fri, 10 Nov 2000 15:05:41 -0500 (EST)
+Received: from charybdis.zembu.com (charybdis.zembu.com [209.157.144.99])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eAAK30s44361
+ for
; Fri, 10 Nov 2000 15:03:01 -0500 (EST)
+Received: (qmail 15640 invoked from network); 10 Nov 2000 20:02:12 -0000
+Received: from store.z.zembu.com (192.168.1.142)
+ by charybdis.z.zembu.com with SMTP; 10 Nov 2000 20:02:12 -0000
+Received: from ncm by store.z.zembu.com with local (Exim 3.12 #1 (Debian))
+ id 13uKMX-0003rZ-00; Fri, 10 Nov 2000 12:01:25 -0800
+Date: Fri, 10 Nov 2000 12:01:25 -0800
+From: Nathan Myers
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed solution
+Mime-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+User-Agent: Mutt/1.0.1i
+Precedence: bulk
+Status: OR
+
+On Sun, Nov 05, 2000 at 03:48:13PM +0200, Hannu Krosing wrote:
+> Tom Lane wrote:
+> >
+> > Philip Warner
writes:
+> > >> * disk space --- letting pg_log grow without bound isn't a pleasant
+> > >> prospect either.
+> >
+> > > Maybe this can be achieved by wrapping XID for the log file only.
+> >
+> > How's that going to improve matters? pg_log is ground truth for XIDs;
+> > if you can't distinguish two XIDs in pg_log, there's no point in
+> > distinguishing them elsewhere.
+>
+> One simple way - start a new pg_log file at each wraparound and encode
+> the high 4 bytes in the filename (or in first four bytes of file)
+
+Proposal:
+
+Annotate each log file with the current XID value at the time the file
+is created. Before comparing any two XIDs, subtract that value from
+each operand, using unsigned arithmetic.
+
+At a sustained rate of 10,000 transactions/second, any pair of 32-bit
+XIDs less than 2.5 days apart compare properly.
+
+Nathan Myers
+
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA08743
+ for
; Fri, 3 Nov 2000 20:17:35 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA415Hs30899;
+ Fri, 3 Nov 2000 20:05:22 -0500 (EST)
+Received: from thor.tht.net (thor.tht.net [209.47.145.4])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA40dns30224
+ for
; Fri, 3 Nov 2000 19:39:49 -0500 (EST)
+Received: from sectorbase2.sectorbase.com ([208.48.122.131])
+ by thor.tht.net (8.9.3/8.9.3) with SMTP id UAA14292
+ for
; Fri, 3 Nov 2000 20:40:31 GMT
+Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2650.21)
+ id ; Fri, 3 Nov 2000 16:20:43 -0800
+Message-ID: <8F4C99C66D04D4118F580090272A7A234D3146@sectorbase1.sectorbase.com>
+From: "Mikheev, Vadim"
+Subject: RE: [HACKERS] Transaction ID wraparound: problem and proposed sol
+ ution
+Date: Fri, 3 Nov 2000 16:24:38 -0800
+MIME-Version: 1.0
+X-Mailer: Internet Mail Service (5.5.2650.21)
+Content-Type: text/plain;
+ charset="iso-8859-1"
+Precedence: bulk
+Status: OR
+
+> This comparison will work as long as the range of interesting XIDs
+> never exceeds WRAPLIMIT/2. Essentially, we envision the actual value
+> of XID as being the low-order bits of a logical XID that always
+> increases, and we assume that no extant XID is more than WRAPLIMIT/2
+> transactions old, so we needn't keep track of the high-order bits.
+
+So, we'll have to abort some long running transaction.
+And before after-wrap XIDs will be close to aborted xid you'd better
+ensure that vacuum *successfully* run over all tables in database
+(and shared tables) aborted transaction could touch.
+
+> This scheme allows us to survive XID wraparound at the cost of slight
+> additional complexity in ordered comparisons of XIDs (which is not a
+> really performance-critical task AFAIK), and at the cost that the
+> original insertion XIDs of all but recent tuples will be lost by
+> VACUUM. The system doesn't particularly care about that, but old XIDs
+> do sometimes come in handy for debugging purposes. A possible
+
+I wouldn't care about this.
+
+> compromise is to overwrite only XIDs that are older than, say,
+> WRAPLIMIT/4 instead of doing so as soon as possible. This would mean
+> the required VACUUM frequency is every WRAPLIMIT/4 xacts instead of
+> every WRAPLIMIT/2 xacts.
+>
+> We have a straightforward tradeoff between the maximum size of pg_log
+> (WRAPLIMIT/4 bytes) and the required frequency of VACUUM (at least
+
+Required frequency of *successful* vacuum over *all* tables.
+We would have to remember something in pg_class/pg_database
+and somehow force vacuum over "too-long-unvacuumed-tables"
+*automatically*.
+
+> every WRAPLIMIT/2 or WRAPLIMIT/4 transactions). This could be made
+> configurable in config.h for those who're intent on customization,
+> but I'd be inclined to set the default value at WRAPLIMIT = 1G.
+>
+> Comments? Vadim, is any of this about to be superseded by WAL?
+> If not, I'd like to fix it for 7.1.
+
+If undo would be implemented then we could delete pg_log between
+postmaster startups - startup counter is remembered in pages, so
+seeing old startup id in a page we would know that there are only
+long ago committed xactions (ie only visible changes) there
+and avoid xid comparison. But ... there will be no undo in 7.1.
+And I foresee problems with WAL based BAR implementation if we'll
+follow proposed solution: redo restores original xmin/xmax - how
+to "freeze" xids while restoring DB?
+
+(Sorry, I have to run away now... and have to think more about issue).
+
+Vadim
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA06780
+ for
; Mon, 6 Nov 2000 17:29:49 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA6MSus41571;
+ Mon, 6 Nov 2000 17:28:56 -0500 (EST)
+Received: from sectorbase2.sectorbase.com ([208.48.122.131])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA6MPUs41171
+ for
; Mon, 6 Nov 2000 17:25:30 -0500 (EST)
+Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2650.21)
+ id ; Mon, 6 Nov 2000 14:08:12 -0800
+Message-ID: <8F4C99C66D04D4118F580090272A7A234D314A@sectorbase1.sectorbase.com>
+From: "Mikheev, Vadim"
+ Tom Lane
+
+Subject: RE: [HACKERS] Transaction ID wraparound: problem and proposed sol
+ ution
+Date: Mon, 6 Nov 2000 14:12:07 -0800
+MIME-Version: 1.0
+X-Mailer: Internet Mail Service (5.5.2650.21)
+Content-Type: text/plain;
+ charset="iso-8859-1"
+Precedence: bulk
+Status: OR
+
+> > OK, 2^64 isn't mathematically unbounded, but let's see you
+> > buy a disk that will hold it ;-). My point is that if we want
+> > to think about allowing >4G transactions, part of the answer
+> > has to be a way to recycle pg_log space. Otherwise it's still
+> > not really practical.
+>
+> I kind of like vadim's idea of segmenting pg_log.
+>
+> Segments in which all the xacts have been commited could be deleted.
+
+Without undo we have to ensure that all tables are vacuumed after
+all transactions related to a segment were committed/aborted.
+
+Vadim
+
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA10173
+ for
; Fri, 3 Nov 2000 21:10:59 -0500 (EST)
+Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
+ by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eA42A7s33061;
+ Fri, 3 Nov 2000 21:10:07 -0500 (EST)
+Received: from acheron.rime.com.au (albatr.lnk.telstra.net [139.130.54.222])
+ by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eA429Ss32948
+ for
; Fri, 3 Nov 2000 21:09:28 -0500 (EST)
+Received: from oberon (Oberon.rime.com.au [203.8.195.100])
+ by acheron.rime.com.au (8.9.3/8.9.3) with SMTP id NAA13631;
+ Sat, 4 Nov 2000 13:08:54 +1100
+X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.5 (32)
+Date: Sat, 04 Nov 2000 13:09:22 +1100
+Subject: Re: [HACKERS] Transaction ID wraparound: problem and proposed
+ solution
+Mime-Version: 1.0
+Content-Type: text/plain; charset="us-ascii"
+Precedence: bulk
+Status: OR
+
+At 17:47 3/11/00 -0500, Tom Lane wrote:
+>* portability --- I don't believe long long int exists on all the
+>platforms we support.
+
+Are you sure of this, or is it just a 'last time I looked' statement. If
+the latter, it might be worth verifying.
+
+
+>* performance --- except on true 64-bit platforms, widening Datum to
+>eight bytes would be a system-wide performance hit,
+
+Yes, OIDs are used a lot, but it's not that bad, is it? Are there many
+tight loops with thousands of OID-only operations? I'd guess it's only one
+more instruction & memory fetch.
+
+
+>* disk space --- letting pg_log grow without bound isn't a pleasant
+>prospect either.
+
+Maybe this can be achieved by wrapping XID for the log file only.
+
+
+>I believe it is possible to fix these problems without widening XID,
+>by redefining XIDs in a way that allows for wraparound. Here's my
+>plan:
+
+It's a cute idea (elegant, even), but maybe we'd be running through hoops
+just for a minor performance gain (which may not exist, since we're adding
+extra comparisons via the macro) and for possible unsupported OSs. Perhaps
+OS's without 8 byte ints have to suffer a performance hit (ie. we declare a
+struct with appropriate macros).
+
+
+>are no longer simply "x < y", but need to be expressed as a macro.
+>We consider x < y if (y - x) % WRAPLIMIT < WRAPLIMIT/2.
+
+You mean you plan to limit PGSQL to only 1G concurrent transactions. Isn't
+that a bit short sighted? ;-}
+
+
+>2. To keep the system from having to deal with XIDs that are more than
+>WRAPLIMIT/2 transactions old, VACUUM should "freeze" known-good old
+>tuples.
+
+This is a problem for me; it seems to enshrine VACUUM in perpetuity.
+
+
+>4. With the wraparound behavior, pg_log will have a bounded size: it
+>will never exceed WRAPLIMIT*2 bits = WRAPLIMIT/4 bytes. Since we will
+>recycle pg_log entries every WRAPLIMIT xacts, during transaction start
+
+Is there any was we can use this recycling technique with 8-byte XIDs?
+
+Also, will there be a problem with backup programs that use XID to
+determine newer records and apply/reapply changes?
+
+
+>This scheme allows us to survive XID wraparound at the cost of slight
+>additional complexity in ordered comparisons of XIDs (which is not a
+>really performance-critical task AFAIK)
+
+Maybe I'm really missing the amount of XID manipulation, but I'd be
+surprised if 16-byte XIDs would slow things down much.
+
+
+----------------------------------------------------------------
+Philip Warner | __---_____
+Albatross Consulting Pty. Ltd. |----/ - \
+(A.B.N. 75 008 659 498) | /(@) ______---_
+Tel: (+61) 0500 83 82 81 | _________ \
+Fax: (+61) 0500 83 82 82 | ___________ |
+Http://www.rhyme.com.au | / \|
+ | --________--
+PGP key available upon request, | /
+and from pgp5.ai.mit.edu:11371 |/
+