URGENT
-* -Add OUTER joins, left and right[outer] (Tom, Thomas)
+* -Add OUTER joins, left and right (Tom, Thomas)
* -Allow long tuples by chaining or auto-storing outside db (TOAST) (Jan)
* -Fix memory leak for expressions (Tom)
* Add replication of distributed databases [replication]
o -Allow large object vacuuming
o -Tables that start with xinv confused to be large objects
* Add IPv6 capability to INET/CIDR types
-* -Fix improper masking of some inet/cidr types [cidr]
+* -Fix improper masking of some inet/cidr types
* Add conversion function from text to inet
* Make a separate SERIAL type?
* Store binary-compatible type information in the system
* Add the concept of dataspaces/tablespaces [tablespaces]
* Allow queries across multiple databases
* Allow nested transactions (Vadim)
-* Allow [INSERT/UPDATE] ... RETURNING new.col or old.col (Philip)
+* Allow INSERT/UPDATE ... RETURNING new.col or old.col (Philip)
* SQL*Net listener that makes PostgreSQL appear as an Oracle database
to clients
* Incremental backups
* Allow cursors to be DECLAREd/OPENed/CLOSEed outside transactions
* Allow DELETE WHERE CURRENT OF cursor
* -Transaction log, so re-do log can be on a separate disk by
- with after-row images (Vadim) [logging]
+ with after-row images (Vadim)
* Populate backend status area and write program to dump status data
* Make oid use unsigned int more reliably, pg_atoi()
* Put sort files in their own directory
* Allow autocommit so always in a transaction block
* Show location of syntax error in query [yacc]
-* -Redesign the function call interface to handle NULLs better[function] (Tom)
+* -Redesign the function call interface to handle NULLs better (Tom)
* Missing optimizer selectivities for date, r-tree, etc. [optimizer]
* Overhaul bufmgr/lockmgr/transaction manager
* -redesign UNION structures to have separate target lists
+++ /dev/null
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id UAA15611
- for ; Wed, 22 Sep 1999 20:31:01 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id UAA02926 for ; Wed, 22 Sep 1999 20:21:24 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id UAA75413;
- Wed, 22 Sep 1999 20:09:35 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 22 Sep 1999 20:08:50 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id UAA75058
- for pgsql-hackers-outgoing; Wed, 22 Sep 1999 20:06:58 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
- by hub.org (8.9.3/8.9.3) with ESMTP id UAA74982
- for ; Wed, 22 Sep 1999 20:06:25 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
- by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id UAA06411
- for ; Wed, 22 Sep 1999 20:05:40 -0400 (EDT)
-Subject: [HACKERS] Progress report: buffer refcount bugs and SQL functions
-Date: Wed, 22 Sep 1999 20:05:39 -0400
-From: Tom Lane
-Precedence: bulk
-Status: RO
-
-I have been finding a lot of interesting stuff while looking into
-the buffer reference count/leakage issue.
-
-It turns out that there were two specific things that were camouflaging
-the existence of bugs in this area:
-
-1. The BufferLeakCheck routine that's run at transaction commit was
-only looking for nonzero PrivateRefCount to indicate a missing unpin.
-It failed to notice nonzero LastRefCount --- which meant that an
-error in refcount save/restore usage could leave a buffer pinned,
-and BufferLeakCheck wouldn't notice.
-
-2. The BufferIsValid macro, which you'd think just checks whether
-it's handed a valid buffer identifier or not, actually did more:
-it only returned true if the buffer ID was valid *and* the buffer
-had positive PrivateRefCount. That meant that the common pattern
- if (BufferIsValid(buf))
- ReleaseBuffer(buf);
-wouldn't complain if it were handed a valid but already unpinned buffer.
-And that behavior masks bugs that result in buffers being unpinned too
-early. For example, consider a sequence like
-
-1. LockBuffer (buffer now has refcount 1). Store reference to
- a tuple on that buffer page in a tuple table slot.
-2. Copy buffer reference to a second tuple-table slot, but forget to
- increment buffer's refcount.
-3. Release second tuple table slot. Buffer refcount drops to 0,
- so it's unpinned.
-4. Release original tuple slot. Because of BufferIsValid behavior,
- no assert happens here; in fact nothing at all happens.
-
-This is, of course, buggy code: during the interval from 3 to 4 you
-still have an apparently valid tuple reference in the original slot,
-which someone might try to use; but the buffer it points to is unpinned
-and could be replaced at any time by another backend.
-
-In short, we had errors that would mask both missing-pin bugs and
-missing-unpin bugs. And naturally there were a few such bugs lurking
-behind them...
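The four-step sequence above can be sketched with a toy pin counter. This is only an illustration, not PostgreSQL's buffer manager: `BufferPool`, `is_valid_lenient`, `is_valid_strict` and `run_sequence` are hypothetical names, and the lenient check models the old BufferIsValid behaviour as described in the message.

```python
class BufferPool:
    """Toy model of per-backend pin counts (hypothetical, not the real bufmgr)."""

    def __init__(self, nbuffers):
        self.private_refcount = [0] * nbuffers

    def pin(self, buf):
        self.private_refcount[buf] += 1

    def is_valid_lenient(self, buf):
        # Old behaviour as described: valid ID *and* positive refcount.
        in_range = 0 <= buf < len(self.private_refcount)
        return in_range and self.private_refcount[buf] > 0

    def is_valid_strict(self, buf):
        # Plain validity check: only the buffer ID is examined.
        return 0 <= buf < len(self.private_refcount)

    def release(self, buf):
        if self.private_refcount[buf] <= 0:
            raise AssertionError("unpinning a buffer that is not pinned")
        self.private_refcount[buf] -= 1


def run_sequence(check_name):
    """Replay steps 1-4 using the named validity check; return what happened."""
    pool = BufferPool(4)
    buf = 0
    pool.pin(buf)  # step 1: refcount is now 1
    # step 2: the reference is copied to a second slot without a second pin
    events = []
    for slot in ("second", "original"):  # steps 3 and 4
        if getattr(pool, check_name)(buf):
            try:
                pool.release(buf)
                events.append(slot + ": released")
            except AssertionError:
                events.append(slot + ": assert")
        else:
            events.append(slot + ": skipped silently")
    return events
```

With the lenient check the second release is skipped without complaint, so the missing pin in step 2 goes unnoticed; with the strict check the same sequence trips the assertion.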
-
-3. The buffer refcount save/restore stuff, which I had suspected
-was useless, is not only useless but also buggy. The reason it's
-buggy is that it only works if used in a nested fashion. You could
-save state A, pin some buffers, save state B, pin some more
-buffers, restore state B (thereby unpinning what you pinned since
-the save), and finally restore state A (unpinning the earlier stuff).
-What you could not do is save state A, pin, save B, pin more, then
-restore state A --- that might unpin some of A's buffers, or some
-of B's buffers, or some unforeseen combination thereof. If you
-restore A and then restore B, you do not necessarily return to a zero-
-pins state, either. And it turns out the actual usage pattern was a
-nearly random sequence of saves and restores, compounded by a failure to
-do all of the restores reliably (which was masked by the oversight in
-BufferLeakCheck).
-
-
-What I have done so far is to rip out the buffer refcount save/restore
-support (including LastRefCount), change BufferIsValid to a simple
-validity check (so that you get an assert if you unpin something that
-wasn't pinned), change ExecStoreTuple so that it increments the refcount
-when it is handed a buffer reference (for symmetry with ExecClearTuple's
-decrement of the refcount), and fix about a dozen bugs exposed by these
-changes.
-
-I am still getting Buffer Leak notices in the "misc" regression test,
-specifically in the queries that invoke more than one SQL function.
-What I find there is that SQL functions are not always run to
-completion. Apparently, when a function can return multiple tuples,
-it won't necessarily be asked to produce them all. And when it isn't,
-postquel_end() isn't invoked for the function's current query, so its
-tuple table isn't cleared, so we have dangling refcounts if any of the
-tuples involved are in disk buffers.
-
-It may be that the save/restore code was a misguided attempt to fix
-this problem. I can't tell. But I think what we really need to do is
-find some way of ensuring that Postquel function execution contexts
-always get shut down by the end of the query, so that they don't leak
-resources.
-
-I suppose a straightforward approach would be to keep a list of open
-function contexts somewhere (attached to the outer execution context,
-perhaps), and clean them up at outer-plan shutdown.
-
-What I am wondering, though, is whether this addition is actually
-necessary, or is it a bug that the functions aren't run to completion
-in the first place? I don't really understand the semantics of this
-"nested dot notation". I suppose it is a Berkeleyism; I can't find
-anything about it in the SQL92 document. The test cases shown in the
-misc regress test seem peculiar, not to say wrong. For example:
-
-regression=> SELECT p.hobbies.equipment.name, p.hobbies.name, p.name FROM person p;
-name |name |name
--------------+-----------+-----
-advil |posthacking|mike
-peet's coffee|basketball |joe
-hightops |basketball |sally
-(3 rows)
-
-which doesn't appear to agree with the contents of the underlying
-relations:
-
-regression=> SELECT * FROM hobbies_r;
-name |person
------------+------
-posthacking|mike
-posthacking|jeff
-basketball |joe
-basketball |sally
-skywalking |
-(5 rows)
-
-regression=> SELECT * FROM equipment_r;
-name |hobby
--------------+-----------
-advil |posthacking
-peet's coffee|posthacking
-hightops |basketball
-guts |skywalking
-(4 rows)
-
-I'd have expected an output along the lines of
-
-advil |posthacking|mike
-peet's coffee|posthacking|mike
-hightops |basketball |joe
-hightops |basketball |sally
-
-Is the regression test's expected output wrong, or am I misunderstanding
-what this query is supposed to do? Is there any documentation anywhere
-about how SQL functions returning multiple tuples are supposed to
-behave?
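The expected output above can be reproduced by expanding each person through hobbies_r and then equipment_r. This is a sketch of that reading only. The table contents are copied from the message; the `persons` list is an assumption (only the names appearing in the expected output), and `expand` is a hypothetical helper, not backend code.

```python
# Rows copied from the hobbies_r and equipment_r listings in the message.
hobbies_r = [
    ("posthacking", "mike"),
    ("posthacking", "jeff"),
    ("basketball", "joe"),
    ("basketball", "sally"),
    ("skywalking", None),
]
equipment_r = [
    ("advil", "posthacking"),
    ("peet's coffee", "posthacking"),
    ("hightops", "basketball"),
    ("guts", "skywalking"),
]
# Assumption: the person table is reduced to the names seen in the output.
persons = ["mike", "joe", "sally"]


def expand(persons, hobbies, equipment):
    """Return (equipment, hobby, person) for every matching combination."""
    rows = []
    for person in persons:
        for hobby_name, hobby_person in hobbies:
            if hobby_person != person:
                continue
            for equip_name, equip_hobby in equipment:
                if equip_hobby == hobby_name:
                    rows.append((equip_name, hobby_name, person))
    return rows


for row in expand(persons, hobbies_r, equipment_r):
    print("|".join(row))
```

Under these assumptions the loop yields four rows, two for mike and one each for joe and sally, which is the shape of the result the message says it expected.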
-
- regards, tom lane
-
-************
-
-
-Received: from hub.org (hub.org [216.126.84.1])
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA16211
- for ; Thu, 23 Sep 1999 11:03:17 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id KAA58151;
- Thu, 23 Sep 1999 10:53:46 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 10:53:05 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id KAA57948
- for pgsql-hackers-outgoing; Thu, 23 Sep 1999 10:52:23 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
- by hub.org (8.9.3/8.9.3) with ESMTP id KAA57841
- for ; Thu, 23 Sep 1999 10:51:50 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
- by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA14211;
- Thu, 23 Sep 1999 10:51:10 -0400 (EDT)
-Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
-In-reply-to: Your message of Thu, 23 Sep 1999 10:07:24 +0200
-Date: Thu, 23 Sep 1999 10:51:10 -0400
-From: Tom Lane
-Precedence: bulk
-Status: RO
-
-Andreas Zeugswetter writes:
-> That is what I use it for. I have never used it with a
-> returns setof function, but reading the comments in the regression test,
-> -- mike needs advil and peet's coffee,
-> -- joe and sally need hightops, and
-> -- everyone else is fine.
-> it looks like the results you expected are correct, and currently the
-> wrong result is given.
-
-Yes, I have concluded the same (and partially fixed it, per my previous
-message).
-
-> Those that don't have a hobby should return name|NULL|NULL. A hobby
-> that doesn't need equipment name|hobby|NULL.
-
-That's a good point. Currently (both with and without my uncommitted
-fix) you get *no* rows out from ExecTargetList if there are any Iters
-that return empty result sets. It might be more reasonable to treat an
-empty result set as if it were NULL, which would give the behavior you
-suggest.
-
-This would be an easy change to my current patch, and I'm prepared to
-make it before committing what I have, if people agree that that's a
-more reasonable definition. Comments?
-
- regards, tom lane
-
-************
-
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id EAA11344
- for ; Thu, 23 Sep 1999 04:31:15 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id EAA05350 for ; Thu, 23 Sep 1999 04:24:29 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id EAA85679;
- Thu, 23 Sep 1999 04:16:26 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 04:09:52 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id EAA84708
- for pgsql-hackers-outgoing; Thu, 23 Sep 1999 04:08:57 -0400 (EDT)
-Received: from gandalf.telecom.at (gandalf.telecom.at [194.118.26.84])
- by hub.org (8.9.3/8.9.3) with ESMTP id EAA84632
- for ; Thu, 23 Sep 1999 04:08:03 -0400 (EDT)
-Received: from telecom.at (w0188000580.f000.d0188.sd.spardat.at [172.18.65.249])
- by gandalf.telecom.at (xxx/xxx) with ESMTP id KAA195294
- for ; Thu, 23 Sep 1999 10:07:27 +0200
-Date: Thu, 23 Sep 1999 10:07:24 +0200
-From: Andreas Zeugswetter
-X-Mailer: Mozilla 4.61 [en] (Win95; I)
-X-Accept-Language: en
-MIME-Version: 1.0
-Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Precedence: bulk
-Status: RO
-
-> Is the regression test's expected output wrong, or am I
-> misunderstanding
-> what this query is supposed to do? Is there any
-> documentation anywhere
-> about how SQL functions returning multiple tuples are supposed to
-> behave?
-
-They are supposed to behave somewhat like a view.
-Not all rows are necessarily fetched.
-If used in a context that needs a single row answer,
-and the answer has multiple rows it is supposed to
-runtime elog. Like in:
-
-select * from tbl where col=funcreturningmultipleresults();
--- this must elog
-
-while this is ok:
-select * from tbl where col in (select funcreturningmultipleresults());
-
-But the caller could only fetch the first row if he wanted.
-
-The nested notation is supposed to call the function passing it the tuple
-as the first argument. This is what can be used to "fake" a column
-onto a table (computed column).
-That is what I use it for. I have never used it with a
-returns setof function, but reading the comments in the regression test,
--- mike needs advil and peet's coffee,
--- joe and sally need hightops, and
--- everyone else is fine.
-it looks like the results you expected are correct, and currently the
-wrong result is given.
-
-But I think this query could also elog without removing substantial
-functionality.
-
-SELECT p.name, p.hobbies.name, p.hobbies.equipment.name FROM person p;
-
-Actually, to me it would be intuitive for this query to return one row per
-person, but elog on those who have more than one hobby or a hobby that
-needs more than one piece of equipment. Those who don't have a hobby should
-return name|NULL|NULL. A hobby that doesn't need equipment should return name|hobby|NULL.
-
-Andreas
-
-************
-
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA16360
- for ; Wed, 22 Sep 1999 22:01:05 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id VAA08386 for ; Wed, 22 Sep 1999 21:37:24 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id VAA88083;
- Wed, 22 Sep 1999 21:28:11 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 22 Sep 1999 21:27:48 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id VAA87938
- for pgsql-hackers-outgoing; Wed, 22 Sep 1999 21:26:52 -0400 (EDT)
-Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
- by hub.org (8.9.3/8.9.3) with SMTP id VAA87909
- for ; Wed, 22 Sep 1999 21:26:36 -0400 (EDT)
-Received: by orion.SAPserv.Hamburg.dsh.de
- id m11TxXw-0003kLC; Thu, 23 Sep 99 03:19 MET DST
-Message-Id:
-Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
-Date: Thu, 23 Sep 1999 03:19:39 +0200 (MET DST)
-X-Mailer: ELM [version 2.4 PL25]
-Content-Type: text
-Precedence: bulk
-Status: RO
-
-Tom Lane wrote:
-
-> [...]
->
-> What I am wondering, though, is whether this addition is actually
-> necessary, or is it a bug that the functions aren't run to completion
-> in the first place? I don't really understand the semantics of this
-> "nested dot notation". I suppose it is a Berkeleyism; I can't find
-> anything about it in the SQL92 document. The test cases shown in the
-> misc regress test seem peculiar, not to say wrong. For example:
->
-> [...]
->
-> Is the regression test's expected output wrong, or am I misunderstanding
-> what this query is supposed to do? Is there any documentation anywhere
-> about how SQL functions returning multiple tuples are supposed to
-> behave?
-
- I've said some time (maybe too long) ago, that SQL functions
- returning tuple sets are broken in general. This nested dot
- notation (which I think is an artefact from the postquel
- query language) is implemented via set functions.
-
- Set functions have totally different semantics from all other
- functions. First they don't really return a tuple set as
- someone might think - all that screwed up code instead
- simulates that they return something you could consider a
- scan of the last SQL statement in the function. Then, on
- each subsequent call inside of the same command, they return
- a "tupletable slot" containing the next found tuple (that's
- why their Func node is mangled up after the first call).
-
- Second, they have a targetlist that I think was originally
- intended to extract attributes out of the tuples returned
- when the above scan is asked to get the next tuple. But as I
- read the code it invokes the function again and this might
- cause the resource leakage you see.
-
- Third, all this seems to never have been implemented
- (thought?) to the end. A targetlist doesn't make sense at
- this place because it could at most contain a single attribute
- - so a single attno would have the same power. And if set
- functions could appear in the rangetable (FROM clause), then
- they would be treated as that and regular Var nodes in the
- query would do it.
-
- I think you shouldn't really care for that regression test
- and maybe we should disable set functions until we really
- implement stored procedures returning sets in the rangetable.
-
- Set functions were planned by Stonebraker's team as
- something that today is called stored procedures. But AFAIK
- they never reached a useful state because even in Postgres
- 4.2 you weren't able to get more than one attribute out
- of a set function. It was a feature of the postquel
- query language that you could get one attribute from a set
- function via
-
- RETRIEVE (attributename(setfuncname()))
-
- While working on the constraint triggers I came across
- another regression test (triggers :-) that's erroneous too.
- The funny_dup17 trigger proc executes an INSERT into the same
- relation where it got fired by a previous INSERT. And it
- stops this recursion only if it reaches a nesting level of
- 17, which could only occur if it is fired DURING the
- execution of its own SPI_exec(). After Vadim quoted some
- SQL92 definitions about when constraint checks and triggers
- are to be executed, I decided to fire regular triggers at the
- end of a query too. Thus, no nesting at all is possible for
- AFTER triggers, and the result is an endless loop.
-
-
-Jan
-
---
-
-#======================================================================#
-# It's easier to get forgiveness for being wrong than for being right. #
-# Let's break this rule - forgive me. #
-
-
-
-************
-
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id LAA16162
- for ; Thu, 23 Sep 1999 11:01:04 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id KAA28544 for ; Thu, 23 Sep 1999 10:45:54 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id KAA52943;
- Thu, 23 Sep 1999 10:20:51 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 23 Sep 1999 10:19:58 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id KAA52472
- for pgsql-hackers-outgoing; Thu, 23 Sep 1999 10:19:03 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
- by hub.org (8.9.3/8.9.3) with ESMTP id KAA52431
- for ; Thu, 23 Sep 1999 10:18:47 -0400 (EDT)
-Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
- by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA13253;
- Thu, 23 Sep 1999 10:18:02 -0400 (EDT)
-Subject: Re: [HACKERS] Progress report: buffer refcount bugs and SQL functions
-In-reply-to: Your message of Thu, 23 Sep 1999 03:19:39 +0200 (MET DST)
-
-Date: Thu, 23 Sep 1999 10:18:01 -0400
-From: Tom Lane
-Precedence: bulk
-Status: RO
-
-> Tom Lane wrote:
->> What I am wondering, though, is whether this addition is actually
->> necessary, or is it a bug that the functions aren't run to completion
->> in the first place?
-
-> I've said some time (maybe too long) ago, that SQL functions
-> returning tuple sets are broken in general.
-
-Indeed they are. Try this on for size (using the regression database):
-
- SELECT p.name, p.hobbies.equipment.name FROM person p;
- SELECT p.hobbies.equipment.name, p.name FROM person p;
-
-You get different result sets!?
-
-The problem in this example is that ExecTargetList returns the isDone
-flag from the last targetlist entry, regardless of whether there are
-incomplete iterations in previous entries. More generally, the buffer
-leak problem that I started with only occurs if some Iter nodes are not
-run to completion --- but execQual.c has no mechanism to make sure that
-they have all reached completion simultaneously.
-
-What we really need to make functions-returning-sets work properly is
-an implementation somewhat like aggregate functions. We need to make
-a list of all the Iter nodes present in a targetlist and cycle through
-the values returned by each in a methodical fashion (run the rightmost
-through its full cycle, then advance the next-to-rightmost one value,
-run the rightmost through its cycle again, etc etc). Also there needs
-to be an understanding of the hierarchy when an Iter appears in the
-arguments of another Iter's function. (You cycle the upper one for
-*each* set of arguments created by cycling its sub-Iters.)
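The cycling order described above (rightmost Iter runs through its full cycle, then the next-to-rightmost advances by one value) is an odometer over the value sets. A minimal sketch, assuming each Iter's results are already available as a list; `cycle_iters` is a hypothetical name, and the sketch covers only top-level Iters, not Iters nested inside another Iter's arguments.

```python
def cycle_iters(value_sets):
    """Yield every combination of values, with the rightmost set cycling fastest."""
    if any(len(values) == 0 for values in value_sets):
        return  # an empty set yields no rows in this sketch
    positions = [0] * len(value_sets)
    while True:
        yield tuple(values[i] for values, i in zip(value_sets, positions))
        # Advance like an odometer: bump the rightmost position and carry left.
        pos = len(positions) - 1
        while pos >= 0:
            positions[pos] += 1
            if positions[pos] < len(value_sets[pos]):
                break
            positions[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every set wrapped around at once: all Iters are done


rows = list(cycle_iters([[1, 2], ["a", "b", "c"]]))
```

Here every Iter finishes at the same moment, which is the property the message says execQual.c had no mechanism to guarantee.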
-
-I am not particularly interested in working on this feature right now,
-since AFAIK it's a Berkeleyism not found in SQL92. What I've done
-is to hack ExecTargetList so that it behaves semi-sanely when there's
-more than one Iter at the top level of the target list --- it still
-doesn't really give the right answer, but at least it will keep
-generating tuples until all the Iters are done at the same time.
-It happens that that's enough to give correct answers for the examples
-shown in the misc regress test. Even when it fails to generate all
-the possible combinations, there will be no buffer leaks.
-
-So, I'm going to declare victory and go home ;-). We ought to add a
-TODO item along the lines of
- * Functions returning sets don't really work right
-in hopes that someone will feel like tackling this someday.
-
- regards, tom lane
-
-************
-
-
+++ /dev/null
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA13457
- for ; Fri, 13 Nov 1998 13:24:35 -0500 (EST)
-Received: from localhost (majordom@localhost)
- by hub.org (8.9.1/8.9.1) with SMTP id NAA02464;
- Fri, 13 Nov 1998 13:22:52 -0500 (EST)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 13 Nov 1998 13:21:14 +0000 (EST)
-Received: (from majordom@localhost)
- by hub.org (8.9.1/8.9.1) id NAA02331
- for pgsql-hackers-outgoing; Fri, 13 Nov 1998 13:21:12 -0500 (EST)
-Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
- by hub.org (8.9.1/8.9.1) with SMTP id NAA02316
- for ; Fri, 13 Nov 1998 13:21:06 -0500 (EST)
-Received: by orion.SAPserv.Hamburg.dsh.de
- id m0zeOEf-000EBPC; Fri, 13 Nov 98 19:46 MET
-Message-Id:
-Subject: [HACKERS] shmem limits and redolog
-Date: Fri, 13 Nov 1998 19:46:20 +0100 (MET)
-X-Mailer: ELM [version 2.4 PL25]
-Content-Type: text
-Precedence: bulk
-Status: ROr
-
-Hi,
-
- I'm currently hacking around on a solution for logging all
- database operations at query level that can recover a crashed
- database from the last successful backup by redoing all the
- commands.
-
- Well, I wanted it to be as flexible as possible. So I decided to
- make it configurable per database. One could say which
- databases are logged and, if a database is, whether it is logged
- sync or async (in sync mode, every COMMIT forces an fsync of
- the actual logfile and controlfiles).
-
- To make async mode as fast as possible, I'm using a shared memory
- segment of 32K per database (not per backend) that is used as a
- wrap-around buffer in which the backends place their query
- information. So the log writer can fall a little behind if
- there are many backends doing different things that don't
- lock each other.
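The wrap-around buffer idea can be sketched as a fixed-size ring that producers fill and a writer drains later. This is a single-process illustration under stated assumptions: `WrapAroundBuffer`, `put` and `drain` are hypothetical names, and the shared memory segment and the semaphore set mentioned in the message are left out entirely.

```python
class WrapAroundBuffer:
    """Fixed-size ring buffer; the 32K default mirrors the size in the message."""

    def __init__(self, capacity=32 * 1024):
        self.data = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # next position to write
        self.tail = 0  # next position to read
        self.used = 0  # bytes written but not yet drained

    def put(self, payload):
        """Store payload if it fits; return False when the writer is too far behind."""
        if len(payload) > self.capacity - self.used:
            return False
        for byte in payload:
            self.data[self.head] = byte
            self.head = (self.head + 1) % self.capacity
        self.used += len(payload)
        return True

    def drain(self):
        """Return everything written so far, as the log writer would."""
        out = bytearray()
        while self.used:
            out.append(self.data[self.tail])
            self.tail = (self.tail + 1) % self.capacity
            self.used -= 1
        return bytes(out)
```

A backend calling `put` succeeds as long as the writer has kept up; once the ring is full the call reports failure, which is the point at which a real implementation would have to block or flush.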
-
- Now I'm a little in doubt about the shared memory limits
- reported. Was it a good decision to use shared memory? Am I
- better off using sockets?
-
- The bad thing in what I have up to now (it's far from
- complete) is, that even if a database isn't currently logged,
- a redolog writer is started and creates the 32K shmem segment
- (plus a semaphore set with 5 semaphores). This is because I
- plan to create commands like
-
- ALTER DATABASE LOG MODE=ASYNC LOGDIR='/somewhere/dbname';
-
- and the like that can be used at runtime (while more than one
- backend is connected to the database) to turn logging on/off,
- switch to/from backup mode (all other activity is stopped)
- etc.
-
- So every 32 databases will require another megabyte of shared
- memory. The logging master controls which databases have
- activity and kills redolog writers after some time of
- inactivity, and the shmem is freed then. But it can hurt if
- someone really has many many databases that are all used at
- the same time.
-
- What do the others say?
-
-
-Jan
-
---
-
-#======================================================================#
-# It's easier to get forgiveness for being wrong than for being right. #
-# Let's break this rule - forgive me. #
-
-
-
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id PAA00521
- for ; Wed, 16 Dec 1998 15:46:40 -0500 (EST)
-Received: from hub.org ([email protected] [209.47.145.100]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id PAA08772 for ; Wed, 16 Dec 1998 15:10:01 -0500 (EST)
-Received: from localhost (majordom@localhost)
- by hub.org (8.9.1/8.9.1) with SMTP id PAA01254;
- Wed, 16 Dec 1998 15:06:56 -0500 (EST)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Dec 1998 14:58:11 +0000 (EST)
-Received: (from majordom@localhost)
- by hub.org (8.9.1/8.9.1) id OAA00660
- for pgsql-hackers-outgoing; Wed, 16 Dec 1998 14:58:10 -0500 (EST)
-Received: from orion.SAPserv.Hamburg.dsh.de (Tpolaris2.sapham.debis.de [53.2.131.8])
- by hub.org (8.9.1/8.9.1) with SMTP id OAA00643
- for ; Wed, 16 Dec 1998 14:58:05 -0500 (EST)
-Received: by orion.SAPserv.Hamburg.dsh.de
- id m0zqNDo-000EBTC; Wed, 16 Dec 98 21:07 MET
-Message-Id:
-Subject: Re: [HACKERS] redolog - for discussion
-Date: Wed, 16 Dec 1998 21:07:00 +0100 (MET)
-X-Mailer: ELM [version 2.4 PL25]
-Content-Type: text
-Precedence: bulk
-Status: RO
-
-Vadim wrote:
-
->
-> Jan Wieck wrote:
-> >
-> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
-> >
-> ...
-> >
-> > For the others, the backend starts the recovery program
-> > which reads the redolog files, establishes database
-> > connections as required and reruns all the commands in
-> ^^^^^^^^^^^^^^^^^^^^^^^^^^
-> > them. If a required logfile isn't found, it tells the
-> ^^^^^
->
-> I foresee problems with using _commands_ logging for
-> recovery/replication -:((
->
-> Let's consider two concurrent updates in READ COMMITTED mode:
->
-> update test set x = 2 where y = 1;
->
-> and
->
-> update test set x = 3 where y = 1;
->
-> The result of both committed transaction will be x = 2
-> if the 1st transaction updated row _after_ 2nd transaction
-> and x = 3 if the 2nd transaction gets row after 1st one.
-> Order of updates is not defined by order in which commands
-> begun and so order in which commands should be rerun
-> will be unknown...
-
- Yepp, the order in which commands begun is absolutely not of
- interest. Locking could already delay the execution of one
- command until another one started later has finished and
- released the lock. It's a classic race condition.
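The ordering problem can be shown with a tiny replay function. This is a sketch only: `replay`, the transaction labels and the chosen interleaving are assumptions made for illustration, using the two UPDATE statements quoted above.

```python
def replay(logged_updates, initial_x):
    """Apply logged (transaction, new_x) updates in order and return the final x."""
    x = initial_x
    for _txn, new_x in logged_updates:
        x = new_x
    return x


# Hypothetical run: T1 (x = 2) begins first, but T2 (x = 3) reaches the row
# first, so T1 updates the row last and the live database ends with x = 2.
begin_order = [("T1", 2), ("T2", 3)]
update_order = [("T2", 3), ("T1", 2)]
live_result = 2

print(replay(begin_order, 0), replay(update_order, 0))
```

Replaying in the order the commands began gives a different final value than the live run, while replaying in the order the row was actually updated reproduces it.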
-
- Thus, my plan was to log the queries just before the call to
- CommitTransactionCommand() in tcop. This has the advantage
- that queries which bail out with errors don't get into the
- log at all and need not be rerun. And I can set a static
- flag to false before starting the command, which is set to
- true in the buffer manager when a buffer is written (marked
- dirty), so filtering out queries that do no updates at all is
- easy.
-
- Unfortunately query level logging gets hit by the current
- implementation of sequence numbers. If a query that gets
- aborted somewhere in the middle (maybe by a trigger) called
- nextval() for rows processed earlier, the sequence number
- isn't advanced at recovery time, because the query is
- suppressed entirely. And sequences aren't locked, so for
- concurrently running queries getting numbers from the same
- sequence, the results aren't reproducible. If some
- application selects a value resulting from a sequence and
- uses that later in another query, how could the redolog know
- that this has changed? It's a Const in the query logged, and
- all that corrupts the whole thing.
-
- All that is painful and I don't see another solution yet than
- to hook into nextval(), log out the numbers generated in
- normal operation and get back the same numbers in redo
- mode.
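The nextval() hook can be sketched as two small classes: one that records the numbers handed out in normal operation, and one that hands the same numbers back in redo mode. The names `LoggedSequence` and `RedoSequence`, and the idea of keying the log by a query id, are assumptions for illustration and are not taken from the message.

```python
class LoggedSequence:
    """Normal operation: hand out numbers and remember who got which."""

    def __init__(self):
        self.last = 0
        self.issued = {}  # query id -> values handed out to that query

    def nextval(self, query_id):
        self.last += 1
        self.issued.setdefault(query_id, []).append(self.last)
        return self.last


class RedoSequence:
    """Redo mode: return the logged numbers instead of generating new ones."""

    def __init__(self, issued):
        self.pending = {qid: list(values) for qid, values in issued.items()}

    def nextval(self, query_id):
        return self.pending[query_id].pop(0)


# Query A aborts after consuming two numbers; query B commits with the third.
seq = LoggedSequence()
seq.nextval("A")
seq.nextval("A")
committed_value = seq.nextval("B")

# Recovery reruns only query B, yet it must see the same number as before.
redo = RedoSequence(seq.issued)
redo_value = redo.nextval("B")
```

Without the log, rerunning only query B against a fresh sequence would hand it the first number rather than the third, which is the mismatch described above.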
-
- The whole thing gets more and more complicated :-(
-
-
-Jan
-
---
-
-#======================================================================#
-# It's easier to get forgiveness for being wrong than for being right. #
-# Let's break this rule - forgive me. #
-
-
-
-
-Received: from hub.org (hub.org [209.167.229.1])
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id JAA22504
- for ; Wed, 16 Jun 1999 09:29:29 -0400 (EDT)
-Received: from hub.org (hub.org [209.167.229.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id JAA02132;
- Wed, 16 Jun 1999 09:18:20 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 16 Jun 1999 09:14:07 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id JAA01318
- for pgsql-hackers-outgoing; Wed, 16 Jun 1999 09:14:06 -0400 (EDT)
-X-Authentication-Warning: hub.org: majordom set sender to [email protected] using -f
-Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37])
- by hub.org (8.9.3/8.9.3) with ESMTP id JAA01278
- for ; Wed, 16 Jun 1999 09:13:48 -0400 (EDT)
-Received: from krs.ru (dune.krs.ru [195.161.16.38])
- by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id VAA06276
- for ; Wed, 16 Jun 1999 21:12:49 +0800 (KRSS)
-Date: Wed, 16 Jun 1999 21:12:47 +0800
-From: Vadim Mikheev
-Organization: OJSC Rostelecom (Krasnoyarsk)
-X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386)
-X-Accept-Language: ru, en
-MIME-Version: 1.0
-To: PostgreSQL Developers List
-Subject: [HACKERS] Savepoints...
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Precedence: bulk
-Status: ROr
-
-To have them I need to add tuple id (6 bytes) to heap tuple
-header. Are there objections? Though it's not good to increase
-tuple header size, subj is, imho, very nice feature...
-
-Implementation is, hm, "easy":
-
-- heap_insert/heap_delete/heap_replace/heap_mark4update will
- remember updated tid (and current command id) in relation cache
- and store previously updated tid (remembered in relation cache)
- in additional heap header tid;
-- lmgr will remember command id when lock was acquired;
-- for a savepoint we will just store command id when
- the savepoint was set;
-- when going to sleep due to concurrent the-same-row update,
- backend will store MyProc and tuple id in shmem hash table.
-
-When rolling back to a savepoint, backend will:
-
-- release locks acquired after savepoint;
-- for a relation updated after savepoint, get last updated tid
- from relation cache, walk through relation, set
- HEAP_XMIN_INVALID/HEAP_XMAX_INVALID in all tuples updated
- after savepoint and wake up concurrent writers blocked
- on these tuples (using shmem hash table mentioned above).
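The rollback walk can be sketched for the insert-only case. Everything here is a simplification under stated assumptions: `Relation`, `insert`, `rollback_to` and `visible` are hypothetical names, a list index stands in for the tuple id, and deletes, lock release and the waking of blocked writers are not modelled.

```python
class Relation:
    """Toy heap: each tuple remembers its command id and the previously updated tid."""

    def __init__(self):
        self.tuples = []
        self.last_updated = None  # tid of the most recent change (relation cache)

    def insert(self, value, command_id):
        self.tuples.append({
            "value": value,
            "cid": command_id,
            "prev": self.last_updated,  # the extra tid in the tuple header
            "xmin_invalid": False,
        })
        self.last_updated = len(self.tuples) - 1

    def rollback_to(self, savepoint_cid):
        """Invalidate every tuple changed by a command after the savepoint."""
        tid = self.last_updated
        while tid is not None and self.tuples[tid]["cid"] > savepoint_cid:
            self.tuples[tid]["xmin_invalid"] = True
            tid = self.tuples[tid]["prev"]
        self.last_updated = tid

    def visible(self):
        return [t["value"] for t in self.tuples if not t["xmin_invalid"]]


rel = Relation()
rel.insert("a", command_id=1)
rel.insert("b", command_id=2)
savepoint_cid = 2  # assumption: the savepoint stores the last command id used
rel.insert("c", command_id=3)
rel.insert("d", command_id=4)
rel.rollback_to(savepoint_cid)
```

The walk follows the chain of previous tids backwards from the most recent change and stops at the first tuple that predates the savepoint, so it touches only the tuples that need invalidating.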
-
-The last feature (waking up concurrent writers) is the hardest
-part to implement. AFAIK, Oracle 7.3 was not able to do it.
-Can someone comment on whether this feature is implemented in Oracle 8.X
-or other DBMSes?
-
-Now about implicit savepoints. Backend will place them before
-user statements execution. In the case of failure, transaction
-state will be rolled back to the one before execution of query.
-As a side effect, this means that we'll get rid of complaints
-about the entire transaction being aborted when a typo
-causes a parser error...
-
-Comments?
-
-Vadim
-
-
+++ /dev/null
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA07771
- for ; Thu, 7 Jan 1999 13:31:06 -0500 (EST)
-Received: from golem.jpl.nasa.gov (IDENT:
[email protected] [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id NAA14597 for
; Thu, 7 Jan 1999 13:27:37 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA13416;
- Thu, 7 Jan 1999 18:26:56 GMT
-Date: Thu, 07 Jan 1999 18:26:56 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.30 i686)
-MIME-Version: 1.0
-To: Bruce Momjian
-CC: Postgres Hackers List
-Subject: Outer Joins (and need CASE help)
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: RO
-
-> Thomas, do you need help on outer joins?
-
-Yes. I'm going slowly partly because I get distracted with other
-Postgres stuff like docs, and partly because I don't understand all of
-the pieces I'm working with.
-
-I've identified the place in the MergeJoin code where the null filling
-for outer joins needs to happen, and have the "merge walk" code done.
-But I don't have the supporting code which actually would know how to
-null-fill a result tuple from the left or right. I thought you might be
-interested in that?
-
-I've done some work in the parser, and can now do things like:
-
-postgres=> select * from t1 join t2 using (i);
-NOTICE: JOIN not yet implemented
-i|j|i|k
--+-+-+-
-1|2|1|3
-(1 row)
-
-But this is just an inner join, and the result isn't quite right since
-the second "i" column should probably be omitted. At the moment I
-transform it from the syntax above into existing parse nodes, and
-everything from there on works.
-
-I don't yet pass an explicit join node into the planner/optimizer, and
-that will be the hardest part I assume. Perhaps we can work on that
-together.
-
-So, what I'll try to do (soon, in the next few days?) is put in
-
- #ifdef ENABLE_OUTER_JOINS
-
-conditional code into the parser area (already there for the executor)
-and commit everything to the development tree. Does that sound OK?
-
-Oh, and if anyone is looking for something to do, I've got a couple of
-CASE statements in the case.sql regression test which are commented out
-because they crash the backend. They involve references to multiple
-tables within a single result column, and in other contexts that
-construct works. It would be great if someone had time to track it
-down...
-
- - Tom
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA22073
- for ; Mon, 22 Feb 1999 02:01:12 -0500 (EST)
-Received: from golem.jpl.nasa.gov (IDENT:
[email protected] [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id BAA26054 for
; Mon, 22 Feb 1999 01:57:00 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA04715;
- Mon, 22 Feb 1999 06:56:36 GMT
-Date: Mon, 22 Feb 1999 06:56:36 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
-MIME-Version: 1.0
-To: Bruce Momjian
-Subject: Re: start on outer join
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: ROr
-
-Bruce Momjian wrote:
->
-> > Will apply ... some other changes laying a bit of
-> > groundwork for outer joins so you can start on the planner/optimizer
-> > parts :)
-> Those will be a cinch now that I understand the optimizer. In fact, I
-> think it all will happen in the executor.
-
-I've modified executor/nodeMergeJoin.c to walk a left/right/both outer
-join, but didn't fill in the part which actually creates the result
-tuple (which will be the current left- or right-side tuple plus nulls
-for filler). I hope this is up your alley :)
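
The merge walk with null filling can be sketched roughly as below. This
is not the nodeMergeJoin.c code; merge_outer_join and its two flags are
hypothetical names, and the sketch assumes both inputs are lists of
(key, value) pairs already sorted on a unique key.

```python
def merge_outer_join(left, right, fill_left=False, fill_right=False):
    """Merge-join two key-sorted lists of (key, value) pairs.

    fill_left  keeps unmatched left rows, padded with None (LEFT OUTER).
    fill_right keeps unmatched right rows, padded with None (RIGHT OUTER).
    Both flags set gives a FULL OUTER join; neither gives an inner join.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            out.append((lk, lv, rv))
            i += 1
            j += 1
        elif lk < rk:
            if fill_left:
                out.append((lk, lv, None))   # left tuple plus null filler
            i += 1
        else:
            if fill_right:
                out.append((rk, None, rv))   # right tuple plus null filler
            j += 1
    # One side is exhausted; whatever remains on the other is unmatched.
    if fill_left:
        out.extend((k, v, None) for k, v in left[i:])
    if fill_right:
        out.extend((k, None, v) for k, v in right[j:])
    return out


left = [(1, 11), (2, 12), (3, 13), (4, 14)]
right = [(1, 21), (3, 23), (5, 25)]
print(merge_outer_join(left, right, fill_left=True))
```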
-
-So far, I'm not certain what to pass to the planner. The syntax leads me
-to pass a select structure from gram.y with a "JoinExpr" structure in
-the "fromClause" list. I need to expand that with a combination of
-column names and qualifications, but at the time I see the JoinExpr I
-don't have access to the top query structure itself. So I may just keep
-a modestly transformed JoinExpr to expand later or to pass to the
-planner.
-
-btw, the EXCEPT/INTERSECT stuff from Stefan has some ugliness in gram.y
-which needs to be fixed (the shift/reduce conflict is not acceptable for
-our release version) and some of that code clearly needs to move to
-analyze.c or some other module.
-
- - Tom
-
-From maillist Wed Feb 24 05:27:08 1999
-Received: (from maillist@localhost)
- by candle.pha.pa.us (8.9.0/8.9.0) id FAA09648;
- Wed, 24 Feb 1999 05:27:08 -0500 (EST)
-From: Bruce Momjian
-Subject: Re: [HACKERS] OUTER joins
-Date: Wed, 24 Feb 1999 05:27:07 -0500 (EST)
-X-Mailer: ELM [version 2.4ME+ PL47 (25)]
-MIME-Version: 1.0
-Content-Type: text/plain; charset=US-ASCII
-Content-Transfer-Encoding: 7bit
-Status: RO
-
->
-> How do you propose doing outer joins in non-mergejoin situations?
-> Mergejoins can only be used currently in equal joins.
-
-Is your solution going to be to make sure the OUTER table is always a
-MergeJoin, or on the outside of a join loop? That could work.
-
-That could get tricky if the table is joined to _two_ other tables.
-With the cleaned-up optimizer, we can disable non-merge joins in certain
-circumstances, and prevent OUTER tables from being inner in the others.
-Is that the plan?
-
---
- Bruce Momjian | http://www.op.net/~candle
- + If your life is a hard drive, | 830 Blythe Avenue
- + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA21672
- for ; Mon, 1 Mar 1999 13:01:06 -0500 (EST)
-Received: from golem.jpl.nasa.gov (IDENT:
[email protected] [128.149.68.204]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id MAA12756 for
; Mon, 1 Mar 1999 12:14:16 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id RAA09406;
- Mon, 1 Mar 1999 17:10:49 GMT
-Date: Mon, 01 Mar 1999 17:10:49 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
-MIME-Version: 1.0
-To: Bruce Momjian
-CC: PostgreSQL-development
-Subject: Re: OUTER joins
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: ROr
-
-(back from a short vacation...)
-
-> How do you propose doing outer joins in non-mergejoin situations?
-> Mergejoins can only be used currently in equal joins.
-
-Hadn't thought about it, other than figuring that implementing the
-equi-join first was a good start. There is a class of outer join syntax
-(the USING clause) which is implicitly an equi-join...
-
- - Tom
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA15978
- for ; Mon, 8 Mar 1999 21:54:57 -0500 (EST)
-Received: from golem.jpl.nasa.gov (IDENT:
[email protected] [128.149.68.203]) by renoir.op.net (o1/$ Revision: 1.18 $) with ESMTP id VAA15837 for
; Mon, 8 Mar 1999 21:48:33 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id CAA06996;
- Tue, 9 Mar 1999 02:46:40 GMT
-Date: Tue, 09 Mar 1999 02:46:40 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
-MIME-Version: 1.0
-To: Bruce Momjian
-Subject: Re: OUTER joins
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: ROr
-
-> > Hadn't thought about it, other than figuring that implementing the
-> > equi-join first was a good start. There is a class of outer join
-> > syntax (the USING clause) which is implicitly an equi-join...
-> Not that easy. You don't automatically get a mergejoin from an
-> equijoin. I will have to force outer's to be either mergejoins, or
-> inners of non-merge joins. Can you add code to non-merge joins in the
-> executor to throw out a null row if it does not find an inner match
-> for the outer row, and I will handle the optimizer so it doesn't throw
-> a non-conforming plan to the executor.
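
For the non-merge case, emitting a null row when the outer row finds no
inner match could look something like this nested-loop sketch. The
function name and the match callback are hypothetical; the callback is
there only to show that the join condition need not be an equality.

```python
def nested_loop_left_join(outer_rows, inner_rows, match):
    """Pair each outer row with every matching inner row.

    An outer row with no match at all is still emitted once, with None
    standing in for the null-filled inner side.
    """
    result = []
    for o in outer_rows:
        found = False
        for i in inner_rows:
            if match(o, i):
                result.append((o, i))
                found = True
        if not found:
            result.append((o, None))
    return result


# A non-equijoin condition: outer value strictly less than inner value.
print(nested_loop_left_join([1, 5], [2, 3], lambda o, i: o < i))
```

The table to be preserved has to drive the outer loop here, which is why
the plan must keep it from ending up as the inner input.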
-
-So far I don't have enough info in the parser to get the
-planner/optimizer going. Should we work from the front to the back, or
-should I go ahead and look at the non-merge joins? It's painfully
-obvious that I don't know anything about the middle parts of this to
-proceed without lots more research.
-
- - Tom
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07869
- for ; Tue, 9 Mar 1999 22:47:54 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id DAA14761;
- Wed, 10 Mar 1999 03:46:43 GMT
-Date: Wed, 10 Mar 1999 03:46:43 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
-MIME-Version: 1.0
-Subject: Re: SQL outer
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: RO
-
-> select *
-> from outer tab1, tab2, tab3
-> where tab1.col1 = tab2.col1 and
-> tab1.col1 = tab3.col1
-
-select *
-from t1 left join t2 using (c1)
- join t3 on (c1 = t3.c1)
-
-Result:
-t1.c1 t1.c2 t2.c2 t3.c2
-2 12 NULL 32
-
-t1:
-c1 c2
-1 11
-2 12
-3 13
-4 14
-
-t2:
-c1 c2
-1 21
-3 23
-
-t3:
-c1 c2
-2 32
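
The result above can be checked with a small script. This sketch uses
SQLite through Python's sqlite3 module as a stand-in, not PostgreSQL,
and spells the join conditions out with ON and qualified column names
(instead of USING) so that no column reference is ambiguous. The last
column selected is t3.c2, which is where the value 32 comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER, c2 INTEGER);
    CREATE TABLE t2 (c1 INTEGER, c2 INTEGER);
    CREATE TABLE t3 (c1 INTEGER, c2 INTEGER);
    INSERT INTO t1 VALUES (1, 11), (2, 12), (3, 13), (4, 14);
    INSERT INTO t2 VALUES (1, 21), (3, 23);
    INSERT INTO t3 VALUES (2, 32);
""")

# t1 LEFT JOIN t2 keeps all four t1 rows (null-filling c1 = 2 and 4);
# the inner join with t3 then keeps only the row with c1 = 2.
rows = conn.execute("""
    SELECT t1.c1, t1.c2, t2.c2, t3.c2
    FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1
            JOIN t3 ON t1.c1 = t3.c1
""").fetchall()
print(rows)   # [(2, 12, None, 32)]
```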
-
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA16741
- for ; Wed, 10 Mar 1999 10:48:51 -0500 (EST)
-Received: from alumni.caltech.edu (localhost [127.0.0.1])
- by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id PAA17723;
- Wed, 10 Mar 1999 15:48:31 GMT
-Date: Wed, 10 Mar 1999 15:48:31 +0000
-From: "Thomas G. Lockhart"
-Organization: Caltech/JPL
-X-Mailer: Mozilla 4.07 [en] (X11; I; Linux 2.0.36 i686)
-MIME-Version: 1.0
-To: Bruce Momjian
-CC: Thomas Lockhart
-Subject: Re: SQL outer
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Status: ROr
-
-Just thinking...
-
-If the initial RelOptInfo groupings are derived from the WHERE clause
-expressions, how about marking the "outer" property in those expressions
-in the parser? istm that is where the parser knows about two tables in
-one place, and I'm generating those expressions anyway. We could add a
-field(s) to the expression structure, or pass along a slightly different
-structure...
-
- - Tom
-
-Received: from hub.org (hub.org [216.126.84.1])
- by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id CAA13837
- for ; Wed, 21 Jul 1999 02:35:12 -0400 (EDT)
-Received: from hub.org (hub.org [216.126.84.1])
- by hub.org (8.9.3/8.9.3) with ESMTP id CAA88539;
- Wed, 21 Jul 1999 02:27:41 -0400 (EDT)
-Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jul 1999 02:24:08 +0000 (EDT)
-Received: (from majordom@localhost)
- by hub.org (8.9.3/8.9.3) id CAA87850
- for pgsql-hackers-outgoing; Wed, 21 Jul 1999 02:23:13 -0400 (EDT)
- by hub.org (8.9.3/8.9.3) with ESMTP id CAA87810
- for
; Wed, 21 Jul 1999 02:22:52 -0400 (EDT)
-Received: from alumni.caltech.edu (lockhart@localhost [127.0.0.1])
- by localhost (8.8.7/8.8.7) with ESMTP id GAA14480;
- Wed, 21 Jul 1999 06:20:22 GMT
-Date: Wed, 21 Jul 1999 06:20:22 +0000
-From: Thomas Lockhart
-X-Mailer: Mozilla 4.6 [en] (X11; I; Linux 2.0.36 i686)
-X-Accept-Language: en
-MIME-Version: 1.0
-To: Tom Lane
-Subject: Re: [HACKERS] Another reason to redesign querytree representation
-Content-Type: text/plain; charset=us-ascii
-Content-Transfer-Encoding: 7bit
-Precedence: bulk
-Status: RO
-
-> Thomas, what do you think is needed for outer joins?
-
-Bruce and I have talked about it some already:
-
-For outer joins, tables must be combined in a particular order. For
-example, a left outer join requires that any entries in the left-side
-table which do not have a corresponding entry in the right-side table
-be expanded with nulls during the join. The information on the outer
-join can't be carried by the rte since the same table can appear twice
-in an outer join expression:
-
- select * from t1 left join t2 using (i)
- left join t1 on (i = t1.j);
-
-For a query like
-
- select * from t1 left join t2 using (i) where t2.j = 3;
-
-istm that the outer join must be done before the t2 qualification is
-applied, and that another ordering may produce the wrong result.
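
A small experiment shows why the ordering matters. It is a sketch using
SQLite via Python's sqlite3 module rather than PostgreSQL, with made-up
table contents; moving the t2.j = 3 test into the ON clause is used here
to imitate applying the qualification to t2 before the outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (i INTEGER);
    CREATE TABLE t2 (i INTEGER, j INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1, 3), (2, 5);
""")

# Outer join first, then the t2.j = 3 qualification (WHERE semantics).
join_then_filter = conn.execute("""
    SELECT t1.i, t2.j FROM t1 LEFT JOIN t2 ON t1.i = t2.i
    WHERE t2.j = 3 ORDER BY t1.i
""").fetchall()

# Qualification applied to t2 before the outer join (moved into ON).
filter_then_join = conn.execute("""
    SELECT t1.i, t2.j FROM t1 LEFT JOIN t2 ON t1.i = t2.i AND t2.j = 3
    ORDER BY t1.i
""").fetchall()

print(join_then_filter)   # [(1, 3)]
print(filter_then_join)   # [(1, 3), (2, None)]
```

With these rows the first form returns the same rows a plain inner join
would, because the WHERE test rejects every null-filled row.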
-
-From what I understand Bruce to say, the planner/optimizer is allowed
-to try all kinds of permutations of plans, choosing the one with the
-lowest cost. But if the info for the join is carried in a
-qualification node, then the planner/optimizer must know that it can't
-reorder the query as freely as it does now.
-
-I was thinking of having a new qualification node to carry this info,
-and it could be transformed into a mergejoin node which has a couple
-of new fields indicating left and/or right outer join behavior.
-
-A hashjoin method may be possible for queries which are structured as
-a left outer join; other outer joins will need to use the mergejoin
-method. Also, some poorly-qualified outer joins reduce to inner joins,
-and perhaps the optimizer can be smart enough to realize this.
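
One way to picture the hashjoin case for a left outer join is the sketch
below. It is an illustration only, not the executor's hashjoin code;
hash_left_join is a hypothetical name and the inputs are assumed to be
lists of (key, value) pairs. The hash table is built on the right-side
input and probed once per left row, so whether a left row matched is
known on the spot and null filling needs no extra bookkeeping.

```python
from collections import defaultdict


def hash_left_join(left, right):
    """Left outer equi-join of two lists of (key, value) pairs."""
    table = defaultdict(list)
    for k, v in right:               # build phase on the right-side input
        table[k].append(v)

    out = []
    for k, v in left:                # probe phase, one lookup per left row
        matches = table.get(k)       # .get() does not create empty entries
        if matches:
            out.extend((k, v, rv) for rv in matches)
        else:
            out.append((k, v, None))  # null-fill the right side
    return out


print(hash_left_join([(1, 11), (2, 12)], [(1, 21), (1, 22)]))
```

Preserving unmatched right-side rows this way would additionally need a
record of which hash entries were ever matched.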
-
- - Thomas
-
---
-South Pasadena, California
-
-