---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
+Return-path:
+ by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBJBboe20936
+ for
; Mon, 19 Dec 2005 06:37:51 -0500 (EST)
+Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
+ id 1EoJKc-00045V-00; Mon, 19 Dec 2005 22:37:30 +1100
+Date: Mon, 19 Dec 2005 12:37:30 +0100
+From: Martijn van Oosterhout
+To: Dann Corbit
+cc: Tom Lane , Qingqing Zhou ,
+ Luke Lonergan , Neil Conway ,
+Subject: Re: [HACKERS] Re: Which qsort is used
+Reply-To: Martijn van Oosterhout
+References:
+MIME-Version: 1.0
+Content-Type: multipart/signed; micalg=pgp-sha1;
+ protocol="application/pgp-signature"; boundary="5gxpn/Q6ypwruk0T"
+Content-Disposition: inline
+In-Reply-To:
+User-Agent: Mutt/1.3.28i
+X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
+X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
+X-PGP-Key-URL:
+Status: OR
+
+
+--5gxpn/Q6ypwruk0T
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+Content-Transfer-Encoding: quoted-printable
+
+On Fri, Dec 16, 2005 at 10:43:58PM -0800, Dann Corbit wrote:
+> I am actually quite impressed with the excellence of Bentley's sort out
+> of the box. It's definitely the best library implementation of a sort I
+> have seen.
+
+I'm not sure whether we have a conclusion here, but I do have one
+question: is there a significant difference in the number of times the
+comparison routines are called? Comparisons in PostgreSQL are fairly
+expensive given the fmgr overhead and when comparing tuples it's even
+worse.
+
+We don't want to accidentally pick a routine that saves data shuffling by
+adding extra comparisons. The stats at [1] don't say. They try to
+factor in CPU cost but they seem to use unrealistically small values. I
+would think a number around 50 (or higher) would be more
+representative.
+
+[1] http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html
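
A minimal sketch of how the comparison-count question above could be measured: wrap the comparator in a counter and sort. This is an illustration in Python, not PostgreSQL code. The names `count_comparisons` and `counting_cmp` are hypothetical, and Python's built-in sort stands in for the qsort variants under discussion, so absolute numbers will differ from any of them.

```python
import functools


def count_comparisons(items):
    """Sort a copy of items; return (sorted_list, comparator_call_count)."""
    calls = 0

    def counting_cmp(a, b):
        # Stand-in for an expensive comparison routine: count every call.
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    result = sorted(items, key=functools.cmp_to_key(counting_cmp))
    return result, calls
```

Running the same inputs through each candidate sort with a counter like this would show whether a routine trades data movement for extra comparisons.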
+
+Have a nice day,
+-- 
+Martijn van Oosterhout http://svana.org/kleptog/
+> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
+> tool for doing 5% of the work and then sitting around waiting for someone
+> else to do the other 95% so you can sue them.
+
+--5gxpn/Q6ypwruk0T
+Content-Type: application/pgp-signature
+Content-Disposition: inline
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.0.6 (GNU/Linux)
+Comment: For info see http://www.gnupg.org
+
+iD8DBQFDpptzIB7bNG8LQkwRAmC6AJ4qYrIm3SYnBV3BybSmm+Gl4vpEywCfRDxg
+bnIK4INRqOVFNBAKR/gDPcM=
+=92qA
+-----END PGP SIGNATURE-----
+
+--5gxpn/Q6ypwruk0T--
+
+Return-path:
+Received: from email.aon.at (warsl404pip5.highway.telekom.at [195.3.96.77])
+ by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM0i2e05649
+ for
; Wed, 21 Dec 2005 19:44:02 -0500 (EST)
+Received: (qmail 12703 invoked from network); 22 Dec 2005 00:43:51 -0000
+Received: from m148p015.dipool.highway.telekom.at (HELO Sokrates) ([62.46.8.111])
+ (envelope-sender )
+ by smarthub78.highway.telekom.at (qmail-ldap-1.03) with SMTP
+ for ; 22 Dec 2005 00:43:51 -0000
+From: Manfred Koizar
+To: Tom Lane
+cc: "Dann Corbit" , "Qingqing Zhou" ,
+ "Luke Lonergan" ,
+Subject: Re: [HACKERS] Re: Which qsort is used
+Date: Thu, 22 Dec 2005 01:43:34 +0100
+Message-ID:
+X-Mailer: Forte Agent 3.1/32.783
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Status: OR
+
+On Sat, 17 Dec 2005 00:03:25 -0500, Tom Lane
+wrote:
+>I've still got a problem with these checks; I think they are a net
+>waste of cycles on average. [...]
+> and when they fail, those cycles are entirely wasted;
+>you have not advanced the state of the sort at all.
+
+How can we make the initial check "advance the state of the sort"?
+One answer might be to exclude the sorted sequence at the start of the
+array from the qsort, and merge the two sorted lists as the final
+stage of the sort.
+
+Qsorting N elements costs O(N*lnN), so excluding H elements from the
+sort reduces the cost by at least O(H*lnN). The merge step costs O(N)
+plus some (<=50%) more memory, unless someone knows a fast in-place
+merge. So depending on the constant factors involved there might be a
+usable solution.
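
The idea above can be sketched as follows. This is a hedged illustration, not the proposed patch: `sorted_prefix_length` and `prefix_merge_sort` are hypothetical names, Python's `sorted` stands in for qsort on the unsorted remainder, and `heapq.merge` performs the final merge using extra memory rather than in place.

```python
import heapq


def sorted_prefix_length(items):
    """Length H of the non-decreasing run that starts at index 0."""
    if not items:
        return 0
    h = 1
    while h < len(items) and items[h - 1] <= items[h]:
        h += 1
    return h


def prefix_merge_sort(items):
    """Exclude the sorted prefix from the sort, then merge the two lists."""
    h = sorted_prefix_length(items)   # costs at most N-1 comparisons
    head = items[:h]                  # already in order, not re-sorted
    tail = sorted(items[h:])          # stand-in for qsort on N-H elements
    return list(heapq.merge(head, tail))  # O(N) merge, extra memory
```

With this shape, the comparisons spent on the initial check are never wasted: they determine H, which shrinks the part that still has to be sorted.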
+
+I've been playing with some numbers and assuming the constant factors
+to be equal for all the O()'s this method starts to pay off at
+     H  for       N
+    20          100
+   130         1000
+  8000       100000
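
The message does not spell out the cost model behind these numbers. One guess, labeled as an assumption, is that a plain sort costs N*ln(N) while the prefix method costs (N-H)*ln(N-H) for the smaller sort plus N for the merge, with all constant factors set to 1. Under that assumed model, the sketch below (hypothetical name `break_even`) finds the smallest H at which the prefix method is cheaper; it gives values close to those in the table.

```python
import math


def break_even(n):
    """Smallest H with (n-H)*ln(n-H) + n < n*ln(n), or None if there is none.

    Assumed model: constant factors of the sort and the merge are both 1,
    and the O(H) cost of finding the prefix is ignored.
    """
    full_sort = n * math.log(n)
    for h in range(1, n):
        rest = n - h
        if rest * math.log(rest) + n < full_sort:
            return h
    return None
```

Adding the cost of the prefix check, or a larger constant for the merge, would move the break-even point upward.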
+Servus
+ Manfred
+
+Received: from ams.hub.org (ams.hub.org [200.46.204.13])
+ by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM72Re16910
+ for
; Thu, 22 Dec 2005 02:02:28 -0500 (EST)
+Received: from postgresql.org (postgresql.org [200.46.204.71])
+ by ams.hub.org (Postfix) with ESMTP id A31E067AAA0
+ for
; Thu, 22 Dec 2005 03:02:22 -0400 (AST)
+Received: from localhost (av.hub.org [200.46.204.144])
+ by postgresql.org (Postfix) with ESMTP id 2C8EC9DCA92
+ for
; Thu, 22 Dec 2005 03:01:56 -0400 (AST)
+Received: from postgresql.org ([200.46.204.71])
+ by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
+ with ESMTP id 26033-04
+ Thu, 22 Dec 2005 03:01:55 -0400 (AST)
+X-Greylist: from auto-whitelisted by SQLgrey-
+Received: from svana.org (svana.org [203.20.62.76])
+ by postgresql.org (Postfix) with ESMTP id 800859DC81D
+ for
; Thu, 22 Dec 2005 03:01:51 -0400 (AST)
+Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
+ id 1EpKRg-0005ox-00; Thu, 22 Dec 2005 18:01:00 +1100
+Date: Thu, 22 Dec 2005 08:01:00 +0100
+From: Martijn van Oosterhout
+To: Manfred Koizar
+cc: Tom Lane , Dann Corbit ,
+ Qingqing Zhou ,
+ Luke Lonergan , Neil Conway ,
+Subject: Re: [HACKERS] Re: Which qsort is used
+Reply-To: Martijn van Oosterhout
+MIME-Version: 1.0
+Content-Type: multipart/signed; micalg=pgp-sha1;
+ protocol="application/pgp-signature"; boundary="FL5UXtIhxfXey3p5"
+Content-Disposition: inline
+In-Reply-To:
+User-Agent: Mutt/1.3.28i
+X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
+X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
+X-PGP-Key-URL:
+X-Virus-Scanned: by amavisd-new at hub.org
+X-Spam-Status: No, score=0.065 required=5 tests=[AWL=0.065]
+X-Spam-Score: 0.065
+X-Mailing-List: pgsql-hackers
+List-Archive:
+List-Help:
+List-Owner:
+List-Post:
+List-Subscribe:
+List-Unsubscribe:
+Precedence: bulk
+Status: OR
+
+
+--FL5UXtIhxfXey3p5
+Content-Type: text/plain; charset=us-ascii
+Content-Disposition: inline
+Content-Transfer-Encoding: quoted-printable
+
+On Thu, Dec 22, 2005 at 01:43:34AM +0100, Manfred Koizar wrote:
+> Qsorting N elements costs O(N*lnN), so excluding H elements from the
+> sort reduces the cost by at least O(H*lnN). The merge step costs O(N)
+> plus some (<=50%) more memory, unless someone knows a fast in-place
+> merge. So depending on the constant factors involved there might be a
+> usable solution.
+
+But where are you including the cost to check how many cells are
+already sorted? That would be O(H), right? This is where we come back
+to the issue that comparisons in PostgreSQL are expensive. The cpu_cost
+in the tests I saw so far is unrealistically low.
+
+> I've been playing with some numbers and assuming the constant factors
+> to be equal for all the O()'s this method starts to pay off at
+>      H  for       N
+>     20          100   20%
+>    130         1000   13%
+>   8000       100000    8%
+
+Hmm, what are the chances that you have 100000 unordered items to sort
+and that the first 8% are already in order? ISTM that the probability
+is close enough to zero to not matter...
+
+Have a nice day,
+-- 
+Martijn van Oosterhout http://svana.org/kleptog/
+> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
+> tool for doing 5% of the work and then sitting around waiting for someone
+> else to do the other 95% so you can sue them.
+
+--FL5UXtIhxfXey3p5
+Content-Type: application/pgp-signature
+Content-Disposition: inline
+
+-----BEGIN PGP SIGNATURE-----
+Version: GnuPG v1.0.6 (GNU/Linux)
+Comment: For info see http://www.gnupg.org
+
+iD8DBQFDqk8oIB7bNG8LQkwRAjJhAJ47eXRi1DJ02cfKcnN2iPkaBB0eaQCeIiF+
+HOAYIPQrU2gpUUiGT3aGUUw=
+=R0hU
+-----END PGP SIGNATURE-----
+
+--FL5UXtIhxfXey3p5--
+
+Received: from ams.hub.org (ams.hub.org [200.46.204.13])
+ by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBMLxJe07480
+ for
; Thu, 22 Dec 2005 16:59:19 -0500 (EST)
+Received: from postgresql.org (postgresql.org [200.46.204.71])
+ by ams.hub.org (Postfix) with ESMTP id D1DBE67AC1B
+ for
; Thu, 22 Dec 2005 17:59:16 -0400 (AST)
+Received: from localhost (av.hub.org [200.46.204.144])
+ by postgresql.org (Postfix) with ESMTP id BE8249DCBEB
+ for
; Thu, 22 Dec 2005 17:58:53 -0400 (AST)
+Received: from postgresql.org ([200.46.204.71])
+ by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
+ with ESMTP id 64765-01
+ Thu, 22 Dec 2005 17:58:54 -0400 (AST)
+X-Greylist: from auto-whitelisted by SQLgrey-
+Received: from email.aon.at (warsl404pip7.highway.telekom.at [195.3.96.91])
+ by postgresql.org (Postfix) with ESMTP id 3E08E9DCA5C
+ for
; Thu, 22 Dec 2005 17:58:49 -0400 (AST)
+Received: (qmail 6986 invoked from network); 22 Dec 2005 21:58:49 -0000
+Received: from m150p015.dipool.highway.telekom.at (HELO Sokrates) ([62.46.8.175])
+ (envelope-sender )
+ by smarthub76.highway.telekom.at (qmail-ldap-1.03) with SMTP
+ for ; 22 Dec 2005 21:58:49 -0000
+From: Manfred Koizar
+To: Martijn van Oosterhout
+cc: Tom Lane , Dann Corbit ,
+ Qingqing Zhou ,
+ Luke Lonergan , Neil Conway ,
+Subject: Re: [HACKERS] Re: Which qsort is used
+Date: Thu, 22 Dec 2005 22:58:31 +0100
+X-Mailer: Forte Agent 3.1/32.783
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+X-Virus-Scanned: by amavisd-new at hub.org
+X-Spam-Status: No, score=0.398 required=5 tests=[AWL=0.398]
+X-Spam-Score: 0.398
+X-Mailing-List: pgsql-hackers
+List-Archive:
+List-Help:
+List-Owner:
+List-Post:
+List-Subscribe:
+List-Unsubscribe:
+Precedence: bulk
+Status: OR
+
+On Thu, 22 Dec 2005 08:01:00 +0100, Martijn van Oosterhout
+ wrote:
+>But where are you including the cost to check how many cells are
+>already sorted? That would be O(H), right?
+
+Yes. I didn't mention it because H < N, so the O(H) check is dominated
+by the O(N) merge.
+
+> This is where we come back
+>to the issue that comparisons in PostgreSQL are expensive.
+
+So we agree that we should try to reduce the number of comparisons.
+How many comparisons does it take to sort 100000 items? 1.5 million?
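
As a rough check on that figure: any comparison sort needs at least log2(N!) comparisons in the worst case, which for N = 100000 is about 1.5 million, so the guess matches the lower bound. Textbook quicksort with a random pivot is expected to use 2(N+1)*H_N - 4N comparisons, where H_N is the N-th harmonic number, or about 2.0 million here. The sketch below computes both. The function names are hypothetical, and the second formula describes textbook quicksort, not the qsort variants compared in this thread.

```python
import math


def min_comparisons(n):
    """Information-theoretic lower bound log2(n!) for comparison sorts."""
    return math.lgamma(n + 1) / math.log(2)


def quicksort_expected_comparisons(n):
    """Expected comparisons of textbook random-pivot quicksort:
    2*(n+1)*H_n - 4*n, with H_n the n-th harmonic number."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n
```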
+
+>Hmm, what are the chances that you have 100000 unordered items to sort
+>and that the first 8% are already in order? ISTM that the probability
+>is close enough to zero to not matter...
+
+If the items are totally unordered, the check is so cheap you won't
+even notice. OTOH in Tom's example ...
+
+|What I think is much more probable in the Postgres environment
+|is almost-but-not-quite-ordered inputs --- eg, a table that was
+|perfectly ordered by key when filled, but some of the tuples have since
+|been moved by UPDATEs.
+
+... I'd not be surprised if H is 90% of N.
+Servus
+ Manfred
+
+
+Return-path:
+Received: from postal.corporate.connx.com (postal.corporate.connx.com [65.212.159.187])
+ by candle.pha.pa.us (8.11.6/8.11.6) with SMTP id jBMMLve11671
+ for
; Thu, 22 Dec 2005 17:22:03 -0500 (EST)
+Content-class: urn:content-classes:message
+MIME-Version: 1.0
+Content-Type: text/plain;
+ charset="us-ascii"
+Subject: RE: [HACKERS] Re: Which qsort is used
+X-MimeOLE: Produced By Microsoft Exchange V6.5
+Date: Thu, 22 Dec 2005 14:21:49 -0800
+Message-ID:
+Thread-Topic: [HACKERS] Re: Which qsort is used
+Thread-Index: AcYHQuXJdKs8JVgmSKywUqld6KYccQAAfWAA
+From: "Dann Corbit"
+To: "Manfred Koizar" ,
+ "Martijn van Oosterhout"
+cc: "Tom Lane" , "Qingqing Zhou" ,
+ "Luke Lonergan" ,
+Content-Transfer-Encoding: 8bit
+X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id jBMMLve11671
+Status: OR
+
+An interesting article on sorting and comparison count:
+http://www.acm.org/jea/ARTICLES/Vol7Nbr5.pdf
+
+Here is the article, the code, and an implementation that I have been
+toying with:
+http://cap.connx.com/chess-engines/new-approach/algos.zip
+
+Algorithm quickheap is especially interesting because it does not
+require much additional space (just an array of integers up to size
+log(element_count)) and, in addition, it has very few data movements.
+
+