Provide a bit more high-level documentation for the GEQO planner.

author Tom Lane

Sat, 21 Jul 2007 04:02:41 +0000 (04:02 +0000)

committer Tom Lane

Sat, 21 Jul 2007 04:02:41 +0000 (04:02 +0000)
diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml

index c861a656e904fcbf5dcbf53a4929613e85598a46..7ee1ba357f09ffc47930393de5b2fe76c27dc4c2 100644 (file)
--- a/doc/src/sgml/arch-dev.sgml
+++ b/doc/src/sgml/arch-dev.sgml
@@ -1,4 +1,4 @@
-
+
  
   
    Overview of PostgreSQL Internals
@@ -345,9 +345,10 @@
       can be executed would take an excessive amount of time and memory
       space. In particular, this occurs when executing queries
       involving large numbers of join operations. In order to determine
-     a reasonable (not optimal) query plan in a reasonable amount of
-     time, PostgreSQL uses a 
-     linkend="geqo" endterm="geqo-title">.
+     a reasonable (not necessarily optimal) query plan in a reasonable amount
+     of time, PostgreSQL uses a 
+     linkend="geqo" endterm="geqo-title"> when the number of joins
+     exceeds a threshold (see ).
     
    
 
@@ -380,20 +381,17 @@
      the index's operator class, another plan is created using
      the B-tree index to scan the relation. If there are further indexes
      present and the restrictions in the query happen to match a key of an
-     index further plans will be considered.
+     index, further plans will be considered.  Index scan plans are also
+     generated for indexes that have a sort ordering that can match the
+     query's ORDER BY clause (if any), or a sort ordering that
+     might be useful for merge joining (see below).
     
 
     
-     After all feasible plans have been found for scanning single relations,
-     plans for joining relations are created. The planner/optimizer
-     preferentially considers joins between any two relations for which there
-     exist a corresponding join clause in the WHERE qualification (i.e. for
-     which a restriction like where rel1.attr1=rel2.attr2
-     exists). Join pairs with no join clause are considered only when there
-     is no other choice, that is, a particular relation has no available
-     join clauses to any other relation. All possible plans are generated for
-     every join pair considered
-     by the planner/optimizer. The three possible join strategies are:
+     If the query requires joining two or more relations,
+     plans for joining relations are considered
+     after all feasible plans have been found for scanning single relations.
+     The three available join strategies are:
 
      
       
@@ -439,6 +437,26 @@
      cheapest one.
     
 
+    
+     If the query uses fewer than 
+     relations, a near-exhaustive search is conducted to find the best
+     join sequence.  The planner preferentially considers joins between any
+     two relations for which there exist a corresponding join clause in the
+     WHERE qualification (i.e. for
+     which a restriction like where rel1.attr1=rel2.attr2
+     exists). Join pairs with no join clause are considered only when there
+     is no other choice, that is, a particular relation has no available
+     join clauses to any other relation. All possible plans are generated for
+     every join pair considered by the planner, and the one that is
+     (estimated to be) the cheapest is chosen.
+    
+
+    
+     When geqo_threshold is exceeded, the join
+     sequences considered are determined by heuristics, as described
+     in .  Otherwise the process is the same.
+    
+
     
      The finished plan tree consists of sequential or index scans of
      the base relations, plus nested-loop, merge, or hash join nodes as
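
Reviewer note: the new arch-dev.sgml paragraphs describe a near-exhaustive search below geqo_threshold that prefers join pairs connected by a join clause. The following Python sketch illustrates that idea only. It is not the planner's code: it enumerates left-deep join sequences by brute force, whereas the real planner is more elaborate. All names (`exhaustive_join_order`, `plan_join_order`, `toy_cost`, `SIZES`, `CLAUSES`), the made-up cost formula, and the default of 12 used for the threshold are assumptions of the sketch.

```python
from itertools import permutations


def has_clause(joined, rel, clauses):
    """True if rel shares a join clause with any already-joined relation."""
    return any(frozenset((rel, other)) in clauses for other in joined)


def exhaustive_join_order(relations, clauses, cost_of):
    """Cost every left-deep join sequence, skipping a clauseless join
    whenever some remaining relation does have a usable join clause."""
    best_order, best_cost = None, float("inf")
    for order in permutations(relations):
        allowed = True
        for i in range(1, len(order)):
            joined, remaining = order[:i], order[i:]
            clause_available = any(has_clause(joined, r, clauses) for r in remaining)
            if clause_available and not has_clause(joined, order[i], clauses):
                allowed = False
                break
        if not allowed:
            continue
        cost = cost_of(order)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


def plan_join_order(relations, clauses, cost_of, geqo_threshold=12):
    """Fewer relations than the threshold: search exhaustively."""
    if len(relations) < geqo_threshold:
        return exhaustive_join_order(relations, clauses, cost_of)
    raise NotImplementedError("heuristic (GEQO) search would take over here")


# Toy inputs (hypothetical): row counts and the pairs that have join clauses.
SIZES = {"a": 1000, "b": 10, "c": 100}
CLAUSES = {frozenset(("a", "b")), frozenset(("b", "c"))}


def toy_cost(order):
    """Made-up cost: work per join is rows_so_far * rows of the next relation."""
    rows, total = SIZES[order[0]], 0
    for rel in order[1:]:
        total += rows * SIZES[rel]
        rows = max(1, rows * SIZES[rel] // 100)
    return total


best_order, best_cost = plan_join_order(list(SIZES), CLAUSES, toy_cost)
```

In this toy setup, sequences that start by joining a and c are never costed, because b always offers a clause-connected alternative. That is the pruning the paragraph describes.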


diff --git a/doc/src/sgml/geqo.sgml b/doc/src/sgml/geqo.sgml

index 6225dc4c3219ac87cbc8443bff86b44e11f42336..2f680762c13bb45c3b85bbba5c43011de112eb4b 100644 (file)


--- a/doc/src/sgml/geqo.sgml
+++ b/doc/src/sgml/geqo.sgml
@@ -1,4 +1,4 @@
-
+
 
  
   
@@ -186,11 +186,6 @@
     PostgreSQL optimizer.
    
 
-   
-    Parts of the GEQO module are adapted from D. Whitley's Genitor
-    algorithm.
-   
-
    
     Specific characteristics of the GEQO
     implementation in PostgreSQL
@@ -224,6 +219,11 @@
     
    
 
+   
+    Parts of the GEQO module are adapted from D. Whitley's
+    Genitor algorithm.
+   
+
    
     The GEQO module allows
     the PostgreSQL query optimizer to
@@ -231,6 +231,42 @@
     non-exhaustive search.
    
 
+  
+   Generating Possible Plans with <acronym>GEQO</acronym>
+
+   
+    The GEQO planning process uses the standard planner
+    code to generate plans for scans of individual relations.  Then join
+    plans are developed using the genetic approach.  As shown above, each
+    candidate join plan is represented by a sequence in which to join
+    the base relations.  In the initial stage, the GEQO
+    code simply generates some possible join sequences at random.  For each
+    join sequence considered, the standard planner code is invoked to
+    estimate the cost of performing the query using that join sequence.
+    (For each step of the join sequence, all three possible join strategies
+    are considered; and all the initially-determined relation scan plans
+    are available.  The estimated cost is the cheapest of these
+    possibilities.)  Join sequences with lower estimated cost are considered
+    more fit than those with higher cost.  The genetic algorithm
+    discards the least fit candidates.  Then new candidates are generated
+    by combining genes of more-fit candidates — that is, by using
+    randomly-chosen portions of known low-cost join sequences to create
+    new sequences for consideration.  This process is repeated until a
+    preset number of join sequences have been considered; then the best
+    one found at any time during the search is used to generate the finished
+    plan.
+   
+
+   
+    This process is inherently nondeterministic, because of the randomized
+    choices made during both the initial population selection and subsequent
+    mutation of the best candidates.  Hence different plans may
+    be selected from one run to the next, resulting in varying run time
+    and varying output row order.
+   
+
+  
+
   
   Future Implementation Tasks for
     <productname>PostgreSQL</> <acronym>GEQO</acronym>
@@ -257,6 +293,16 @@
       
      
 
+     
+      In the current implementation, the fitness of each candidate join
+      sequence is estimated by running the standard planner's join selection
+      and cost estimation code from scratch.  To the extent that different
+      candidates use similar sub-sequences of joins, a great deal of work
+      will be repeated.  This could be made significantly faster by retaining
+      cost estimates for sub-joins.  The problem is to avoid expending
+      unreasonable amounts of memory on retaining that state.
+     
+
      
       At a more basic level, it is not clear that solving query optimization
       with a GA algorithm designed for TSP is appropriate.  In the TSP case,
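
Reviewer note: the new "Generating Possible Plans with GEQO" section describes a loop of random initial join sequences, cost-based fitness, discarding the least fit, and recombining fitter candidates until a preset number of sequences has been considered. The Python sketch below shows that loop in miniature. It is an illustration under stated assumptions, not the GEQO module's code: it uses a simple order crossover chosen for brevity, no mutation step, and a made-up cost function. The names `geqo_search`, `crossover`, `toy_cost`, `SIZES`, `pool_size`, and `max_evaluations` are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Keep a random slice of parent_a in place and fill the other
    positions with the remaining relations in parent_b's order."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    segment = parent_a[i:j]
    rest = [r for r in parent_b if r not in segment]
    return rest[:i] + segment + rest[i:]


def geqo_search(relations, cost_of, pool_size=8, max_evaluations=200, seed=None):
    """Return (cost, join_sequence) for the cheapest sequence found."""
    rng = random.Random(seed)
    # Initial stage: some possible join sequences generated at random.
    pool = [rng.sample(relations, len(relations)) for _ in range(pool_size)]
    scored = sorted((cost_of(seq), seq) for seq in pool)
    evaluations = pool_size
    while evaluations < max_evaluations:
        # Recombine two distinct members of the fitter (cheaper) half.
        fitter = scored[: max(2, pool_size // 2)]
        (_, a), (_, b) = rng.sample(fitter, 2)
        child = crossover(a, b, rng)
        # Discard the least fit candidate in favour of the new one.
        scored[-1] = (cost_of(child), child)
        scored.sort()
        evaluations += 1
    # The cheapest entry is never the one discarded, so this is the best
    # sequence seen at any time during the search.
    return scored[0]


# Toy inputs (hypothetical): row counts per relation.
SIZES = {"a": 1000, "b": 10, "c": 100, "d": 5000, "e": 50}


def toy_cost(order):
    """Made-up cost: work per join is rows_so_far * rows of the next relation."""
    rows, total = SIZES[order[0]], 0
    for rel in order[1:]:
        total += rows * SIZES[rel]
        rows = max(1, rows * SIZES[rel] // 100)
    return total


best_cost, best_seq = geqo_search(list(SIZES), toy_cost, seed=42)
```

With `seed=None` the random choices differ between runs, so different sequences may be returned from one run to the next. This mirrors the nondeterminism the new section warns about. Fixing the seed is a convenience of the sketch only.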