Make some more improvements to parallel query documentation.

author Robert Haas

Thu, 10 Aug 2017 17:22:31 +0000 (13:22 -0400)

committer Robert Haas

Thu, 10 Aug 2017 17:22:31 +0000 (13:22 -0400)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml

index c33d6a03492d3fe640380292d677253b9dab1b41..2b6255ed95adabb1645e291079b7673831d6e8c0 100644 (file)
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2050,8 +2050,8 @@ include_dir 'conf.d'
         
          
           Sets the maximum number of workers that can be started by a single
-         Gather node.  Parallel workers are taken from the
-         pool of processes established by
+         Gather or Gather Merge node.
+         Parallel workers are taken from the pool of processes established by
           , limited by
           .  Note that the requested
           number of workers may not actually be available at run time.  If this
diff --git a/doc/src/sgml/parallel.sgml b/doc/src/sgml/parallel.sgml

index ff31e7537e6cc3b6a1c27d23fe49b03bca67c0c9..2a25f21eb4b8d083d4c168275827ae70c6dc089b 100644 (file)
--- a/doc/src/sgml/parallel.sgml
+++ b/doc/src/sgml/parallel.sgml
@@ -28,7 +28,8 @@
     
      When the optimizer determines that parallel query is the fastest execution
      strategy for a particular query, it will create a query plan which includes
-    a Gather node.  Here is a simple example:
+    a Gather or Gather Merge
+    node.  Here is a simple example:
  
  
  EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
@@ -43,15 +44,16 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
     
  
     
-    In all cases, the Gather node will have exactly one
+    In all cases, the Gather or
+    Gather Merge node will have exactly one
      child plan, which is the portion of the plan that will be executed in
-    parallel.  If the Gather node is at the very top of the plan
-    tree, then the entire query will execute in parallel.  If it is somewhere
-    else in the plan tree, then only the portion of the plan below it will run
-    in parallel.  In the example above, the query accesses only one table, so
-    there is only one plan node other than the Gather node itself;
-    since that plan node is a child of the Gather node, it will
-    run in parallel.
+    parallel.  If the Gather or Gather Merge node is
+    at the very top of the plan tree, then the entire query will execute in
+    parallel.  If it is somewhere else in the plan tree, then only the portion
+    of the plan below it will run in parallel.  In the example above, the
+    query accesses only one table, so there is only one plan node other than
+    the Gather node itself; since that plan node is a child of the
+    Gather node, it will run in parallel.
     
  
     
@@ -60,35 +62,47 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      during query execution, the process which is implementing the user's
      session will request a number of background
      worker processes equal to the number
-    of workers chosen by the planner.  The total number of background
-    workers that can exist at any one time is limited by both
+    of workers chosen by the planner.  The number of background workers that
+    the planner will consider using is limited to at most
+    .  The total number
+    of background workers that can exist at any one time is limited by both
       and
-    , so it is possible for a
+    .  Therefore, it is possible for a
      parallel query to run with fewer workers than planned, or even with
      no workers at all.  The optimal plan may depend on the number of workers
      that are available, so this can result in poor query performance.  If this
-    occurrence is frequent, considering increasing
+    occurrence is frequent, consider increasing
      max_worker_processes and max_parallel_workers
      so that more workers can be run simultaneously or alternatively reducing
-    <xref linkend="guc-max-parallel-workers-per-gather"> so that the planner
+    <varname>max_parallel_workers_per_gather</varname> so that the planner
      requests fewer workers.
     
  
     
      Every background worker process which is successfully started for a given
-    parallel query will execute the portion of the plan below
-    the Gather node.  The leader will also execute that portion
-    of the plan, but it has an additional responsibility: it must also read
-    all of the tuples generated by the workers.  When the parallel portion of
-    the plan generates only a small number of tuples, the leader will often
-    behave very much like an additional worker, speeding up query execution.
-    Conversely, when the parallel portion of the plan generates a large number
-    of tuples, the leader may be almost entirely occupied with reading the
-    tuples generated by the workers and performing any further processing
-    steps which are required by plan nodes above the level of the
-    Gather node.  In such cases, the leader will do very
-    little of the work of executing the parallel portion of the plan.
+    parallel query will execute the parallel portion of the plan.  The leader
+    will also execute that portion of the plan, but it has an additional
+    responsibility: it must also read all of the tuples generated by the
+    workers.  When the parallel portion of the plan generates only a small
+    number of tuples, the leader will often behave very much like an additional
+    worker, speeding up query execution.  Conversely, when the parallel portion
+    of the plan generates a large number of tuples, the leader may be almost
+    entirely occupied with reading the tuples generated by the workers and
+    performing any further processing steps which are required by plan nodes
+    above the level of the Gather node or
+    Gather Merge node.  In such cases, the leader will
+    do very little of the work of executing the parallel portion of the plan.
     
+
+   
+    When the node at the top of the parallel portion of the plan is
+    Gather Merge rather than Gather, it indicates that
+    each process executing the parallel portion of the plan is producing
+    tuples in sorted order, and that the leader is performing an
+    order-preserving merge.  In contrast, Gather reads tuples
+    from the workers in whatever order is convenient, destroying any sort
+    order that may have existed.
+      
   
  
   
@@ -221,9 +235,9 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
          send such a message, this can only occur when using a client that
          does not rely on libpq.  If this is a frequent
          occurrence, it may be a good idea to set
-         in sessions
-        where it is likely, so as to avoid generating query plans that may
-        be suboptimal when run serially.
+         to zero in
+        sessions where it is likely, so as to avoid generating query plans
+        that may be suboptimal when run serially.
        
      
  
@@ -262,6 +276,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      so that each process which executes the plan will generate only a
      subset of the output rows in such a way that each required output row
      is guaranteed to be generated by exactly one of the cooperating processes.
+    Generally, this means that the scan on the driving table of the query
+    must be a parallel-aware scan.
    
  
   
@@ -302,9 +318,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      
    
  
-    Only the scan types listed above may be used for a scan on the driving
-    table within a parallel plan.  Other scan types, such as parallel scans of
-    non-btree indexes, may be supported in the future.
+    Other scan types, such as scans of non-btree indexes, may support
+    parallel scans in the future.
    
   
  
@@ -343,10 +358,10 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      the query performs an aggregation step, producing a partial result for
      each group of which that process is aware.  This is reflected in the plan
      as a Partial Aggregate node.  Second, the partial results are
-    transferred to the leader via the Gather node.  Finally, the
-    leader re-aggregates the results across all workers in order to produce
-    the final result.  This is reflected in the plan as a
-    Finalize Aggregate node.
+    transferred to the leader via Gather or Gather
+    Merge.  Finally, the leader re-aggregates the results across all
+    workers in order to produce the final result.  This is reflected in the
+    plan as a Finalize Aggregate node.
    
    
    
@@ -416,8 +431,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      operation is one which cannot be performed in a parallel worker, but which
      can be performed in the leader while parallel query is in use.  Therefore,
      parallel restricted operations can never occur below a Gather
-    node, but can occur elsewhere in a plan which contains a
-    Gather node.  A parallel unsafe operation is one which cannot
+    or Gather Merge node, but can occur elsewhere in a plan which
+    contains such a node.  A parallel unsafe operation is one which cannot
      be performed while parallel query is in use, not even in the leader.
      When a query contains anything which is parallel unsafe, parallel query
      is completely disabled for that query.
@@ -449,7 +464,7 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
  
      
        
-        Access to an InitPlan or SubPlan.
+        Access to an InitPlan or correlated SubPlan.
        
      
    
@@ -514,8 +529,8 @@ EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
      parallel-restricted functions or aggregates involved in the query in
      order to obtain a superior plan.  So, for example, if a WHERE
      clause applied to a particular table is parallel restricted, the query
-    planner will not consider placing the scan of that table below a
-    Gather node.  In some cases, it would be
+    planner will not consider performing a scan of that table in the parallel
+    portion of a plan.  In some cases, it would be
      possible (and perhaps even efficient) to include the scan of that table in
      the parallel portion of the query and defer the evaluation of the
      WHERE clause so that it happens above the Gather
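
Illustrative note on the Gather versus Gather Merge paragraph added to parallel.sgml in this patch. The following is a minimal Python sketch of the distinction, not PostgreSQL code. It assumes each worker's output can be modeled as an already-sorted list of integers; the function names gather and gather_merge are hypothetical and exist only for this illustration. Real Gather reads tuples in whatever order is convenient, which the sketch approximates by simple concatenation.

```python
import heapq
from itertools import chain


def gather(worker_outputs):
    """Toy model of Gather: take tuples as they come, one stream after another.

    Any per-worker sort order is lost in the combined result.
    """
    return list(chain.from_iterable(worker_outputs))


def gather_merge(worker_outputs):
    """Toy model of Gather Merge: order-preserving merge of sorted streams.

    Assumes every input list is already sorted, as the patch text describes.
    """
    return list(heapq.merge(*worker_outputs))


# Hypothetical per-process outputs, each already in sorted order.
worker_outputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]

unordered = gather(worker_outputs)
ordered = gather_merge(worker_outputs)
```

In this model, ordered is globally sorted because the merge repeatedly takes the smallest head element among the streams, while unordered keeps each stream's rows together and so is sorted only within each stream.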