docs: Improve window function docs

author Bruce Momjian

Tue, 11 Apr 2017 14:47:40 +0000 (10:47 -0400)

committer Bruce Momjian

Tue, 11 Apr 2017 14:47:47 +0000 (10:47 -0400)
author Bruce Momjian
Tue, 11 Apr 2017 14:47:40 +0000 (10:47 -0400)
committer Bruce Momjian
Tue, 11 Apr 2017 14:47:47 +0000 (10:47 -0400)
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml

index 1e45511cc6b7bfd0657fa8f5e5887fd8ea505c04..7563131eba69d9aca274142297816681caaf0e7f 100644 (file)
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -328,8 +328,8 @@ COMMIT;
      A window function performs a calculation across a set of
      table rows that are somehow related to the current row.  This is comparable
      to the type of calculation that can be done with an aggregate function.
-    But unlike regular aggregate functions, use of a window function does not
-    cause rows to become grouped into a single output row — the
+    However, window functions do not cause rows to become grouped into a single
+    output row like non-window aggregate calls would.  Instead, the
      rows retain their separate identities.  Behind the scenes, the window
      function is able to access more than just the current row of the query
      result.
@@ -363,20 +363,19 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps
      empsalary, and there is one output row for each row in the
      table.  The fourth column represents an average taken across all the table
      rows that have the same depname value as the current row.
-    (This actually is the same function as the regular avg
-    aggregate function, but the OVER clause causes it to be
-    treated as a window function and computed across an appropriate set of
-    rows.)
+    (This actually is the same function as the non-window avg
+    aggregate, but the OVER clause causes it to be
+    treated as a window function and computed across the window frame.)
     
  
     
      A window function call always contains an OVER clause
      directly following the window function's name and argument(s).  This is what
-    syntactically distinguishes it from a regular function or aggregate
-    function.  The OVER clause determines exactly how the
+    syntactically distinguishes it from a normal function or non-window
+    aggregate.  The OVER clause determines exactly how the
      rows of the query are split up for processing by the window function.
-    The PARTITION BY list within OVER specifies
-    dividing the rows into groups, or partitions, that share the same
+    The PARTITION BY clause within OVER
+    divides the rows into groups, or partitions, that share the same
      values of the PARTITION BY expression(s).  For each row,
      the window function is computed across the rows that fall into the
      same partition as the current row.
@@ -411,8 +410,8 @@ FROM empsalary;
  
  
      As shown here, the rank function produces a numerical rank
-    within the current row's partition for each distinct ORDER BY
-    value, in the order defined by the ORDER BY clause.
+    for each distinct ORDER BY value in the current row's
+    partition, using the order defined by the ORDER BY clause.
      rank needs no explicit parameter, because its behavior
      is entirely determined by the OVER clause.
     
@@ -424,20 +423,20 @@ FROM empsalary;
      if any.  For example, a row removed because it does not meet the
      WHERE condition is not seen by any window function.
      A query can contain multiple window functions that slice up the data
-    in different ways by means of different OVER clauses, but
+    in different ways using different OVER clauses, but
      they all act on the same collection of rows defined by this virtual table.
     
  
     
      We already saw that ORDER BY can be omitted if the ordering
      of rows is not important.  It is also possible to omit PARTITION
-    BY, in which case there is just one partition containing all the rows.
+    BY, in which case there is a single partition containing all rows.
     
  
     
      There is another important concept associated with window functions:
      for each row, there is a set of rows within its partition called its
-    window frame.  Many (but not all) window functions act only
+    window frame.  Some window functions act only
      on the rows of the window frame, rather than of the whole partition.
      By default, if ORDER BY is supplied then the frame consists of
      all rows from the start of the partition up through the current row, plus
@@ -515,7 +514,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
      elsewhere, such as in GROUP BY, HAVING
      and WHERE clauses.  This is because they logically
      execute after the processing of those clauses.  Also, window functions
-    execute after regular aggregate functions.  This means it is valid to
+    execute after non-window aggregate functions.  This means it is valid to
      include an aggregate function call in the arguments of a window function,
      but not vice versa.
     
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml

index 6d7c7b8e0d0363fee5c3f11840479026d2b4ced8..adab3030c4170449897fa87d9fc43ae0dd2eb307 100644 (file)
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13166,13 +13166,14 @@ NULL baz(3 rows)
  
    
     Aggregate functions compute a single result
-   from a set of input values.  The built-in normal aggregate functions
-   are listed in
-    and
-   .
-   The built-in ordered-set aggregate functions
-   are listed in  and
-   .  Grouping operations,
+   from a set of input values.  The built-in general-purpose aggregate
+   functions are listed in 
+   and statistical aggregates in 
+   linkend="functions-aggregate-statistics-table">.
+   The built-in within-group ordered-set aggregate functions
+   are listed in 
+   while the built-in within-group hypothetical-set ones are in 
+   linkend="functions-hypothetical-table">.  Grouping operations,
    which are closely related to aggregate functions, are listed in
    .
    The special syntax considerations for aggregate
@@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
   
    The built-in window functions are listed in
    .  Note that these functions
-   must be invoked using window function syntax; that is an
+   must be invoked using window function syntax, i.e., an
    OVER clause is required.
   
 
   
-   In addition to these functions, any built-in or user-defined normal
-   aggregate function (but not ordered-set or hypothetical-set aggregates)
+   In addition to these functions, any built-in or user-defined
+   general-purpose or statistical
+   aggregate (i.e., not ordered-set or hypothetical-set aggregates)
    can be used as a window function; see
     for a list of the built-in aggregates.
    Aggregate functions act as window functions only when an OVER
-   clause follows the call; otherwise they act as regular aggregates.
+   clause follows the call; otherwise they act as non-window aggregates
+   and return a single row for the entire set.
   
 
   
@@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       
        double precision
       
-      relative rank of the current row: (rank - 1) / (total rows - 1)
+      relative rank of the current row: (rank - 1) / (total partition rows - 1)
      
 
      
@@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       
        double precision
       
-      relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
+      cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows
      
 
      
@@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    All of the functions listed in
     depend on the sort ordering
    specified by the ORDER BY clause of the associated window
-   definition.  Rows that are not distinct in the ORDER BY
-   ordering are said to be peers; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct when considering only the
+   ORDER BY columns are said to be peers.
+   The four ranking functions (including cume_dist) are
+   defined so that they give the same answer for all peer rows.
   
 
   
@@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    
   
 
+  
+   cume_dist computes the fraction of partition rows that
+   are less than or equal to the current row and its peers, while
+   percent_rank computes the fraction of partition rows that
+   are less than the current row, assuming the current row does not exist
+   in the partition.
+  
+
  
 
  


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml

index 40f722c18de0b7d8db05e13a721146fb39f4eb6e..a2d136eaf81aede6ddba86105ab35766ee4a0f08 100644 (file)


--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1664,7 +1664,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
    
     Placing ORDER BY within the aggregate's regular argument
     list, as described so far, is used when ordering the input rows for
-    a normal aggregate for which ordering is optional.  There is a
+    general-purpose and statistical aggregates, for which ordering is
+    optional.  There is a
     subclass of aggregate functions called ordered-set
     aggregates for which an order_by_clause
     is required, usually because the aggregate's computation is
@@ -1675,7 +1676,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
     inside WITHIN GROUP (...), as shown in the final syntax
     alternative above.  The expressions in
     the order_by_clause are evaluated once per
-    input row just like normal aggregate arguments, sorted as per
+    input row just like regular aggregate arguments, sorted as per
     the order_by_clause's requirements, and fed
     to the aggregate function as input arguments.  (This is unlike the case
     for a non-WITHIN GROUP order_by_clause,
@@ -1683,7 +1684,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
     argument expressions preceding WITHIN GROUP, if any, are
     called direct arguments to distinguish them from
     the aggregated arguments listed in
-    the order_by_clause.  Unlike normal aggregate
+    the order_by_clause.  Unlike regular aggregate
     arguments, direct arguments are evaluated only once per aggregate call,
     not once per input row.  This means that they can contain variables only
     if those variables are grouped by GROUP BY; this restriction
@@ -1779,10 +1780,10 @@ FROM generate_series(1,10) AS s(i);
    
     A window function call represents the application
     of an aggregate-like function over some portion of the rows selected
-    by a query.  Unlike regular aggregate function calls, this is not tied
+    by a query.  Unlike non-window aggregate calls, this is not tied
     to grouping of the selected rows into a single output row — each
     row remains separate in the query output.  However the window function
-    is able to scan all the rows that would be part of the current row's
+    has access to all the rows that would be part of the current row's
     group according to the grouping specification (PARTITION BY
     list) of the window function call.
     The syntax of a window function call is one of the following:
@@ -1831,20 +1832,20 @@ UNBOUNDED FOLLOWING
     named window in the WINDOW clause; see the
      reference page for details.  It's worth
     pointing out that OVER wname is not exactly equivalent to
-    OVER (wname); the latter implies copying and modifying the
+    OVER (wname ...); the latter implies copying and modifying the
     window definition, and will be rejected if the referenced window
     specification includes a frame clause.
    
 
    
-    The PARTITION BY option groups the rows of the query into
+    The PARTITION BY clause groups the rows of the query into
     partitions, which are processed separately by the window
     function.  PARTITION BY works similarly to a query-level
     GROUP BY clause, except that its expressions are always just
     expressions and cannot be output-column names or numbers.
     Without PARTITION BY, all rows produced by the query are
     treated as a single partition.
-    The ORDER BY option determines the order in which the rows
+    The ORDER BY clause determines the order in which the rows
     of a partition are processed by the window function.  It works similarly
     to a query-level ORDER BY clause, but likewise cannot use
     output-column names or numbers.  Without ORDER BY, rows are
@@ -1921,17 +1922,17 @@ UNBOUNDED FOLLOWING
    
     The built-in window functions are described in 
     linkend="functions-window-table">.  Other window functions can be added by
-    the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    the user.  Also, any built-in or user-defined general-purpose or
+    statistical aggregate can be used as a window function.  (Ordered-set
+    and hypothetical-set aggregates cannot presently be used as window functions.)
    
 
    
     The syntaxes using * are used for calling parameter-less
     aggregate functions as window functions, for example
     count(*) OVER (PARTITION BY x ORDER BY y).
-    The asterisk (*) is customarily not used for non-aggregate window functions.
-    Aggregate window functions, unlike normal aggregate functions, do not
+    The asterisk (*) is customarily not used for
+    window-specific functions.  Window-specific functions do not
     allow DISTINCT or ORDER BY to be used within the
     function argument list.
    
+   linkend="functions-aggregate-statistics-table">.
+   The built-in within-group ordered-set aggregate functions
+   are listed in 
+   while the built-in within-group hypothetical-set ones are in 
+   linkend="functions-hypothetical-table">.  Grouping operations,
    which are closely related to aggregate functions, are listed in
    .
    The special syntax considerations for aggregate
@@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
   
    The built-in window functions are listed in
    .  Note that these functions
-   must be invoked using window function syntax; that is an
+   must be invoked using window function syntax, i.e., an
    OVER clause is required.
   
 
   
-   In addition to these functions, any built-in or user-defined normal
-   aggregate function (but not ordered-set or hypothetical-set aggregates)
+   In addition to these functions, any built-in or user-defined
+   general-purpose or statistical
+   aggregate (i.e., not ordered-set or hypothetical-set aggregates)
    can be used as a window function; see
     for a list of the built-in aggregates.
    Aggregate functions act as window functions only when an OVER
-   clause follows the call; otherwise they act as regular aggregates.
+   clause follows the call; otherwise they act as non-window aggregates
+   and return a single row for the entire set.
   
 
   
@@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       
        double precision
       
-      relative rank of the current row: (rank - 1) / (total rows - 1)
+      relative rank of the current row: (rank - 1) / (total partition rows - 1)
      
 
      
@@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       
        double precision
       
-      relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
+      cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows
      
 
      
@@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    All of the functions listed in
     depend on the sort ordering
    specified by the ORDER BY clause of the associated window
-   definition.  Rows that are not distinct in the ORDER BY
-   ordering are said to be peers; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct when considering only the
+   ORDER BY columns are said to be peers.
+   The four ranking functions (including cume_dist) are
+   defined so that they give the same answer for all peer rows.
   
 
   
@@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    
   
 
+  
+   cume_dist computes the fraction of partition rows that
+   are less than or equal to the current row and its peers, while
+   percent_rank computes the fraction of partition rows that
+   are less than the current row, assuming the current row does not exist
+   in the partition.
+  
+
  
 
  
+   linkend="functions-hypothetical-table">.  Grouping operations,
     which are closely related to aggregate functions, are listed in
     .
     The special syntax considerations for aggregate
@@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    
     The built-in window functions are listed in
     .  Note that these functions
-   must be invoked using window function syntax; that is an
+   must be invoked using window function syntax, i.e., an
     OVER clause is required.
    
  
    
-   In addition to these functions, any built-in or user-defined normal
-   aggregate function (but not ordered-set or hypothetical-set aggregates)
+   In addition to these functions, any built-in or user-defined
+   general-purpose or statistical
+   aggregate (i.e., not ordered-set or hypothetical-set aggregates)
     can be used as a window function; see
      for a list of the built-in aggregates.
     Aggregate functions act as window functions only when an OVER
-   clause follows the call; otherwise they act as regular aggregates.
+   clause follows the call; otherwise they act as non-window aggregates
+   and return a single row for the entire set.
    
  
    
@@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
        
         double precision
        
-      relative rank of the current row: (rank - 1) / (total rows - 1)
+      relative rank of the current row: (rank - 1) / (total partition rows - 1)
       
  
       
@@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
        
         double precision
        
-      relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
+      cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows
       
  
       
@@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
     All of the functions listed in
      depend on the sort ordering
     specified by the ORDER BY clause of the associated window
-   definition.  Rows that are not distinct in the ORDER BY
-   ordering are said to be peers; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct when considering only the
+   ORDER BY columns are said to be peers.
+   The four ranking functions (including cume_dist) are
+   defined so that they give the same answer for all peer rows.
    
  
    
@@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
     
    
  
+  
+   cume_dist computes the fraction of partition rows that
+   are less than or equal to the current row and its peers, while
+   percent_rank computes the fraction of partition rows that
+   are less than the current row, assuming the current row does not exist
+   in the partition.
+  
+
   
  
   
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml

index 40f722c18de0b7d8db05e13a721146fb39f4eb6e..a2d136eaf81aede6ddba86105ab35766ee4a0f08 100644 (file)
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1664,7 +1664,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
     
      Placing ORDER BY within the aggregate's regular argument
      list, as described so far, is used when ordering the input rows for
-    a normal aggregate for which ordering is optional.  There is a
+    general-purpose and statistical aggregates, for which ordering is
+    optional.  There is a
      subclass of aggregate functions called ordered-set
      aggregates for which an order_by_clause
      is required, usually because the aggregate's computation is
@@ -1675,7 +1676,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
      inside WITHIN GROUP (...), as shown in the final syntax
      alternative above.  The expressions in
      the order_by_clause are evaluated once per
-    input row just like normal aggregate arguments, sorted as per
+    input row just like regular aggregate arguments, sorted as per
      the order_by_clause's requirements, and fed
      to the aggregate function as input arguments.  (This is unlike the case
      for a non-WITHIN GROUP order_by_clause,
@@ -1683,7 +1684,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
      argument expressions preceding WITHIN GROUP, if any, are
      called direct arguments to distinguish them from
      the aggregated arguments listed in
-    the order_by_clause.  Unlike normal aggregate
+    the order_by_clause.  Unlike regular aggregate
      arguments, direct arguments are evaluated only once per aggregate call,
      not once per input row.  This means that they can contain variables only
      if those variables are grouped by GROUP BY; this restriction
@@ -1779,10 +1780,10 @@ FROM generate_series(1,10) AS s(i);
     
      A window function call represents the application
      of an aggregate-like function over some portion of the rows selected
-    by a query.  Unlike regular aggregate function calls, this is not tied
+    by a query.  Unlike non-window aggregate calls, this is not tied
      to grouping of the selected rows into a single output row — each
      row remains separate in the query output.  However the window function
-    is able to scan all the rows that would be part of the current row's
+    has access to all the rows that would be part of the current row's
      group according to the grouping specification (PARTITION BY
      list) of the window function call.
      The syntax of a window function call is one of the following:
@@ -1831,20 +1832,20 @@ UNBOUNDED FOLLOWING
      named window in the WINDOW clause; see the
       reference page for details.  It's worth
      pointing out that OVER wname is not exactly equivalent to
-    OVER (wname); the latter implies copying and modifying the
+    OVER (wname ...); the latter implies copying and modifying the
      window definition, and will be rejected if the referenced window
      specification includes a frame clause.
     
  
     
-    The PARTITION BY option groups the rows of the query into
+    The PARTITION BY clause groups the rows of the query into
      partitions, which are processed separately by the window
      function.  PARTITION BY works similarly to a query-level
      GROUP BY clause, except that its expressions are always just
      expressions and cannot be output-column names or numbers.
      Without PARTITION BY, all rows produced by the query are
      treated as a single partition.
-    The ORDER BY option determines the order in which the rows
+    The ORDER BY clause determines the order in which the rows
      of a partition are processed by the window function.  It works similarly
      to a query-level ORDER BY clause, but likewise cannot use
      output-column names or numbers.  Without ORDER BY, rows are
@@ -1921,17 +1922,17 @@ UNBOUNDED FOLLOWING
     
      The built-in window functions are described in 
     linkend="functions-window-table">.  Other window functions can be added by
-    the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    the user.  Also, any built-in or user-defined general-purpose or
+    statistical aggregate can be used as a window function.  (Ordered-set
+    and hypothetical-set aggregates cannot presently be used as window functions.)
    
 
    
     The syntaxes using * are used for calling parameter-less
     aggregate functions as window functions, for example
     count(*) OVER (PARTITION BY x ORDER BY y).
-    The asterisk (*) is customarily not used for non-aggregate window functions.
-    Aggregate window functions, unlike normal aggregate functions, do not
+    The asterisk (*) is customarily not used for
+    window-specific functions.  Window-specific functions do not
     allow DISTINCT or ORDER BY to be used within the
     function argument list.
    
      linkend="functions-window-table">.  Other window functions can be added by
-    the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    the user.  Also, any built-in or user-defined general-purpose or
+    statistical aggregate can be used as a window function.  (Ordered-set
+    and hypothetical-set aggregates cannot presently be used as window functions.)
     
  
     
      The syntaxes using * are used for calling parameter-less
      aggregate functions as window functions, for example
      count(*) OVER (PARTITION BY x ORDER BY y).
-    The asterisk (*) is customarily not used for non-aggregate window functions.
-    Aggregate window functions, unlike normal aggregate functions, do not
+    The asterisk (*) is customarily not used for
+    window-specific functions.  Window-specific functions do not
      allow DISTINCT or ORDER BY to be used within the
      function argument list.
author	Bruce Momjian
	Tue, 11 Apr 2017 14:47:40 +0000 (10:47 -0400)
committer	Bruce Momjian
	Tue, 11 Apr 2017 14:47:47 +0000 (10:47 -0400)
doc/src/sgml/advanced.sgml		patch \| blob \| blame \| history
doc/src/sgml/func.sgml		patch \| blob \| blame \| history
doc/src/sgml/syntax.sgml		patch \| blob \| blame \| history