+ linkend="using-explain"> and to show some
+ additional details about how the planner uses the
+ system statistics to estimate the number of rows each part of a query might
+ return. This is a significant part of the planning process,
providing much of the raw material for cost calculation.
- The intent of this chapter is not to document the code —
- better done in the code itself, but to present an overview of how it works.
- This will perhaps ease the learning curve for someone who subsequently
- wishes to read the code. As a consequence, the approach chosen is to analyze
- a series of incrementally more complex examples.
-
-
- The outputs and algorithms shown below are taken from version 8.0.
- The behavior of earlier (or later) versions might vary.
+ The intent of this chapter is not to document the code in detail,
+ but to present an overview of how it works.
+ This will perhaps ease the learning curve for someone who subsequently
+ wishes to read the code.
- Using examples drawn from the regression test database, let's start with a
- very simple query:
+ The examples shown below use tables in the
PostgreSQL>
+ regression test database.
+ The outputs shown are taken from version 8.3.
+ The behavior of earlier (or later) versions might vary.
+ Note also that since ANALYZE> uses random sampling
+ while producing statistics, the results will change slightly after
+ any new ANALYZE>.
+
+
+ Let's start with a very simple query:
+
EXPLAIN SELECT * FROM tenk1;
QUERY PLAN
-------------------------------------------------------------
- Seq Scan on tenk1 (cost=0.00..445.00 rows=10000 width=244)
+ Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=244)
-
- How the planner determines the cardinality of <classname>tenk1name>
- is covered in using-explain">, but is repeated here for
- completeness. The number of rows is looked up from
- <classname>pg_classname>:
+
+ How the planner determines the cardinality of <structname>tenk1name>
+ is covered in planner-stats">, but is repeated here for
+ completeness. The number of pages and rows is looked up in
+ <structname>pg_classname>:
-SELECT reltuples, relpages FROM pg_class WHERE relname = 'tenk1';
+SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
relpages | reltuples
----------+-----------
- 345 | 10000
-
- The planner will check the relpages
- estimate (this is a cheap operation) and if incorrect might scale
- reltuples to obtain a row estimate. In this
- case it does not, thus:
-
-rows = 10000
+ 358 | 10000
+ These numbers are current as of the last VACUUM> or
+ ANALYZE> on the table. The planner then fetches the
+ actual current number of pages in the table (this is a cheap operation,
+ not requiring a table scan). If that is different from
+ relpages then
+ reltuples is scaled accordingly to
+ arrive at a current number-of-rows estimate. In this case the values
+ are correct so the rows estimate is the same as
+ reltuples.
-
+
- let's move on to an example with a range condition in its
+ Let's move on to an example with a range condition in its
WHERE clause:
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;
- QUERY PLAN
-------------------------------------------------------------
- Seq Scan on tenk1 (cost=0.00..470.00 rows=1031 width=244)
- Filter: (unique1 < 1000)
-
-
- The planner examines the WHERE clause condition:
-
-unique1 < 1000
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 (cost=24.06..394.64 rows=1007 width=244)
+ Recheck Cond: (unique1 < 1000)
+ -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0)
+ Index Cond: (unique1 < 1000)
- and looks up the restriction function for the operator
- < in pg_operator.
- This is held in the column oprrest,
- and the result in this case is scalarltsel.
- The scalarltsel function retrieves the histogram for
- unique1 from pg_statistics
- - we can follow this by using the simpler pg_stats
+ The planner examines the WHERE clause condition
+ and looks up the selectivity function for the operator
+ < in pg_operator.
+ This is held in the column oprrest,
+ and the entry in this case is scalarltsel.
+ The scalarltsel function retrieves the histogram for
+ unique1 from
+ pg_statistics. For manual queries it is more
+ convenient to look in the simpler pg_stats
view:
-SELECT histogram_bounds FROM pg_stats
+SELECT histogram_bounds FROM pg_stats
WHERE tablename='tenk1' AND attname='unique1';
histogram_bounds
------------------------------------------------------
- {1,970,1943,2958,3971,5069,6028,7007,7919,8982,9995}
+ {0,993,1997,3050,4040,5036,5957,7057,8029,9016,9995}
- Next the fraction of the histogram occupied by < 1000
- is worked out. This is the selectivity. The histogram divides the range
- into equal frequency buckets, so all we have to do is locate the bucket
- that our value is in and count part of it and
- all of the ones before. The value 1000 is clearly in
- the second (970 - 1943) bucket, so by assuming a linear distribution of
- values inside each bucket we can calculate the selectivity as:
+ Next the fraction of the histogram occupied by < 1000
+ is worked out. This is the selectivity. The histogram divides the range
+ into equal frequency buckets, so all we have to do is locate the bucket
+ that our value is in and count part of it and
+ all of the ones before. The value 1000 is clearly in
+ the second bucket (993-1997). Assuming a linear distribution of
+ values inside each bucket, we can calculate the selectivity as:
-selectivity = (1 + (1000 - bckt[2].min)/(bckt[2].max - bckt[2].min))/num_bckts
- = (1 + (1000 - 970)/(1943 - 970))/10
- = 0.1031
+selectivity = (1 + (1000 - bucket[2].min)/(bucket[2].max - bucket[2].min))/num_buckets
+ = (1 + (1000 - 993)/(1997 - 993))/10
+ = 0.100697
that is, one whole bucket plus a linear fraction of the second, divided by
the number of buckets. The estimated number of rows can now be calculated as
- the product of the selectivity and the cardinality of
- <classname>tenk1name>:
+ the product of the selectivity and the cardinality of
+ <structname>tenk1name>:
rows = rel_cardinality * selectivity
- = 10000 * 0.1031
- = 1031
+ = 10000 * 0.100697
+ = 1007 (rounding off)
-
- Next let's consider an example with equality condition in its
+ Next let's consider an example with an equality condition in its
WHERE clause:
-EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'ATAAAA';
+EXPLAIN SELECT * FROM tenk1 WHERE stringu1 = 'CRAAAA';
QUERY PLAN
----------------------------------------------------------
- Seq Scan on tenk1 (cost=0.00..470.00 rows=31 width=244)
- Filter: (stringu1 = 'ATAAAA'::name)
+ Seq Scan on tenk1 (cost=0.00..483.00 rows=30 width=244)
+ Filter: (stringu1 = 'CRAAAA'::name)
- Again the planner examines the WHERE clause condition:
+ Again the planner examines the WHERE clause condition
+ and looks up the selectivity function for =, which is
+ eqsel. For equality estimation the histogram is
+ not useful; instead the list of most
+ common values> (
MCVs) is used to determine the
+ selectivity. Let's have a look at the MCVs, with some additional columns
+ that will be useful later:
-stringu1 = 'ATAAAA'
-
-
- and looks up the restriction function for =, which is
- eqsel. This case is a bit different, as the most
- common values —
MCVs, are used to determine the
- selectivity. Let's have a look at these, with some extra columns that will
- be useful later:
-
-SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
+SELECT null_frac, n_distinct, most_common_vals, most_common_freqs FROM pg_stats
WHERE tablename='tenk1' AND attname='stringu1';
null_frac | 0
-n_distinct | 672
-most_common_vals | {FDAAAA,NHAAAA,ATAAAA,BGAAAA,EBAAAA,MOAAAA,NDAAAA,OWAAAA,BHAAAA,BJAAAA}
-most_common_freqs | {0.00333333,0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.00266667,0.00266667}
+n_distinct | 676
+most_common_vals | {EJAAAA,BBAAAA,CRAAAA,FCAAAA,FEAAAA,GSAAAA,JOAAAA,MCAAAA,NAAAAA,WGAAAA}
+most_common_freqs | {0.00333333,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.003}
+
- The selectivity is merely the most common frequency (
MCF)
- corresponding to the third
MCV — 'ATAAAA':
+ Since CRAAAA> appears in the list of MCVs, the selectivity is
+ merely the corresponding entry in the list of most common frequencies
selectivity = mcf[3]
= 0.003
- The estimated number of rows is just the product of this with the
- cardinality of <classname>tenk1 as before:
+ As before, the estimated number of rows is just the product of this with the
+ cardinality of <structname>tenk1:
rows = 10000 * 0.003
= 30
-
- The number displayed by EXPLAIN is one more than this,
- due to some post estimation checks.
- Now consider the same query, but with a constant that is not in the
+ Now consider the same query, but with a constant that is not in the
QUERY PLAN
----------------------------------------------------------
- Seq Scan on tenk1 (cost=0.00..470.00 rows=15 width=244)
+ Seq Scan on tenk1 (cost=0.00..483.00 rows=15 width=244)
Filter: (stringu1 = 'xxx'::name)
- This is quite a different problem, how to estimate the selectivity when the
- value is
not in the
MCV list.
- The approach is to use the fact that the value is not in the list,
+ This is quite a different problem: how to estimate the selectivity when the
+ value is
not in the
MCV list.
+ The approach is to use the fact that the value is not in the list,
combined with the knowledge of the frequencies for all of the
selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
- = (1 - (0.00333333 + 0.00333333 + 0.003 + 0.003 + 0.003
- + 0.003 + 0.003 + 0.003 + 0.00266667 + 0.00266667))/(672 - 10)
- = 0.001465
+ = (1 - (0.00333333 + 0.003 + 0.003 + 0.003 + 0.003 + 0.003 +
+ 0.003 + 0.003 + 0.003 + 0.003))/(676 - 10)
+ = 0.0014559
- That is, add up all the frequencies for the
MCVs and
- subtract them from one — because it is not one
- of these, and divide by the remaining distinct values.
- Notice that there are no null values so we don't have to worry about those.
- The estimated number of rows is calculated as usual:
+ That is, add up all the frequencies for the
MCVs and
+ subtract them from one, then
+ divide by the number of other distinct values.
+ This amounts to assuming that the fraction of the column that is not any
+ of the MCVs is evenly distributed among all the other distinct values.
+ Notice that there are no null values so we don't have to worry about those
+ (otherwise we'd subtract the null fraction from the numerator as well).
+ The estimated number of rows is then calculated as usual:
-rows = 10000 * 0.001465
- = 15
+rows = 10000 * 0.0014559
+ = 15 (rounding off)
-
- Let's increase the complexity to consider a case with more than one
- condition in the WHERE clause:
+ The previous example with unique1 < 1000> was an
+ oversimplification of what scalarltsel really does;
+ now that we have seen an example of the use of MCVs, we can fill in some
+ more detail. The example was correct as far as it went, because since
+ unique1> is a unique column it has no MCVs (obviously, no
+ value is any more common than any other value). For a non-unique
+ column, there will normally be both a histogram and an MCV list, and
+ the histogram does not include the portion of the column
+ population represented by the MCVs>. We do things this way because
+ it allows more precise estimation. In this situation
+ scalarltsel directly applies the condition (e.g.,
+ < 1000>) to each value of the MCV list, and adds up the
+ frequencies of the MCVs for which the condition is true. This gives
+ an exact estimate of the selectivity within the portion of the table
+ that is MCVs. The histogram is then used in the same way as above
+ to estimate the selectivity in the portion of the table that is not
+ MCVs, and then the two numbers are combined to estimate the overall
+ selectivity. For example, consider
-EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';
+EXPLAIN SELECT * FROM tenk1 WHERE stringu1 < 'IAAAAA';
- QUERY PLAN
------------------------------------------------------------
- Seq Scan on tenk1 (cost=0.00..495.00 rows=2 width=244)
- Filter: ((unique1 < 1000) AND (stringu1 = 'xxx'::name))
+ QUERY PLAN
+------------------------------------------------------------
+ Seq Scan on tenk1 (cost=0.00..483.00 rows=3077 width=244)
+ Filter: (stringu1 < 'IAAAAA'::name)
- An assumption of independence is made and the selectivities of the
- individual restrictions are multiplied together:
+ We already saw the MCV information for stringu1>,
+ and here is its histogram:
-selectivity = selectivity(unique1 < 1000) * selectivity(stringu1 = 'xxx')
- = 0.1031 * 0.001465
- = 0.00015104
+SELECT histogram_bounds FROM pg_stats
+WHERE tablename='tenk1' AND attname='stringu1';
+
+ histogram_bounds
+--------------------------------------------------------------------------------
+ {AAAAAA,CQAAAA,FRAAAA,IBAAAA,KRAAAA,NFAAAA,PSAAAA,SGAAAA,VAAAAA,XLAAAA,ZZAAAA}
- The row estimates are calculated as before:
+ Checking the MCV list, we find that the condition stringu1 <
+ 'IAAAAA'> is satisfied by the first six entries and not the last four,
+ so the selectivity within the MCV part of the population is
-rows = 10000 * 0.00015104
- = 2
+selectivity = sum(relevant mvfs)
+ = 0.00333333 + 0.003 + 0.003 + 0.003 + 0.003 + 0.003
+ = 0.01833333
+
+ Summing all the MCFs also tells us that the total fraction of the
+ population represented by MCVs is 0.03033333, and therefore the
+ fraction represented by the histogram is 0.96966667 (again, there
+ are no nulls, else we'd have to exclude them here). We can see
+ that the value IAAAAA> falls nearly at the end of the
+ third histogram bucket. Using some rather cheesy assumptions
+ about the frequency of different characters, the planner arrives
+ at the estimate 0.298387 for the portion of the histogram population
+ that is less than IAAAAA>. We then combine the estimates
+ for the MCV and non-MCV populations:
+
+selectivity = mcv_selectivity + histogram_selectivity * histogram_fraction
+ = 0.01833333 + 0.298387 * 0.96966667
+ = 0.307669
+
+rows = 10000 * 0.307669
+ = 3077 (rounding off)
+
+
+ In this particular example, the correction from the MCV list is fairly
+ small, because the column distribution is actually quite flat (the
+ statistics showing these particular values as being more common than
+ others are mostly due to sampling error). In a more typical case where
+ some values are significantly more common than others, this complicated
+ process gives a useful improvement in accuracy because the selectivity
+ for the most common values is found exactly.
-
+
- Finally we will examine a query that includes a JOIN
- together with a WHERE clause:
+ Now let's consider a case with more than one
+ condition in the WHERE clause:
-EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
-WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
+EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000 AND stringu1 = 'xxx';
- QUERY PLAN
------------------------------------------------------------------------------------------
- Nested Loop (cost=0.00..346.90 rows=51 width=488)
- -> Index Scan using tenk1_unique1 on tenk1 t1 (cost=0.00..192.57 rows=51 width=244)
- Index Cond: (unique1 < 50)
- -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..3.01 rows=1 width=244)
- Index Cond: ("outer".unique2 = t2.unique2)
+ QUERY PLAN
+--------------------------------------------------------------------------------
+ Bitmap Heap Scan on tenk1 (cost=23.80..396.91 rows=1 width=244)
+ Recheck Cond: (unique1 < 1000)
+ Filter: (stringu1 = 'xxx'::name)
+ -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..23.80 rows=1007 width=0)
+ Index Cond: (unique1 < 1000)
- The restriction on tenk1
- unique1 < 50
is evaluated before the nested-loop join.
- This is handled analogously to the previous range example. The restriction
- operator for < is scalarlteqsel
- as before, but this time the value 50 is in the first bucket of the
- unique1 histogram:
+ The planner assumes that the two conditions are independent, so that
+ the individual selectivities of the clauses can be multiplied together:
+
+selectivity = selectivity(unique1 < 1000) * selectivity(stringu1 = 'xxx')
+ = 0.100697 * 0.0014559
+ = 0.0001466
+
+rows = 10000 * 0.0001466
+ = 1 (rounding off)
+
+
+ Notice that the number of rows estimated to be returned from the bitmap
+ index scan reflects only the condition used with the index; this is
+ important since it affects the cost estimate for the subsequent heap
+ fetches.
+
+
+ Finally we will examine a query that involves a join:
-selectivity = (0 + (50 - bckt[1].min)/(bckt[1].max - bckt[1].min))/num_bckts
- = (0 + (50 - 1)/(970 - 1))/10
- = 0.005057
+EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
+WHERE t1.unique1 < 50 AND t1.unique2 = t2.unique2;
-rows = 10000 * 0.005057
- = 51
+ QUERY PLAN
+--------------------------------------------------------------------------------------
+ Nested Loop (cost=4.64..456.23 rows=50 width=488)
+ -> Bitmap Heap Scan on tenk1 t1 (cost=4.64..142.17 rows=50 width=244)
+ Recheck Cond: (unique1 < 50)
+ -> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.63 rows=50 width=0)
+ Index Cond: (unique1 < 50)
+ -> Index Scan using tenk2_unique2 on tenk2 t2 (cost=0.00..6.27 rows=1 width=244)
+ Index Cond: (t2.unique2 = t1.unique2)
- The restriction for the join is:
+ The restriction on tenk1,
+ unique1 < 50,
+ is evaluated before the nested-loop join.
+ This is handled analogously to the previous range example. This time the
+ value 50 falls into the first bucket of the
+ unique1 histogram:
-t2.unique2 = t1.unique2
+selectivity = (0 + (50 - bucket[1].min)/(bucket[1].max - bucket[1].min))/num_buckets
+ = (0 + (50 - 0)/(993 - 0))/10
+ = 0.005035
+
+rows = 10000 * 0.005035
+ = 50 (rounding off)
- This is due to the join method being nested-loop, with
- tenk1 being in the outer loop. The operator is just
- our familiar =, however the restriction function is
- obtained from the oprjoin column of
- <classname>pg_operator - and is eqjoinsel.
- Additionally we use the statistical information for both
- <classname>tenk2 and tenk1name>:
+ The restriction for the join is t2.unique2 = t1.unique2>.
+ The operator is just
+ our familiar =, however the selectivity function is
+ obtained from the oprjoin column of
+ <structname>pg_operator, and is eqjoinsel.
+ eqjoinsel looks up the statistical information for both
+ <structname>tenk2 and tenk1name>:
-SELECT tablename, null_frac,n_distinct, most_common_vals FROM pg_stats
-WHERE tablename IN ('tenk1', 'tenk2') AND attname='unique2';
+SELECT tablename, null_frac,n_distinct, most_common_vals FROM pg_stats
+WHERE tablename IN ('tenk1', 'tenk2') AND attname='unique2';
tablename | null_frac | n_distinct | most_common_vals
-----------+-----------+------------+------------------
tenk2 | 0 | -1 |
- In this case there is no
MCV information for
- unique2 because all the values appear to be
- unique, so we can use an algorithm that relies only on the number of
+ In this case there is no
MCV information for
+ unique2 because all the values appear to be
+ unique, so we use an algorithm that relies only on the number of
distinct values for both relations together with their null fractions:
selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
- = (1 - 0) * (1 - 0) * min(1/10000, 1/1000)
+ = (1 - 0) * (1 - 0) / max(10000, 10000)
= 0.0001
- This is, subtract the null fraction from one for each of the relations,
- and divide by the maximum of the two distinct values. The number of rows
- that the join is likely to emit is calculated as the cardinality of
- Cartesian product of the two nodes in the nested-loop, multiplied by the
+ This is, subtract the null fraction from one for each of the relations,
+ and divide by the maximum of the numbers of distinct values.
+ The number of rows
+ that the join is likely to emit is calculated as the cardinality of the
+ Cartesian product of the two inputs, multiplied by the
selectivity:
rows = (outer_cardinality * inner_cardinality) * selectivity
- = (51 * 10000) * 0.0001
- = 51
+ = (50 * 10000) * 0.0001
+ = 50
- For those interested in further details, estimation of the number of rows in
- a relation is covered in
- src/backend/optimizer/util/plancat.c. The calculation
- logic for clause selectivities is in
- src/backend/optimizer/path/clausesel.c. The actual
- implementations of the operator and join restriction functions can be found
+ Had there been MCV lists for the two columns,
+ eqjoinsel would have used direct comparison of the MCV
+ lists to determine the join selectivity within the part of the column
+ populations represented by the MCVs. The estimate for the remainder of the
+ populations follows the same approach shown here.
+
+
+ Notice that we showed inner_cardinality> as 10000, that is,
+ the unmodified size of tenk2>. It might appear from
+ inspection of the EXPLAIN> output that the estimate of
+ join rows comes from 50 * 1, that is, the number of outer rows times
+ the estimated number of rows obtained by each inner indexscan on
+ tenk2>. But this is not the case: the join relation size
+ is estimated before any particular join plan has been considered. If
+ everything is working well then the two ways of estimating the join
+ size will produce about the same answer, but due to roundoff error and
+ other factors they sometimes diverge significantly.
+
+
+ For those interested in further details, estimation of the size of
+ a table (before any WHERE> clauses) is done in
+ src/backend/optimizer/util/plancat.c. The generic
+ logic for clause selectivities is in
+ src/backend/optimizer/path/clausesel.c. The
+ operator-specific selectivity functions are mostly found
in src/backend/utils/adt/selfuncs.c.
-