-
+
Indexes
To remove an index, use the DROP INDEX command.
- Indexes can be added and removed from tables at any time.
+ Indexes can be added to and removed from tables at any time.
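+ For instance, a minimal sketch (the index name test1_id_index is
+ hypothetical, standing in for an index created earlier):
+-- Removing an index does not affect the table or its data.
+DROP INDEX test1_id_index;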
-
Multi-Column Indexes
+
Multicolumn Indexes
- multi-column
+ multicolumn
- Currently, only the B-tree implementation supports multi-column
+ Currently, only the B-tree implementation supports multicolumn
indexes. Up to 16 columns may be specified. (This limit can be
altered when building
Postgres; see the
file pg_config.h.)
- The query optimizer can use a multi-column index for queries that
+ The query optimizer can use a multicolumn index for queries that
involve the first
n consecutive columns in
the index (when used with appropriate operators), up to the total
number of columns specified in the index definition. For example,
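+ as a sketch (the table and index names are hypothetical, chosen to
+ match the query shown further below), an index on (major, minor) can
+ serve queries that constrain major alone, or both major and minor,
+ but not queries that constrain only minor:
+-- Hypothetical table and multicolumn index.
+CREATE TABLE test2 (
+  major int,
+  minor int,
+  name  varchar
+);
+CREATE INDEX test2_mm_idx ON test2 (major, minor);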
- Multi-column indexes can only be used if the clauses involving the
+ Multicolumn indexes can only be used if the clauses involving the
indexed columns are joined with AND. For instance,
SELECT name FROM test2 WHERE major = constant OR minor = constant;
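+ This query cannot use a multicolumn index on (major, minor), because
+ rows matching only the minor condition could have any value of major.
+ By contrast, a query that combines the same columns with AND can use
+ such an index (a sketch, reusing the hypothetical test2 index above):
+-- Both indexed columns are constrained and joined with AND, so the
+-- multicolumn index test2_mm_idx can be used.
+SELECT name FROM test2 WHERE major = 10 AND minor = 42;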
- Multi-column indexes should be used sparingly. Most of the time,
+ Multicolumn indexes should be used sparingly. Most of the time,
an index on a single column is sufficient and saves space and time.
Indexes with more than three columns are almost certainly
inappropriate.
PostgreSQL automatically creates unique
indexes when a table is declared with a unique constraint or a
primary key, on the columns that make up the primary key or unique
- columns (a multi-column index, if appropriate), to enforce that
+ columns (a multicolumn index, if appropriate), to enforce that
constraint. A unique index can be added to a table at any later
- time, to add a unique constraint. (But a primary key cannot be
- added after table creation.)
+ time, to add a unique constraint.
+
+
+ The preferred way to add a unique constraint to a table is
+ ALTER TABLE ... ADD CONSTRAINT. The use of
+ indexes to enforce unique constraints could be considered an
+ implementation detail that should not be accessed directly.
+
+
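+ A minimal sketch of that preferred form (the constraint name is
+ hypothetical, and the table reuses the test2 sketch from above):
+-- Adds the unique constraint; a unique index is created behind the
+-- scenes to enforce it.
+ALTER TABLE test2 ADD CONSTRAINT test2_major_minor_key UNIQUE (major, minor);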
argument, but they must be table columns, not constants.
Functional indexes are always single-column (namely, the function
result) even if the function uses more than one input field; there
- cannot be multi-column indexes that contain function calls.
+ cannot be multicolumn indexes that contain function calls.
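+ For instance, a sketch (the table, column, and index names are
+ hypothetical):
+-- Functional index on the single result of lower(name).
+CREATE INDEX test1_lower_name_idx ON test1 (lower(name));
+-- A query that can use this index:
+SELECT * FROM test1 WHERE lower(name) = 'smith';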
- ">
- >Partial Indexes
+ ">
+ >Partial Indexes
- partial-index">
+ indexes-partial">
partial
-
-
Author
- This is from a reply to a question on the email list
- on 1998-08-11.
-
-
-
+ A partial index is an index built over a
+ subset of a table; the subset is defined by a conditional
+ expression (called the predicate of the
+ partial index).
+
+
+ A major motivation for partial indexes is to avoid indexing common
+ values. Since a query searching for a common value will not
+ use the index anyway, there is no point in keeping those rows in the
+ index at all. This reduces the size of the index, which will speed
+ up queries that do use the index. It will also speed up many data
+ manipulation operations because the index does not need to be
+ updated in all cases. The first example below shows a
+ possible application of this idea.
+
+
+
+
Setting up a Partial Index to Exclude Common Values
+
+ Suppose you are storing web server access logs in a database.
+ Most accesses originate from the IP range of your organization but
+ some are from elsewhere (say, employees on dial-up connections).
+ So you do not want to index the IP range that corresponds to your
+ organization's subnet.
+
+
+ Assume a table like this:
+CREATE TABLE access_log (
+ url varchar,
+ client_ip inet,
+ ...
+);
+
+
+
+ To create a partial index that suits our example, use a command
+ such as this:
+CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
+ WHERE NOT (client_ip > inet '192.168.100.0' AND client_ip < inet '192.168.100.255');
+
+
- A partial index
- is an index built over a subset of a table; the subset is defined by
- supports partial indexes with arbitrary
- predicates. I believe IBM's
DB2
- for AS/400 supports partial indexes
- using single-clause predicates.
+ A typical query that can use this index would be:
+SELECT * FROM access_log WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';
+
+ A query that cannot use this index is:
+SELECT * FROM access_log WHERE client_ip = inet '192.168.100.23';
+
- The main motivation for partial indexes is this:
- if all of the queries you ask that can
- profitably use an index fall into a certain range, why build an index
- over the whole table and suffer the associated space/time costs?
+ Observe that this kind of partial index requires that the common
+ values be known in advance. If the distribution of values is
+ inherent (due to the nature of the application) and static (does
+ not change), this is not difficult, but if the common values are
+ merely due to the coincidental data load, this can require a lot of
+ maintenance work.
+
+
+
+ Another possibility is to exclude values from the index that the
+ typical query workload is not interested in; this is shown in
+ the second example below. This results in the same
+ advantages as listed above, but it prevents the
+ "uninteresting" values from being accessed via an
+ index at all, even if an index scan might be profitable in that
+ case. Obviously, setting up partial indexes for this kind of
+ scenario will require a lot of care and experimentation.
+
- (There are other reasons too; see
- for details.)
+
+
Setting up a Partial Index to Exclude Uninteresting Values
+
+ If you have a table that contains both billed and unbilled orders,
+ where the unbilled orders take up a small fraction of the total
+ table and yet are the most often accessed rows, you can improve
+ performance by creating an index on just that portion. The
+ command to create the index would look like this:
+CREATE INDEX orders_unbilled_index ON orders (order_nr)
+ WHERE billed is not true;
+
- The machinery to build, update and query partial indexes isn't too
- bad. The hairy parts are index selection (which indexes do I build?)
- and query optimization (which indexes do I use?); i.e., the parts
- that involve deciding what predicate(s) match the workload/query in
- some useful way. For those who are into database theory, the problems
- are basically analogous to the corresponding materialized view
- problems, albeit with different cost parameters and formulas. These
- are, in the general case, hard problems for the standard ordinal
- types; they're super-hard problems with black-box extension types,
- because the selectivity estimation technology is so crude.
+ A possible query to use this index would be
+SELECT * FROM orders WHERE billed is not true AND order_nr < 10000;
+
+ However, the index can also be used in queries that do not involve
+ order_nr at all, e.g.,
+SELECT * FROM orders WHERE billed is not true AND amount > 5000.00;
+
+ This is not as efficient as a partial index on the
+ amount column would be, since the system has to
+ scan the entire index in any case.
- Check ,
- ,
- and
-
- for more information.
+ Note that this query cannot use this index:
+SELECT * FROM orders WHERE order_nr = 3501;
+
+ The order 3501 may be among the billed or among the unbilled
+ orders.
-
-
+
+
+ The second example also illustrates that the
+ indexed column and the column used in the predicate do not need to
+ match. PostgreSQL supports partial
+ indexes with arbitrary predicates, as long as only columns of the
+ table being indexed are involved. However, keep in mind that the
+ predicate must actually match the condition used in the query that
+ is supposed to benefit from the index.
+
PostgreSQL does not have a sophisticated
+ theorem prover that can recognize mathematically equivalent
+ predicates that are written in different forms. (Not
+ only is such a general theorem prover extremely difficult to
+ create, it would probably be too slow to be of any real use.)
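+ As a sketch, reusing the unbilled-orders index from the second
+ example: a query whose WHERE clause contains the index predicate
+ verbatim can use the partial index, while a logically related
+ condition written in a different form may not be recognized.
+-- Contains "billed is not true" verbatim: the partial index is usable.
+SELECT * FROM orders WHERE billed is not true AND order_nr < 7000;
+-- Related condition in a different form; the planner may not prove
+-- that it implies the index predicate.
+SELECT * FROM orders WHERE billed = false AND order_nr < 7000;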
+
+
+ Finally, a partial index can also be used to override the system's
+ query plan choices. It may occur that data sets with peculiar
+ distributions will cause the system to use an index when it really
+ should not. In that case the index can be set up so that it is not
+ available for the offending query. Normally,
+ PostgreSQL makes reasonable choices about index
+ usage (e.g., it avoids them when retrieving common values, so the
+ earlier example really only saves index size; it is not required to
+ avoid index usage), and grossly incorrect plan choices are cause
+ for a bug report.
+
+
+ Keep in mind that setting up a partial index indicates that you
+ know at least as much as the query planner knows; in particular, you
+ know when an index might be profitable. Forming this knowledge
+ requires experience and understanding of how indexes in
+ PostgreSQL work. In most cases, the advantage of a
+ partial index over a regular index will not be much.
+
+
+ More information about partial indexes can be found in
+ STON89b, OLSON93, and SESHADRI95 (see the bibliography).
+
+
+
+
+
Examining Index Usage
+
+ Although indexes in PostgreSQL do not need
+ maintenance and tuning, it is still important to check
+ which indexes are actually used by the real-life query workload.
+ Examining index usage is done with the EXPLAIN command;
+ its application for this purpose is illustrated in
+ the section Using EXPLAIN.
+
+
+ It is difficult to formulate a general procedure for determining
+ which indexes to set up. There are a number of typical cases that
+ have been shown in the examples throughout the previous sections.
+ A good deal of experimentation will be necessary in most cases.
+ The rest of this section gives some tips for that.
+
+
+
+
+ Always run ANALYZE first. This command
+ collects statistics about the distribution of the values in the
+ table. This information is required to guess the number of rows
+ returned by a query, which is needed by the planner to assign
+ realistic costs to each possible query plan. In the absence of any
+ real statistics, some default values are assumed, which are
+ almost certain to be inaccurate. Examining an application's
+ index usage without having run ANALYZE is
+ therefore a lost cause.
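+ A minimal sketch, reusing the access_log table from the earlier
+ example:
+-- Collect planner statistics for one table; a bare ANALYZE processes
+-- the whole database.
+ANALYZE access_log;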
+
+
+
+
+ Use real data for experimentation. Using test data for setting
+ up indexes will tell you what indexes you need for the test data,
+ but that is all.
+
+
+ It is especially fatal to use proportionally reduced data sets.
+ While selecting 1000 out of 100000 rows could be a candidate for
+ an index, selecting 1 out of 100 rows will hardly be, because the
+ 100 rows will probably fit within a single disk page, and there
+ is no plan that can beat sequentially fetching 1 disk page.
+
+
+ Also be careful when making up test data, which is often
+ unavoidable when the application is not in production use yet.
+ Values that are very similar, completely random, or inserted in
+ sorted order will skew the statistics away from the distribution
+ that real data would have.
+
+
+
+
+ When indexes are not used, it can be useful for testing to force
+ their use. There are run-time parameters that can turn off
+ various plan types (described in the Administrator's
+ Guide). For instance, turning off sequential scans
+ (enable_seqscan) and nested-loop joins
+ (enable_nestloop), which are the most basic plans,
+ will force the system to use a different plan. If the system
+ still chooses a sequential scan or nested-loop join, then there is
+ probably a more fundamental reason why the index is not
+ used; for example, the query condition does not match the index.
+ (What kind of query can use what kind of index is explained in
+ the previous sections.)
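+ A sketch of this technique (the query shown is hypothetical, and the
+ settings affect only the current session):
+-- Discourage the most basic plan types, then re-check the plan.
+SET enable_seqscan = off;
+SET enable_nestloop = off;
+EXPLAIN SELECT * FROM access_log WHERE client_ip = inet '212.78.10.32';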
+
+
+
+
+ If forcing index usage does use the index, then there are two
+ possibilities: Either the system is right and using the index is
+ indeed not appropriate, or the cost estimates of the query plans
+ do not reflect reality. So you should time your query with
+ and without indexes. The EXPLAIN ANALYZE
+ command can be useful here.
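+ For instance, a sketch reusing the unbilled-orders query from above:
+-- Executes the query and reports actual run times next to the
+-- planner's estimates.
+EXPLAIN ANALYZE SELECT * FROM orders
+  WHERE billed is not true AND order_nr < 10000;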
+
+
+
+
+ If it turns out that the cost estimates are wrong, there are,
+ again, two possibilities. The total cost is computed from the
+ per-row costs of each plan node times the selectivity estimate of
+ the plan node. The costs of the plan nodes can be tuned with
+ run-time parameters (described in the Administrator's
+ Guide). An inaccurate selectivity estimate is due to
+ insufficient statistics. It may be possible to improve this by
+ tuning the statistics-gathering parameters (see the ALTER
+ TABLE reference page).
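+ A sketch of such tuning (the column and statistics target are
+ hypothetical):
+-- Collect more detailed statistics for a column with an irregular
+-- distribution, then refresh the statistics.
+ALTER TABLE access_log ALTER COLUMN client_ip SET STATISTICS 200;
+ANALYZE access_log;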
+
+
+ If you do not succeed in adjusting the costs to be more
+ appropriate, then you may have to resort to forcing index usage
+ explicitly. You may also want to contact the
+ PostgreSQL developers to examine the issue.
+
+
+
+
+