- linkend="xindex-strategies">). Depending on n, query may be
- different type.
+ Returns an array of keys given a value to be queried; that is,
+ query is the value on the right-hand side of an
+ indexable operator whose left-hand side is the indexed column.
+ n is the strategy number of the operator within the
+ operator class (see ).
+ Often, extractQuery will need
+ to consult n to determine the data type of
+ query and the key values that need to be extracted.
+ The number of returned keys must be stored into *nkeys.
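The contract described above (return an array of keys, store their count through the nkeys pointer) can be sketched roughly as follows. This is only an illustration: the `Datum` and `StrategyNumber` typedefs are simplified stand-ins rather than the PostgreSQL definitions, and `DemoIntArray`, `demo_extract_query`, and the parameter order are hypothetical names and choices made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the PostgreSQL types; not the real definitions. */
typedef uintptr_t Datum;
typedef unsigned short StrategyNumber;

/* Hypothetical query value: a plain array of integers. */
typedef struct DemoIntArray
{
    int        nelems;
    const int *elems;
} DemoIntArray;

/*
 * Sketch of an extractQuery-style method: one key per query element.
 * The caller owns the returned array (allocated with malloc here).
 */
static Datum *
demo_extract_query(const DemoIntArray *query, uint32_t *nkeys, StrategyNumber n)
{
    Datum *keys;
    int    i;

    assert(query != NULL && nkeys != NULL);

    /* Only one query type exists in this sketch; a real method would
     * typically branch on n to decide how to interpret the query. */
    (void) n;

    /* The number of returned keys must be stored into *nkeys. */
    *nkeys = (uint32_t) query->nelems;
    if (query->nelems == 0)
        return NULL;

    keys = malloc(sizeof(Datum) * (size_t) query->nelems);
    if (keys == NULL)
    {
        *nkeys = 0;
        return NULL;
    }

    for (i = 0; i < query->nelems; i++)
        keys[i] = (Datum) query->elems[i];

    return keys;
}
```

In this sketch the strategy number is ignored because there is only one query type; an operator class whose operators take differently typed right-hand sides would use n to choose how to decode the query.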
bool consistent(bool check[], StrategyNumber n, Datum query)
- Returns TRUE if the indexed value satisfies the query qualifier with
- strategy n (or may satisfy in case of RECHECK mark in operator class).
- Each element of the check array is TRUE if the indexed value has a
- corresponding key in the query: if (check[i] == TRUE) the i-th key of
- the query is present in the indexed value.
+ Returns TRUE if the indexed value satisfies the query operator with
+ strategy number n (or may satisfy, if the operator is
+ marked RECHECK in the operator class). The check array has
+ the same length as the number of keys previously returned by
+ extractQuery for this query. Each element of the
+ check array is TRUE if the indexed value contains the
+ corresponding query key, i.e., if (check[i] == TRUE) the i-th key of the
+ extractQuery result array is present in the indexed value.
+ The original query datum (not the extracted key array!) is
+ passed in case the consistent method needs to consult it.
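As a rough illustration of how a consistent method might combine the check array with the strategy number, the sketch below handles two array-style operators. The names `DemoQuery`, `demo_consistent`, and the two `DEMO_*` strategy constants are hypothetical and exist only for this example. Because the signature shown above does not pass the key count, the sketch assumes it can be recovered from the query itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PostgreSQL type; not the real definition. */
typedef unsigned short StrategyNumber;

/* Hypothetical query object that remembers how many keys were extracted. */
typedef struct DemoQuery
{
    int nkeys;      /* number of keys extractQuery returned for this query */
} DemoQuery;

/* Hypothetical strategy numbers, meaningful only within this sketch. */
#define DEMO_OVERLAP_STRATEGY   1
#define DEMO_CONTAINS_STRATEGY  2

static bool
demo_consistent(const bool check[], StrategyNumber n, const DemoQuery *query)
{
    int i;

    assert(check != NULL && query != NULL);

    switch (n)
    {
        case DEMO_OVERLAP_STRATEGY:
            /* Overlap: one query key present in the indexed value suffices. */
            for (i = 0; i < query->nkeys; i++)
                if (check[i])
                    return true;
            return false;

        case DEMO_CONTAINS_STRATEGY:
            /* Contains: every query key must be present in the indexed value. */
            for (i = 0; i < query->nkeys; i++)
                if (!check[i])
                    return false;
            return true;

        default:
            /* Unknown strategy: report no match in this sketch. */
            return false;
    }
}
```

With check = {TRUE, FALSE, TRUE}, the overlap branch reports a match while the contains branch does not, since the second query key is absent from the indexed value.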
+
+
Implementation
+
+ Internally, a
GIN index contains a B-tree index
+ constructed over keys, where each key is an element of the indexed value
+ (a member of an array, for example) and where each tuple in a leaf page is
+ either a pointer to a B-tree over heap pointers (PT, posting tree), or a
+ list of heap pointers (PL, posting list) if the list is small enough.
+
+
+
+
GIN tips and tricks
Create vs insert
- In most cases, insertion into a
GIN index is slow
- due to the likelihood of many keys being inserted for each value.
- So, for bulk insertions into a table it is advisable to to drop the GIN
- index and recreate it after finishing bulk insertion.
-
+
In most cases, insertion into a
GIN index is slow
+ due to the likelihood of many keys being inserted for each value.
+ So, for bulk insertions into a table it is advisable to drop the GIN
+ index and recreate it after finishing bulk insertion.
+
- gin_fuzzy_search_limit
+
- The primary goal of developing
GIN indices was
- support for highly scalable, full-text search in
-
PostgreSQL and there are often situations when
- a full-text search returns a very large set of results. Since reading
- tuples from the disk and sorting them could take a lot of time, this is
- unacceptable for production. (Note that the index search itself is very
- fast.)
+ The primary goal of developing
GIN indexes was
+ to create support for highly scalable, full-text search in
+
PostgreSQL, and there are often situations when
+ a full-text search returns a very large set of results. Moreover, this
+ often happens when the query contains very frequent words, so that the
+ large result set is not even useful. Since reading many
+ tuples from the disk and sorting them could take a lot of time, this is
+ unacceptable for production. (Note that the index search itself is very
+ fast.)
+
+ To facilitate controlled execution of such queries,
+
GIN has a configurable soft upper limit on the size
+ of the returned set, the
+ gin_fuzzy_search_limit configuration parameter.
+ It is set to 0 (meaning no limit) by default.
+ If a non-zero limit is set, then the returned set is a subset of
+ the whole result set, chosen at random.
+
+ Soft
means that the actual number of returned results
+ could differ slightly from the specified limit, depending on the query
+ and the quality of the system's random number generator.
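To see why a randomly chosen subset makes the limit "soft", consider one possible sampling scheme: keep each matching row independently with probability limit / total. The sketch below is an assumption made for illustration, not a description of the GIN source, and `demo_soft_limit` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: given `total` matching rows and a soft `limit`,
 * return how many rows survive independent random sampling.
 * A limit of 0 means "no limit", mirroring the default described above.
 */
static int
demo_soft_limit(int total, int limit)
{
    int kept = 0;
    int i;

    assert(total >= 0 && limit >= 0);

    /* No limit set, or the result is already small enough: keep everything. */
    if (limit == 0 || limit >= total)
        return total;

    for (i = 0; i < total; i++)
    {
        /* Uniform draw in [0, 1) from the C library generator. */
        double r = (double) rand() / ((double) RAND_MAX + 1.0);

        /* Keep this row with probability limit / total. */
        if (r < (double) limit / (double) total)
            kept++;
    }

    return kept;
}
```

Under this scheme the expected number of kept rows equals the limit, but any single run can land somewhat above or below it, which is consistent with the dependence on the random number generator noted above.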
- Such queries usually contain very frequent words, so the results are not
- very helpful. To facilitate execution of such queries
-
GIN has a configurable soft upper limit of the size
- of the returned set, determined by the
- gin_fuzzy_search_limit GUC variable. It is set to 0 by
- default (no limit).
-
- If a non-zero search limit is set, then the returned set is a subset of
- the whole result set, chosen at random.
-
- Soft
means that the actual number of returned results
- could slightly differ from the specified limit, depending on the query
- and the quality of the system's random number generator.
-
Limitations
-
GIN doesn't support full index scans
due to their
- extreme inefficiency: because there are often many keys per value,
- each heap pointer will be returned several times.
+
GIN doesn't support full index scans
: because there are
+ often many keys per value, each heap pointer would be returned many times,
+ and there is no easy way to prevent this.
When extractQuery returns zero keys,
-
GIN will emit an error: for different opclasses and
- strategies the semantic meaning of a void query may be different (for
- example, any array contains the void array, but they don't overlap the
- void array), and
GIN can't suggest a reasonable answer.
+
GIN will emit an error. Depending on the operator,
+ a void query might match all, some, or none of the indexed values (for
+ example, every array contains the empty array, but does not overlap the
+ empty array), and
GIN can't determine the correct
+ answer, nor produce a full-index-scan result if it could determine that
+ that was correct.
-
GIN searches keys only by equality matching. This may
+ It is not an error for extractValue to return zero keys,
+ but in this case the indexed value will be unrepresented in the index.
+ This is another reason why a full index scan is not useful: it would
+ miss such rows.
+
+
+
GIN searches keys only by equality matching. This may
be improved in future.
The
PostgreSQL source distribution includes
-
GIN classes for one-dimensional arrays of all internal
+
GIN classes for one-dimensional arrays of all internal
types. The following
contrib modules also contain GIN
- operator classes:
+ operator classes:
-
+
intarray