-
+
GiST Indexes
The
PostgreSQL source distribution includes
several examples of index methods implemented using
-
GiST. The core system currently provides R-Tree
- equivalent functionality for some of the built-in geometric data types
+
GiST. The core system currently provides text search
+ support (indexing for tsvector and tsquery) as well as
+ R-Tree equivalent functionality for some of the built-in geometric data types
(see src/backend/access/gist/gistproc.c). The following
contrib modules also contain GiST
operator classes:
+
+ hstore
+
+
Module for storing (key, value) pairs
+
+
+
intarray
Indexing for float ranges
-
-
- tsearch2
-
-
-
-
+
Indexes
(See for the meaning of
these operators.)
- Also, an IS NULL condition on
- an index column can be used with a GiST index.
Many other GiST operator
classes are available in the contrib collection or as separate
projects. For more information see .
(See for the meaning of
these operators.)
- GIN indexes cannot use IS NULL as a search condition.
- Other GIN operator classes are available in the contrib
- tsearch2 and intarray modules.
- For more information see .
+ Many other GIN operator
classes are available in the contrib collection or as separate
+ projects. For more information see .
-
Tsearch2 Integration
+
Text Search Integration
Trigram matching is a very useful tool when used in conjunction
- with a text index created by the Tsearch2 contrib module. (See
- contrib/tsearch2)
+ with a full text index.
The first step is to generate an auxiliary table containing all
- the unique words in the Tsearch2 index:
+ the unique words in the documents:
CREATE TABLE words AS SELECT word FROM
stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
- Where 'documents' is a table that has a text field 'bodytext'
- that TSearch2 is used to search. The use of the 'simple' dictionary
- with the to_tsvector function, instead of just using the already
+ where documents is a table that has a text field
+ bodytext that we wish to search. The use of the
+ simple configuration with the to_tsvector
+ function, instead of just using the already
existing vector is to avoid creating a list of already stemmed
words. This way, only the original, unstemmed words are added
to the word list.
- Since the 'words' table has been generated as a separate,
+ Since the words table has been generated as a separate,
static table, it will need to be periodically regenerated so that
- it remains up to date with the word list in the Tsearch2 index.
+ it remains up to date with the document collection.
References
- Tsearch2 Development Site
-
GiST Development Site
+ Tsearch2 Development Site
+
* User-defined opclasses. (The scheme is similar to GiST.)
* Optimized index creation (Makes use of maintenance_work_mem to accumulate
postings in memory.)
- * Tsearch2 support via an opclass
+ * Text search support via an opclass
* Soft upper limit on the returned results set using a GUC variable:
gin_fuzzy_search_limit