Editorial improvements for GIN documentation.

author Tom Lane

Fri, 1 Dec 2006 23:46:46 +0000 (23:46 +0000)

committer Tom Lane

Fri, 1 Dec 2006 23:46:46 +0000 (23:46 +0000)
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml

index 24517402bccec22a7c9f5b98d722812239320141..c2bbad42bfaa2ba7960e91643503981b5fbce0c9 100644 (file)
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -1,4 +1,4 @@
-
+
  
  
  GIN Indexes
@@ -14,8 +14,9 @@
   
     GIN stands for Generalized Inverted Index.  It is
     an index structure storing a set of (key, posting list) pairs, where
-   'posting list' is a set of rows in which the key occurs. Each
-   row may contain many keys.
+   a posting list is a set of rows in which the key occurs. Each
+   indexed value may contain many keys, so the same row ID may appear in
+   multiple posting lists.
   
  
   
@@ -45,7 +46,7 @@
  
   
     The GIN interface has a high level of abstraction,
-   requiring the access method implementer to only implement the semantics of
+   requiring the access method implementer only to implement the semantics of
     the data type being accessed.  The GIN layer itself
     takes care of concurrency, logging and searching the tree structure.
   
@@ -53,26 +54,14 @@
   
     All it takes to get a GIN access method working
     is to implement four user-defined methods, which define the behavior of
-   keys in the tree. In short, GIN combines extensibility
-   along with generality, code reuse, and a clean interface.
- 
-
-
-
-
- Implementation
-
- 
-  Internally, GIN consists of a B-tree index constructed 
-  over keys, where each key is an element of the indexed value 
-  (element of array, for example) and where each tuple in a leaf page is 
-  either a pointer to a B-tree over heap pointers (PT, posting tree), or a 
-  list of heap pointers (PL, posting list) if the tuple is small enough.
+   keys in the tree and the relationships between keys, indexed values,
+   and indexable queries. In short, GIN combines
+   extensibility with generality, code reuse, and a clean interface.
   
  
   
-   There are four methods that an index operator class for
-   GIN must provide (prototypes are in pseudocode):
+   The four methods that an index operator class for
+   GIN must provide are:
   
  
   
@@ -80,9 +69,9 @@
       int compare(Datum a, Datum b)
       
        
-      Compares keys (not indexed values!) and returns an integer less than 
-      zero, zero, or greater than zero, indicating whether the first key is 
-      less than, equal to, or greater than the second.
+       Compares keys (not indexed values!) and returns an integer less than
+       zero, zero, or greater than zero, indicating whether the first key is
+       less than, equal to, or greater than the second.
        
       
      
@@ -91,21 +80,26 @@
       Datum* extractValue(Datum inputValue, uint32 *nkeys)
       
        
-      Returns an array of keys of value to be indexed, nkeys should
-      contain the number of returned keys.
+       Returns an array of keys given a value to be indexed.  The
+       number of returned keys must be stored into *nkeys.
        
       
      
  
      
-     Datum* extractQuery(Datum query, uint32 nkeys, 
-       StrategyNumber n)
+     Datum* extractQuery(Datum query, uint32 *nkeys,
+        StrategyNumber n)
       
        
-      Returns an array of keys of the query to be executed. n contains the
-      strategy number of the operation (see <xref
-      linkend="xindex-strategies">).  Depending on n, query may be
-      different type.
+       Returns an array of keys given a value to be queried; that is,
+       query is the value on the right-hand side of an
+       indexable operator whose left-hand side is the indexed column.
+       n is the strategy number of the operator within the
+       operator class (see ).
+       Often, extractQuery will need
+       to consult n to determine the data type of
+       query and the key values that need to be extracted.
+       The number of returned keys must be stored into *nkeys.
       
      
     
@@ -114,11 +108,16 @@
      bool consistent(bool check[], StrategyNumber n, Datum query)
      
       
-      Returns TRUE if the indexed value satisfies the query qualifier with 
-      strategy n (or may satisfy in case of RECHECK mark in operator class). 
-      Each element of the check array is TRUE if the indexed value has a 
-      corresponding key in the query: if (check[i] == TRUE) the i-th key of 
-      the query is present in the indexed value.
+       Returns TRUE if the indexed value satisfies the query operator with
+       strategy number n (or may satisfy, if the operator is
+       marked RECHECK in the operator class).  The check array has
+       the same length as the number of keys previously returned by
+       extractQuery for this query.  Each element of the
+       check array is TRUE if the indexed value contains the
+       corresponding query key, ie, if (check[i] == TRUE) the i-th key of the
+       extractQuery result array is present in the indexed value.
+       The original query datum (not the extracted key array!) is
+       passed in case the consistent method needs to consult it.
       
      
     
@@ -127,6 +126,19 @@
 
 
 
+
+ Implementation
+
+ 
+  Internally, a GIN index contains a B-tree index
+  constructed over keys, where each key is an element of the indexed value
+  (a member of an array, for example) and where each tuple in a leaf page is
+  either a pointer to a B-tree over heap pointers (PT, posting tree), or a
+  list of heap pointers (PL, posting list) if the list is small enough.
+ 
+
+
+
 
 GIN tips and tricks
 
@@ -134,44 +146,43 @@
   
    Create vs insert
    
-   
-    In most cases, insertion into a GIN index is slow
-    due to the likelihood of many keys being inserted for each value.
-    So, for bulk insertions into a table it is advisable to to drop the GIN 
-    index and recreate it after finishing bulk insertion.
-   
+    
+     In most cases, insertion into a GIN index is slow
+     due to the likelihood of many keys being inserted for each value.
+     So, for bulk insertions into a table it is advisable to drop the GIN
+     index and recreate it after finishing bulk insertion.
+    
    
   
 
   
-   gin_fuzzy_search_limit
+   
    
-   
-    The primary goal of developing GIN indices was 
-    support for highly scalable, full-text search in 
-    PostgreSQL and there are often situations when 
-    a full-text search returns a very large set of results.  Since reading 
-    tuples from the disk and sorting them could take a lot of time, this is 
-    unacceptable for production.  (Note that the index search itself is very 
-    fast.) 
+    
+     The primary goal of developing GIN indexes was
+     to create support for highly scalable, full-text search in
+     PostgreSQL, and there are often situations when
+     a full-text search returns a very large set of results.  Moreover, this
+     often happens when the query contains very frequent words, so that the
+     large result set is not even useful.  Since reading many
+     tuples from the disk and sorting them could take a lot of time, this is
+     unacceptable for production.  (Note that the index search itself is very
+     fast.)
+    
+    
+     To facilitate controlled execution of such queries
+     GIN has a configurable soft upper limit on the size
+     of the returned set, the
+     gin_fuzzy_search_limit configuration parameter.
+     It is set to 0 (meaning no limit) by default.
+     If a non-zero limit is set, then the returned set is a subset of
+     the whole result set, chosen at random.
+    
+    
+     Soft means that the actual number of returned results
+     could differ slightly from the specified limit, depending on the query
+     and the quality of the system's random number generator.
     
-   
-    Such queries usually contain very frequent words, so the results are not 
-    very helpful. To facilitate execution of such queries 
-    GIN has a configurable soft upper limit of the size 
-    of the returned set, determined by the 
-    gin_fuzzy_search_limit GUC variable.  It is set to 0 by
-    default (no limit).
-   
-   
-    If a non-zero search limit is set, then the returned set is a subset of 
-    the whole result set, chosen at random.
-   
-   
-    Soft means that the actual number of returned results
-    could slightly differ from the specified limit, depending on the query
-    and the quality of the system's random number generator.
-   
    
   
  
@@ -182,21 +193,30 @@
  Limitations
 
  
-  GIN doesn't support full index scans due to their
-  extreme inefficiency: because there are often many keys per value,
-  each heap pointer will be returned several times.
+  GIN doesn't support full index scans: because there are
+  often many keys per value, each heap pointer would be returned many times,
+  and there is no easy way to prevent this.
  
 
  
   When extractQuery returns zero keys,
-  GIN will emit an error: for different opclasses and
-  strategies the semantic meaning of a void query may be different (for
-  example, any array contains the void array, but they don't overlap the
-  void array), and GIN can't suggest a reasonable answer.
+  GIN will emit an error.  Depending on the operator,
+  a void query might match all, some, or none of the indexed values (for
+  example, every array contains the empty array, but does not overlap the
+  empty array), and GIN can't determine the correct
+  answer, nor produce a full-index-scan result if it could determine that
+  that was correct.
  
 
  
-  GIN searches keys only by equality matching.  This may 
+  It is not an error for extractValue to return zero keys,
+  but in this case the indexed value will be unrepresented in the index.
+  This is another reason why full index scan is not useful — it would
+  miss such rows.
+ 
+
+ 
+  GIN searches keys only by equality matching.  This may
   be improved in future.
  
 
@@ -206,12 +226,12 @@
 
  
   The PostgreSQL source distribution includes
-  GIN classes for one-dimensional arrays of all internal 
+  GIN classes for one-dimensional arrays of all internal
   types.  The following
   contrib modules also contain GIN
-  operator classes: 
+  operator classes:
  
- 
+
  
   
    intarray
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml

index fd8bb7251edb3015906ec8c912ddaa998d31c2ac..1edceebd2d5f7619b7890aa13493d2491a747b94 100644 (file)
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1,4 +1,4 @@
-
+
  
  
   Indexes
@@ -116,7 +116,7 @@ CREATE INDEX test1_id_index ON test1 (id);
  
    
     PostgreSQL provides several index types:
-   B-tree, Hash, GIN and GiST.  Each index type uses a different
+   B-tree, Hash, GiST and GIN.  Each index type uses a different
     algorithm that is best suited to different types of queries.
     By default, the CREATE INDEX command will create a
     B-tree index, which fits the most common situations.
@@ -247,8 +247,8 @@ CREATE INDEX name ON table
      GIN
      index
     
-   GIN is a inverted index and it's usable for values which have more
-   than one key, arrays for example. Like GiST, GIN may support
+   GIN indexes are inverted indexes which can handle values that contain more
+   than one key, arrays for example. Like GiST, GIN can support
     many different user-defined indexing strategies and the particular 
     operators with which a GIN index can be used vary depending on the 
     indexing strategy.  
@@ -267,7 +267,8 @@ CREATE INDEX name ON table
     (See  for the meaning of
     these operators.)
     Other GIN operator classes are available in the contrib
-   tsearch2 and intarray modules. For more information see .
+   tsearch2 and intarray modules.
+   For more information see .
    
   
  
diff --git a/doc/src/sgml/xindex.sgml b/doc/src/sgml/xindex.sgml

index 444839399e1ebb122a38d8dae7de272c8b5abdb0..a66dd3c4ee402020820cbf5d415c22b7d0a0af09 100644 (file)
--- a/doc/src/sgml/xindex.sgml
+++ b/doc/src/sgml/xindex.sgml
@@ -1,4 +1,4 @@
-
+
  
  
   Interfacing Extensions To Indexes
@@ -243,15 +243,16 @@
     
  
    
-   GIN indexes are similar to GiST's in flexibility: they don't have a fixed
-   et of strategies. Instead, the consistency support routine
-   interprets the strategy numbers accordingly with operator class
-   definition. As an example, strategies of operator class over arrays
-   is shown in .
+   GIN indexes are similar to GiST indexes in flexibility: they don't have a
+   fixed set of strategies. Instead the support routines of each operator
+   class interpret the strategy numbers according to the operator class's
+   definition. As an example, the strategy numbers used by the built-in
+   operator classes for arrays are
+   shown in .
    
  
     
-    GIN Array's Strategies
+    GIN Array Strategies
      
       
        
@@ -388,36 +389,35 @@
       
        
         consistent - determine whether key satisfies the 
-       query qualifier
+        query qualifier
         1
        
        
-       union - compute union of of a set of given keys
+       union - compute union of a set of keys
         2
        
        
-       compress - computes a compressed representation of a key or value
-       to be indexed
+       compress - compute a compressed representation of a key or value
+        to be indexed
         3
        
        
-       decompress - computes a decompressed representation of a 
-      compressed key 
+       decompress - compute a decompressed representation of a 
+        compressed key
         4
        
        
         penalty - compute penalty for inserting new key into subtree 
-      with given subtree's key
+       with given subtree's key
         5
        
        
         picksplit - determine which entries of a page are to be moved
-      to the new page and compute the union keys for resulting pages 
+       to the new page and compute the union keys for resulting pages
         6
        
        
-       equal - compare two keys and returns true if they are equal 
-       
+       equal - compare two keys and return true if they are equal
         7
        
       
@@ -441,23 +441,22 @@
       
        
         
-   compare - Compare two keys and return an integer less than zero, zero, or
-   greater than zero, indicating whether the first key is less than, equal to,
-   or greater than the second.
-      
+        compare - compare two keys and return an integer less than zero, zero,
+        or greater than zero, indicating whether the first key is less than,
+        equal to, or greater than the second
+       
         1
        
        
-       extractValue - extract keys from value to be indexed
+       extractValue - extract keys from a value to be indexed
         2
        
        
-       extractQuery - extract keys from query
+       extractQuery - extract keys from a query condition
         3
        
        
-       consistent - determine whether value matches by the
-              query
+       consistent - determine whether value matches query condition
         4
        
       
@@ -822,12 +821,16 @@ CREATE OPERATOR CLASS polygon_ops
          STORAGE box;
  
  
-   At present, only the GiST and GIN index method supports a
+   At present, only the GiST and GIN index methods support a
     STORAGE type that's different from the column data type.
-   The GiST <literal>compress</> and <literal>decompress</> support
+   The GiST <function>compress</> and <function>decompress</> support
     routines must deal with data-type conversion when STORAGE
-   is used. Functions named extractValue and extractQuery
-   do conversation into internally used types for GIN.
+   is used.  In GIN, the STORAGE type identifies the type of
+   the key values, which normally is different from the type
+   of the indexed column — for example, an operator class for
+   integer array columns might have keys that are just integers.  The
+   GIN extractValue and extractQuery support
+   routines are responsible for extracting keys from indexed values.
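
Note (not part of the commit): the revised text says that an operator class for integer array columns might have keys that are just integers, and that the four support methods tie keys, indexed values, and queries together. The sketch below illustrates that relationship in plain C, following the pseudocode prototypes quoted in gin.sgml. It is an assumption-laden illustration, not PostgreSQL source: Datum, StrategyNumber, the IntArray struct, and the STRAT_* numbers are hypothetical stand-ins so that it compiles without PostgreSQL headers, and a real operator class would use the server's calling convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in types (assumptions) so the sketch compiles without PostgreSQL headers. */
typedef intptr_t Datum;
typedef uint16_t StrategyNumber;

/* Hypothetical indexed value: a one-dimensional integer array. */
typedef struct IntArray
{
    uint32_t    len;
    const int  *items;
} IntArray;

/* Illustrative strategy numbers; a real operator class defines its own. */
enum { STRAT_OVERLAP = 1, STRAT_CONTAINS = 2 };

/* compare: orders keys (single integers), not whole indexed values. */
int
compare(Datum a, Datum b)
{
    return (a > b) - (a < b);
}

/* extractValue: each array element becomes one key; the count goes into *nkeys. */
Datum *
extractValue(Datum inputValue, uint32_t *nkeys)
{
    const IntArray *arr = (const IntArray *) inputValue;
    Datum      *keys;
    uint32_t    i;

    assert(nkeys != NULL);
    /* Allocate at least one slot so an empty array still gets a valid pointer. */
    keys = malloc((arr->len ? arr->len : 1) * sizeof(Datum));
    if (keys == NULL)
    {
        *nkeys = 0;
        return NULL;
    }
    for (i = 0; i < arr->len; i++)
        keys[i] = (Datum) arr->items[i];
    *nkeys = arr->len;
    return keys;
}

/*
 * extractQuery: both strategies in this sketch take an array on the
 * right-hand side, so the query is split into keys like an indexed value.
 * An operator class with differently typed queries would branch on n here.
 */
Datum *
extractQuery(Datum query, uint32_t *nkeys, StrategyNumber n)
{
    (void) n;
    return extractValue(query, nkeys);
}

/*
 * consistent: check[i] is true when the i-th extractQuery key is present in
 * the indexed value.  The original query datum supplies the key count.
 */
bool
consistent(bool check[], StrategyNumber n, Datum query)
{
    const IntArray *q = (const IntArray *) query;
    uint32_t    i;

    switch (n)
    {
        case STRAT_OVERLAP:     /* at least one query key must be present */
            for (i = 0; i < q->len; i++)
                if (check[i])
                    return true;
            return false;
        case STRAT_CONTAINS:    /* every query key must be present */
            for (i = 0; i < q->len; i++)
                if (!check[i])
                    return false;
            return true;
        default:
            return false;
    }
}
```

In this sketch consistent() reads the key count from the original query datum, which is one reason the documentation gives for passing that datum rather than the extracted key array.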