+
id="xindex-btree-table">
B-tree Strategies
B-tree
-
- The idea is that you'll need to add operators corresponding to the
- comparisons above to the pg_amop relation (see below).
+ The idea is that you'll need to add operators corresponding to these strategies
+ to the pg_amop relation (see below).
The access method code can use these strategy numbers, regardless of data
- type, to figure out how to partition the B-tree,
+ type, to figure out how to partition the B-tree,
compute selectivity, and so on. Don't worry about the details of adding
operators yet; just understand that there must be a set of these
- operators for <filename>int2, int4, oid,</filename> and every other
- data type on which a B-tree can operate.
+ operators for <type>int2</type>, <type>int4</type>, <type>oid</type>, and all other
+ data types on which a B-tree can operate.
+
+
+
+
Access Method Support Routines
Sometimes, strategies aren't enough information for the system to figure
out how to use an index. Some access methods require additional support
- routines in order to work. For example, the B-tree
+ routines in order to work. For example, the B-tree
access method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
- R-tree access method must be able to compute
+ R-tree access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to operators used in qualifications in
SQL queries; they are administrative routines used by
In order to manage diverse support routines consistently across all
PostgreSQL access methods,
- pg_am includes a column called
- amsupport. This column records the number of
- support routines used by an access method. For B-trees,
- this number is one -- the routine to take two keys and return -1, 0, or
- +1, depending on whether the first key is less than, equal
- to, or greater than the second.
-
-
- Strictly speaking, this routine can return a negative
- number (< 0), zero, or a non-zero positive number (> 0).
-
-
+ pg_am includes a column called
+ amsupport. This column records the
+ number of support routines used by an access method. For B-trees,
+ this number is one: the routine to take two keys and return -1, 0,
+ or +1, depending on whether the first key is less than, equal to,
+ or greater than the second. (Strictly speaking, this routine can
+ return a negative number (< 0), zero, or a non-zero positive
+ number (> 0).)
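As a rough illustration of such a support routine, here is a minimal sketch of a three-way comparison. It assumes a hypothetical `Complex` struct with `x` and `y` members and orders values by squared magnitude; the type, the helper `complex_mag`, and the function name are illustrative assumptions, not definitions taken from this section.

```c
/* Hypothetical key type, used only for this illustration. */
typedef struct Complex
{
    double x;
    double y;
} Complex;

/* Squared magnitude; sufficient for ordering values by absolute value. */
static double
complex_mag(const Complex *c)
{
    return c->x * c->x + c->y * c->y;
}

/*
 * Three-way comparison in the style of a B-tree support routine:
 * negative if a sorts before b, zero if they compare equal,
 * positive if a sorts after b.
 */
int
complex_abs_cmp_sketch(const Complex *a, const Complex *b)
{
    double amag = complex_mag(a);
    double bmag = complex_mag(b);

    if (amag < bmag)
        return -1;
    if (amag > bmag)
        return 1;
    return 0;
}
```

A caller only needs to test the sign of the result, which is why any negative or positive value would serve as well as -1 or +1.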
- The amstrategies entry in pg_am
- is just the number
- of strategies defined for the access method in question. The operators
- for less than, less equal, and so on don't appear in
- pg_am. Similarly, amsupport
- is just the number of support routines required by the access
- method. The actual routines are listed elsewhere.
+ The amstrategies entry in
+ pg_am is just the number of strategies
+ defined for the access method in question. The operators for less
+ than, less equal, and so on don't appear in
+ pg_am. Similarly,
+ amsupport is just the number of support
+ routines required by the access method. The actual routines are
+ listed elsewhere.
- By the way, the <filename>amorderstrategy</filename> entry tells whether
+ By the way, the <structfield>amorderstrategy</structfield> column tells whether
the access method supports ordered scan. Zero means it doesn't; if it
- does, <filename>amorderstrategy</filename> is the number of the strategy
+ does, <structfield>amorderstrategy</structfield> is the number of the strategy
routine that corresponds to the ordering operator. For example, B-tree
- has <filename>amorderstrategy</filename> = 1 which is its
+ has <structfield>amorderstrategy</structfield> = 1, which is its
"less than" strategy number.
+
+
+
+
Operator Classes
- The next table of interest is <filename>pg_opclass</filename>. This table
+ The next table of interest is <classname>pg_opclass</classname>. This table
defines operator class names and input data types for each of the operator
classes supported by a given index access method. The same class name
can be used for several different access methods (for example, both B-tree
and hash access methods have operator classes named
- <filename>oid_ops</filename>), but a separate
+ <literal>oid_ops</literal>), but a separate
pg_opclass row must appear for each access method.
- The oid of the pg_opclass row is
+ The OID of the pg_opclass row is
used as a foreign
key in other tables to associate specific operators and support routines
with the operator class.
- You need to add a row with your opclass name (for example,
- <filename>complex_abs_ops</filename>) to
- <filename>pg_opclass</filename>:
+ You need to add a row with your operator class name (for example,
+ <literal>complex_abs_ops</literal>) to
+ <classname>pg_opclass</classname>:
INSERT INTO pg_opclass (opcamid, opcname, opcintype, opcdefault, opckeytype)
VALUES (
(SELECT oid FROM pg_am WHERE amname = 'btree'),
--------+---------+-----------------+-----------+------------+------------
277975 | 403 | complex_abs_ops | 277946 | t | 0
(1 row)
-
+
- Note that the oid for your pg_opclass row will
+ Note that the OID for your pg_opclass row will
be different! Don't worry about this though. We'll get this number
- from the system later just like we got the oid of the type here.
+ from the system later just like we got the OID of the type here.
- The above example assumes that you want to make this new opclass the
- default B-tree opclass for the complex data type.
- If you don't, just set <filename>opcdefault</filename> to false instead.
- <filename>opckeytype</filename> is not described here; it should always
- be zero for B-tree opclasses.
+ The above example assumes that you want to make this new operator class the
+ default B-tree operator class for the complex data type.
+ If you don't, just set <structfield>opcdefault</structfield> to false instead.
+ <structfield>opckeytype</structfield> is not described here; it should always
+ be zero for B-tree operator classes.
+
+
+
+
Creating the Operators and Support Routines
So now we have an access method and an operator class.
We still need a set of operators. The procedure for
- defining operators was discussed earlier in this manual.
- For the <filename>complex_abs_ops</filename> operator class on B-trees,
+ defining operators was discussed in .
+ For the <literal>complex_abs_ops</literal> operator class on B-trees,
the operators we require are:
- <programlisting>
- absolute value less-than
- absolute value less-than-or-equal
- absolute value equal
- absolute value greater-than-or-equal
- absolute value greater-than
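- </programlisting>
+ <itemizedlist spacing="compact">
+ <listitem><para>absolute-value less-than (strategy 1)</para></listitem>
+ <listitem><para>absolute-value less-than-or-equal (strategy 2)</para></listitem>
+ <listitem><para>absolute-value equal (strategy 3)</para></listitem>
+ <listitem><para>absolute-value greater-than-or-equal (strategy 4)</para></listitem>
+ <listitem><para>absolute-value greater-than (strategy 5)</para></listitem>
+ </itemizedlist>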
Suppose the code that implements these functions
is stored in the file
- <replaceable>PGROOT</replaceable>/tutorial/complex.c,
+ <filename>PGROOT/src/tutorial/complex.c</filename>,
which we have compiled into
- <replaceable>PGROOT</replaceable>/tutorial/complex.so.
-
+ <filename>PGROOT/src/tutorial/complex.so</filename>.
+ Part of the C code looks like this:
- Part of the C code looks like this: (note that we will only show the
- equality operator for the rest of the examples. The other four
- operators are very similar. Refer to complex.c
- or complex.source for the details.)
-
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
bool
complex_abs_eq(Complex *a, Complex *b)
{
    double amag = Mag(a), bmag = Mag(b);
    return (amag==bmag);
}
-
+
+ (Note that we will only show the equality operator for the rest of
+ the examples. The other four operators are very similar. Refer to
+ complex.c or
+ complex.source for the details.)
We make the function known to
PostgreSQL like this:
-CREATE FUNCTION complex_abs_eq(complex, complex)
- RETURNS bool
- AS 'PGROOT/tutorial/complex'
- LANGUAGE C;
-
+CREATE FUNCTION complex_abs_eq(complex, complex) RETURNS boolean
+ AS 'PGROOT/src/tutorial/complex'
+ LANGUAGE C;
+
- There are some important things that are happening here.
-
+ There are some important things that are happening here:
+
+
First, note that operators for less-than, less-than-or-equal, equal,
greater-than-or-equal, and greater-than for complex
we don't have any other operator = for complex,
but if we were building a practical data type we'd probably want = to
be the ordinary equality operation for complex numbers. In that case,
- we'd need to use some other operator name for complex_abs_eq.
+ we'd need to use some other operator name for complex_abs_eq.
+
+
Second, although
PostgreSQL can cope with operators having
the same name as long as they have different input data types, C can only
Usually it's a good practice to include the data type name in the C
function name, so as not to conflict with functions for other data types.
+
+
Third, we could have made the
PostgreSQL name of the function
abs_eq, relying on
PostgreSQL to distinguish it
To keep the example simple, we make the function have the same names
at the C level and
PostgreSQL level.
+
+
Finally, note that these operator functions return Boolean values.
- In practice, all operators defined as index access method strategies
- must return Boolean, since they must appear at the top level of a WHERE
- clause to be used with an index.
- (On the other
- hand, the support function returns whatever the particular access method
- expects -- in this case, a signed integer.)
+ In practice, all operators defined as index access method
+ strategies must return type boolean, since they must
+ appear at the top level of a WHERE clause to be used with an index.
+ (On the other hand, the support function returns whatever the
+ particular access method expects -- in this case, a signed
+ integer.)
+
+
+
- The final routine in the
- file is the support routine
mentioned when we discussed the amsupport
- column of the pg_am table. We will use this
- later on. For now, ignore it.
+ The final routine in the file is the support routine
+ mentioned when we discussed the amsupport column of the
+ pg_am table. We will use this later on. For
+ now, ignore it.
Now we are ready to define the operators:
CREATE OPERATOR = (
leftarg = complex, rightarg = complex,
procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
);
-
+
The important
- things here are the procedure names (which are the C
+ things here are the procedure names (which are the C
functions defined above) and the restriction and join selectivity
functions. You should just use the selectivity functions used in
the example (see complex.source).
Note that there
are different such functions for the less-than, equal, and greater-than
- cases. These must be supplied, or the optimizer will be unable to
+ cases. These must be supplied or the optimizer will be unable to
make effective use of the index.
The next step is to add entries for these operators to
- the <filename>pg_amop</filename> relation. To do this,
- we'll need the oids of the operators we just
+ the <classname>pg_amop</classname> relation. To do this,
+ we'll need the OIDs of the operators we just
defined. We'll look up the names of all the operators that take
- two complexes, and pick ours out:
+ two operands of type complex, and pick ours out:
- SELECT o.oid AS opoid, o.oprname
- INTO TEMP TABLE complex_ops_tmp
- FROM pg_operator o, pg_type t
- WHERE o.oprleft = t.oid and o.oprright = t.oid
+>
+SELECT o.oid AS opoid, o.oprname
+ INTO TEMP TABLE complex_ops_tmp
+ FROM pg_operator o, pg_type t
+ WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex';
opoid | oprname
277973 | >=
277974 | >
(6 rows)
- >
+>
- (Again, some of your oid numbers will almost
+ (Again, some of your OID numbers will almost
certainly be different.) The operators we are interested in are those
- with oids 277970 through 277974. The values you
+ with OIDs 277970 through 277974. The values you
get will probably be different, and you should substitute them for the
values below. We will do this with a select statement.
- Now we are ready to insert entries into <filename>pg_amop</filename> for
+ Now we are ready to insert entries into <classname>pg_amop</classname> for
our new operator class. These entries must associate the correct
B-tree strategy numbers with each of the operators we need.
The command to insert the less-than operator looks like:
- INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
- SELECT opcl.oid, 1, false, c.opoid
+INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
+ SELECT opcl.oid, 1, false, c.opoid
FROM pg_opclass opcl, complex_ops_tmp c
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
c.oprname = '<';
-
+
Now do this for the other operators, substituting for the 1 in the
second line above and the < in the last line. Note the order:
- The final step is registration of the support routine previously
- described in our discussion of pg_am. The
- oid of this support routine is stored in the
- pg_amproc table, keyed by the operator class
- oid and the support routine number.
+ The final step is the registration of the support routine previously
+ described in our discussion of pg_am. The
+ OID of this support routine is stored in the
+ pg_amproc table, keyed by the operator class
+ OID and the support routine number.
+
+
First, we need to register the function in
PostgreSQL (recall that we put the
- C code that implements this routine in the bottom of
+ C code that implements this routine in the bottom of
the file in which we implemented the operator routines):
- CREATE FUNCTION complex_abs_cmp(complex, complex)
- RETURNS int4
- AS 'PGROOT/tutorial/complex'
- LANGUAGE C;
+CREATE FUNCTION complex_abs_cmp(complex, complex)
+ RETURNS integer
+ AS 'PGROOT/src/tutorial/complex'
+ LANGUAGE C;
- SELECT oid, proname FROM pg_proc
- WHERE proname = 'complex_abs_cmp';
+SELECT oid, proname FROM pg_proc
+ WHERE proname = 'complex_abs_cmp';
oid | proname
--------+-----------------
277997 | complex_abs_cmp
(1 row)
-
+
+
+ (Again, your OID number will probably be different.)
+
- (Again, your oid number will probably be different.)
We can add the new row as follows:
- INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
- SELECT opcl.oid, 1, p.oid
+INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
+ SELECT opcl.oid, 1, p.oid
FROM pg_opclass opcl, pg_proc p
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
p.proname = 'complex_abs_cmp';
-
+
And we're done! (Whew.) It should now be possible to create
- and use B-tree indexes on <filename>complex</filename> columns.
+ and use B-tree indexes on <type>complex</type> columns.
+
-
+
+
+
Introduction
+
PostgreSQL supports left unary,
- right unary and binary
+ right unary, and binary
operators. Operators can be overloaded; that is,
the same operator name can be used for different operators
- that have different numbers and types of arguments. If
+ that have different numbers and types of operands. If
there is an ambiguous situation and the system cannot
determine the correct operator to use, it will return
- an error. You may have to typecast the left and/or
+ an error. You may have to type-cast the left and/or
right operands to help it understand which operator you
meant to use.
Every operator is syntactic sugar
for a call to an
underlying function that does the real work; so you must
first create the underlying function before you can create
- the operator. However, an operator is not
- merely syntactic sugar, because it carries additional information
+ the operator. However, an operator is not merely
+ syntactic sugar, because it carries additional information
that helps the query planner optimize queries that use the
operator. Much of this chapter will be devoted to explaining
that additional information.
+
+
+
+
Example
- Here is an example of creating an operator for adding two
- complex numbers. We assume we've already created the definition
- of type complex. First we need a function that does the work;
- then we can define the operator:
+ Here is an example of creating an operator for adding two complex
+ numbers. We assume we've already created the definition of type
+ complex (see ). First we need a
+ function that does the work, then we can define the operator:
CREATE FUNCTION complex_add(complex, complex)
RETURNS complex
AS 'PGROOT/tutorial/complex'
procedure = complex_add,
commutator = +
);
-
+
Now we can do:
+>
SELECT (a + b) AS c FROM test_complex;
-+----------------+
-|c |
-+----------------+
-|(5.2,6.05) |
-+----------------+
-|(133.42,144.95) |
-+----------------+
-
+ c
+-----------------
+ (5.2,6.05)
+ (133.42,144.95)
+
- We've shown how to create a binary operator here. To
- create unary operators, just omit one of leftarg (for
- left unary) or rightarg (for right unary). The procedure
- clause and the argument clauses are the only required items
- in CREATE OPERATOR. The COMMUTATOR clause shown in the example
- is an optional hint to the query optimizer. Further details about
- COMMUTATOR and other optimizer hints appear below.
+ We've shown how to create a binary operator here. To create unary
+ operators, just omit one of leftarg (for left unary) or
+ rightarg (for right unary). The procedure
+ clause and the argument clauses are the only required items in
+ CREATE OPERATOR. The commutator
+ clause shown in the example is an optional hint to the query
+ optimizer. Further details about commutator and other
+ optimizer hints appear below.
+
Operator Optimization Information
Additional optimization clauses might be added in future versions of
PostgreSQL. The ones described here are all
- the ones that release 6.5 understands.
+ the ones that release &version; understands.
COMMUTATOR
- The COMMUTATOR clause, if provided, names an operator that is the
+ The COMMUTATOR clause, if provided, names an operator that is the
commutator of the operator being defined. We say that operator A is the
commutator of operator B if (x A y) equals (y B x) for all possible input
- values x,y. Notice that B is also the commutator of A. For example,
+ values x, y. Notice that B is also the commutator of A. For example,
operators < and > for a particular data type are usually each other's
commutators, and operator + is usually commutative with itself.
But operator - is usually not commutative with anything.
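The commutator relationship can be checked directly on the underlying functions. The following is a minimal sketch using plain C integers as stand-ins for a data type; the function names are illustrative assumptions and are not part of PostgreSQL.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the functions underlying three operators. */
bool int_lt(int x, int y) { return x < y; }   /* x < y */
bool int_gt(int x, int y) { return x > y; }   /* x > y */
int  int_sub(int x, int y) { return x - y; }  /* x - y */

/* < and > are commutators: (x < y) equals (y > x) for every input pair. */
bool
lt_gt_commute(int x, int y)
{
    return int_lt(x, y) == int_gt(y, x);
}

/* Subtraction is not its own commutator: (x - y) need not equal (y - x). */
bool
sub_commutes_with_itself(int x, int y)
{
    return int_sub(x, y) == int_sub(y, x);
}
```

The first check holds for every pair of inputs, while the second fails as soon as the two inputs differ, which is why a subtraction operator would normally be declared without a commutator.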
- The left argument type of a commuted operator is the same as the
- right argument type of its commutator, and vice versa. So the name of
+ The left operand type of a commuted operator is the same as the
+ right operand type of its commutator, and vice versa. So the name of
the commutator operator is all that
PostgreSQL
- needs to be given to look up the commutator, and that's all that need
- be provided in the COMMUTATOR clause.
+ needs to be given to look up the commutator, and that's all that needs to
+ be provided in the COMMUTATOR clause.
- One way is to omit the COMMUTATOR clause in the first operator that
+ One way is to omit the COMMUTATOR clause in the first operator that
you define, and then provide one in the second operator's definition.
Since
PostgreSQL knows that commutative
operators come in pairs, when it sees the second definition it will
- automatically go back and fill in the missing COMMUTATOR clause in
+ automatically go back and fill in the missing COMMUTATOR clause in
the first definition.
- The other, more straightforward way is just to include COMMUTATOR clauses
+ The other, more straightforward way is just to include COMMUTATOR clauses
in both definitions. When
PostgreSQL processes
- the first definition and realizes that COMMUTATOR refers to a non-existent
+ the first definition and realizes that COMMUTATOR refers to a non-existent
operator, the system will make a dummy entry for that operator in the
- system's pg_operator table. This dummy entry will have valid data only
- for the operator name, left and right argument types, and result type,
+ system catalog. This dummy entry will have valid data only
+ for the operator name, left and right operand types, and result type,
since that's all that
PostgreSQL can deduce
at this point. The first operator's catalog entry will link to this
dummy entry. Later, when you define the second operator, the system
updates the dummy entry with the additional information from the second
definition. If you try to use the dummy operator before it's been filled
- in, you'll just get an error message. (Note: this procedure did not work
+ in, you'll just get an error message. (Note: This procedure did not work
reliably in
PostgreSQL versions before 6.5,
but it is now the recommended way to do things.)
NEGATOR
- The NEGATOR clause, if provided, names an operator that is the
+ The NEGATOR clause, if provided, names an operator that is the
negator of the operator being defined. We say that operator A
- is the negator of operator B if both return boolean results and
- (x A y) equals NOT (x B y) for all possible inputs x,y.
+ is the negator of operator B if both return Boolean results and
+ (x A y) equals NOT (x B y) for all possible inputs x, y.
Notice that B is also the negator of A.
For example, < and >= are a negator pair for most data types.
- An operator can never be validly be its own negator.
+ An operator can never validly be its own negator.
- Unlike COMMUTATOR, a pair of unary operators could validly be marked
+ Unlike commutators, a pair of unary operators could validly be marked
as each other's negators; that would mean (A x) equals NOT (B x)
- for all x, or the equivalent for right-unary operators.
+ for all x, or the equivalent for right unary operators.
- An operator's negator must have the same left and/or right argument types
- as the operator itself, so just as with COMMUTATOR, only the operator
- name need be given in the NEGATOR clause.
+ An operator's negator must have the same left and/or right operand types
+ as the operator itself, so just as with COMMUTATOR, only the operator
+ name need be given in the NEGATOR clause.
- Providing NEGATOR is very helpful to the query optimizer since
+ Providing a negator is very helpful to the query optimizer since
it allows expressions like NOT (x = y) to be simplified into
x <> y. This comes up more often than you might think, because
NOTs can be inserted as a consequence of other rearrangements.
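The simplification is safe only because the two forms agree on every input. A minimal sketch of that equivalence, again using C integers and hypothetical function names as stand-ins for the operators' underlying functions:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the functions behind = and <>. */
bool int_eq(int x, int y) { return x == y; }
bool int_ne(int x, int y) { return x != y; }

/* The form as written in the query: NOT (x = y). */
bool
negated_eq(int x, int y)
{
    return !int_eq(x, y);
}

/* The simplified form the optimizer can substitute: x <> y. */
bool
simplified_ne(int x, int y)
{
    return int_ne(x, y);
}
```

If the two functions ever disagreed for some input, declaring the operators as negators would let the optimizer change the result of a query.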
RESTRICT
- The RESTRICT clause, if provided, names a restriction selectivity
+ The RESTRICT clause, if provided, names a restriction selectivity
estimation function for the operator (note that this is a function
- name, not an operator name). RESTRICT clauses only make sense for
- binary operators that return boolean. The idea behind a restriction
+ name, not an operator name). RESTRICT clauses only make sense for
+ binary operators that return boolean. The idea behind a restriction
selectivity estimator is to guess what fraction of the rows in a
- table will satisfy a WHERE-clause condition of the form
- field OP constant
-
+ table will satisfy a WHERE-clause condition of the form
+column OP constant
+
for the current operator and a particular constant value.
This assists the optimizer by
- giving it some idea of how many rows will be eliminated by WHERE
+ giving it some idea of how many rows will be eliminated by WHERE
clauses that have this form. (What happens if the constant is on
the left, you may be wondering? Well, that's one of the things that
- COMMUTATOR is for...)
+ COMMUTATOR is for...)
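To make the quantity being estimated concrete, the sketch below computes the exact selectivity of a condition over an in-memory sample by scanning it. This is not how a PostgreSQL estimator works (a real estimator guesses the fraction from statistics without reading the rows); the `binary_op` type and all function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operator signature: (column value) OP (constant). */
typedef bool (*binary_op)(int value, int constant);

bool op_lt(int value, int constant) { return value < constant; }
bool op_eq(int value, int constant) { return value == constant; }

/*
 * Exact selectivity over a sample: the fraction of values for which
 * (value OP constant) is true. Returns 0.0 for an empty sample.
 */
double
sample_selectivity(const int *values, size_t n, binary_op op, int constant)
{
    size_t matches = 0;
    size_t i;

    if (n == 0)
        return 0.0;

    for (i = 0; i < n; i++)
    {
        if (op(values[i], constant))
            matches++;
    }
    return (double) matches / (double) n;
}
```

For the sample {1, 2, 3, 4}, the condition "value < 3" has selectivity 0.5 and "value = 4" has selectivity 0.25, which matches the intuition that equality usually accepts a smaller fraction of rows than a range comparison.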
the scope of this chapter, but fortunately you can usually just use
one of the system's standard estimators for many of your own operators.
These are the standard restriction estimators:
- eqsel for =
- neqsel for <>
- scalarltsel for < or <=
- scalargtsel for > or >=
- </ProgramListing>
+ <simplelist>
+ <member>eqsel for =</member>
+ <member>neqsel for &lt;&gt;</member>
+ <member>scalarltsel for &lt; or &lt;=</member>
+ <member>scalargtsel for &gt; or &gt;=</member>
+ </simplelist>
It might seem a little odd that these are the categories, but they
make sense if you think about it. = will typically accept only
a small fraction of the rows in a table; <> will typically reject
- You can frequently get away with using either eqsel or neqsel for
+ You can frequently get away with using either eqsel or neqsel for
operators that have very high or very low selectivity, even if they
aren't really equality or inequality. For example, the
- approximate-equality geometric operators use eqsel on the assumption that
+ approximate-equality geometric operators use eqsel on the assumption that
they'll usually only match a small fraction of the entries in a table.
- You can use scalarltsel and scalargtsel for comparisons on data types that
+ You can use scalarltsel and scalargtsel for comparisons on data types that
have some sensible means of being converted into numeric scalars for
range comparisons. If possible, add the data type to those understood
- by the routine convert_to_scalar() in src/backend/utils/adt/selfuncs.c.
+ by the routine convert_to_scalar() in src/backend/utils/adt/selfuncs.c.
(Eventually, this routine should be replaced by per-data-type functions
- identified through a column of the pg_type table; but that hasn't happened
+ identified through a column of the pg_type system catalog; but that hasn't happened
yet.) If you do not do this, things will still work, but the optimizer's
estimates won't be as good as they could be.
There are additional selectivity functions designed for geometric
- operators in src/backend/utils/adt/geo_selfuncs.c: areasel, positionsel,
- and contsel. At this writing these are just stubs, but you may want
+ operators in src/backend/utils/adt/geo_selfuncs.c: areasel, positionsel,
+ and contsel. At this writing these are just stubs, but you may want
to use them (or even better, improve them) anyway.
JOIN
- The JOIN clause, if provided, names a join selectivity
+ The JOIN clause, if provided, names a join selectivity
estimation function for the operator (note that this is a function
- name, not an operator name). JOIN clauses only make sense for
- binary operators that return boolean. The idea behind a join
+ name, not an operator name). JOIN clauses only make sense for
+ binary operators that return boolean. The idea behind a join
selectivity estimator is to guess what fraction of the rows in a
- pair of tables will satisfy a WHERE-clause condition of the form
- table1.field1 OP table2.field2
-
- for the current operator. As with the RESTRICT clause, this helps
+ pair of tables will satisfy a WHERE-clause condition of the form
+table1.column1 OP table2.column2
+
+ for the current operator. As with the RESTRICT clause, this helps
the optimizer very substantially by letting it figure out which
of several possible join sequences is likely to take the least work.
As before, this chapter will make no attempt to explain how to write
a join selectivity estimator function, but will just suggest that
you use one of the standard estimators if one is applicable:
- <ProgramListing>
- eqjoinsel for =
- neqjoinsel for <>
- scalarltjoinsel for < or <=
- scalargtjoinsel for > or >=
- areajoinsel for 2D area-based comparisons
- positionjoinsel for 2D position-based comparisons
- contjoinsel for 2D containment-based comparisons
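- </ProgramListing>
+ <simplelist>
+ <member>eqjoinsel for =</member>
+ <member>neqjoinsel for &lt;&gt;</member>
+ <member>scalarltjoinsel for &lt; or &lt;=</member>
+ <member>scalargtjoinsel for &gt; or &gt;=</member>
+ <member>areajoinsel for 2D area-based comparisons</member>
+ <member>positionjoinsel for 2D position-based comparisons</member>
+ <member>contjoinsel for 2D containment-based comparisons</member>
+ </simplelist>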
HASHES
- The HASHES clause, if present, tells the system that it is OK to
- use the hash join method for a join based on this operator. HASHES
- only makes sense for binary operators that return boolean, and
+ The HASHES clause, if present, tells the system that it is OK to
+ use the hash join method for a join based on this operator. HASHES
+ only makes sense for binary operators that return boolean, and
in practice the operator had better be equality for some data type.
The assumption underlying hash join is that the join operator can
- only return TRUE for pairs of left and right values that hash to the
+ only return true for pairs of left and right values that hash to the
same hash code. If two values get put in different hash buckets, the
join will never compare them at all, implicitly assuming that the
- result of the join operator must be FALSE. So it never makes sense
- to specify HASHES for operators that do not represent equality.
+ result of the join operator must be false. So it never makes sense
+ to specify HASHES for operators that do not represent equality.
There are also machine-dependent ways in which a hash join might fail
to do the right thing. For example, if your data type
is a structure in which there may be uninteresting pad bits, it's unsafe
- to mark the equality operator HASHES. (Unless, perhaps, you write
+ to mark the equality operator HASHES. (Unless, perhaps, you write
your other operators to ensure that the unused bits are always zero.)
- Another example is that the FLOAT data types are unsafe for hash
- joins. On machines that meet the IEEE floating point standard, minus
+ Another example is that the floating-point data types are unsafe for hash
+ joins. On machines that meet the IEEE floating-point standard, minus
zero and plus zero are different values (different bit patterns) but
- they are defined to compare equal. So, if float equality were marked
- HASHES, a minus zero and a plus zero would probably not be matched up
+ they are defined to compare equal. So, if the equality operator on floating-point data types were marked
+ HASHES, a minus zero and a plus zero would probably not be matched up
by a hash join, but they would be matched up by any other join process.
- The bottom line is that you should probably only use HASHES for
+ The bottom line is that you should probably only use HASHES for
equality operators that are (or could be) implemented by memcmp().
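The minus-zero problem is easy to reproduce. The sketch below contrasts equality as the floating-point comparison sees it with equality of the raw bytes, which is what a hash of the stored value would effectively depend on. It assumes a platform with IEEE 754 doubles; the function names are illustrative.

```c
#include <stdbool.h>
#include <string.h>

/* Equality as the floating-point comparison operator sees it. */
bool
float8_equal(double a, double b)
{
    return a == b;
}

/* Equality of the raw bytes, as memcmp() would see it. */
bool
float8_bitwise_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}
```

On an IEEE 754 machine, `float8_equal(0.0, -0.0)` is true while `float8_bitwise_equal(0.0, -0.0)` is false, because the two zeros differ in the sign bit. Values that compare equal but differ bitwise can land in different hash buckets, which is the failure described above.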
SORT1 and SORT2
- The SORT clauses, if present, tell the system that it is permissible to use
+ The SORT clauses, if present, tell the system that it is permissible to use
the merge join method for a join based on the current operator.
Both must be specified if either is. The current operator must be
- equality for some pair of data types, and the SORT1 and SORT2 clauses
- name the ordering operator ('<' operator) for the left and right-side
+ equality for some pair of data types, and the SORT1 and SORT2 clauses
+ name the ordering operator (< operator) for the left and right-side
data types respectively.
be capable of being fully ordered, and the join operator must be one
that can only succeed for pairs of values that fall at the same place
in the sort order. In practice this means that the join operator must
- behave like equality. But unlike hashjoin, where the left and right
+ behave like equality. But unlike hash join, where the left and right
data types had better be the same (or at least bitwise equivalent),
- it is possible to mergejoin two
+ it is possible to merge-join two
distinct data types so long as they are logically compatible. For
- example, the int2-versus-int4 equality operator is mergejoinable.
+ example, the int2-versus-int4 equality operator is merge-joinable.
We only need sorting operators that will bring both data types into a
logically compatible sequence.
- When specifying merge sort operators, the current operator and both
- referenced operators must return boolean; the SORT1 operator must have
- both input data types equal to the current operator's left argument type,
- and the SORT2 operator must have
- both input data types equal to the current operator's right argument type.
- (As with COMMUTATOR and NEGATOR, this means that the operator name is
+ When specifying merge-sort operators, the current operator and both
+ referenced operators must return boolean; the SORT1 operator must have
+ both input data types equal to the current operator's left operand type,
+ and the SORT2 operator must have
+ both input data types equal to the current operator's right operand type.
+ (As with COMMUTATOR and NEGATOR, this means that the operator name is
sufficient to specify the operator, and the system is able to make dummy
operator entries if you happen to define the equality operator before
the other ones.)
- In practice you should only write SORT clauses for an = operator,
+ In practice you should only write SORT clauses for an = operator,
and the two referenced operators should always be named <. Trying
to use merge join with operators named anything else will result in
hopeless confusion, for reasons we'll see in a moment.
There are additional restrictions on operators that you mark
- mergejoinable. These restrictions are not currently checked by
- CREATE OPERATOR, but a merge join may fail at runtime if any are
+ merge-joinable. These restrictions are not currently checked by
+ CREATE OPERATOR, but a merge join may fail at run time if any are
not true:
- The mergejoinable equality operator must have a commutator
+ The merge-joinable equality operator must have a commutator
(itself if the two data types are the same, or a related equality operator
if they are different).
There must be < and > ordering operators having the same left and
- right input data types as the mergejoinable operator itself. These
+ right operand data types as the merge-joinable operator itself. These
operators must be named < and >; you do
not have any choice in the matter, since there is no provision for
specifying them explicitly. Note that if the left and right data types
are different, neither of these operators is the same as either
- SORT operator. But they had better order the data values compatibly
- with the SORT operators, or mergejoin will fail to work.
+ SORT operator. But they had better order the data values compatibly
+ with the SORT operators, or the merge join will fail to work.
Procedural Languages
+
+
Introduction
+
PostgreSQL allows users to add new
programming languages to be available for writing functions and
- Writing a handler for a new procedural language is outside the
- scope of this manual, although some information is provided in
- the CREATE LANGUAGE reference page. Several procedural languages are
-
available in the standard PostgreSQL distribution.
+ Writing a handler for a new procedural language is described in
+ . Several procedural languages are
+ available in the standard
PostgreSQL
+ distribution, which can serve as examples.
+
Installing Procedural Languages
-
+ id="xplang-install-cr1">
The handler must be declared with the command
-
+ id="xplang-install-cr2">
The PL must be declared with the command
CREATE TRUSTED PROCEDURAL LANGUAGE language-name
HANDLER handler_function_name;
- The optional key word TRUSTED tells
- whether ordinary database users that have no superuser
- privileges should be allowed to use this language to create functions
- and trigger procedures. Since PL functions are
- executed inside the database backend, the
TRUSTED
- flag should only be given for
- languages that do not allow access to database backends
- internals or the file system. The languages
PL/pgSQL,
-
PL/Tcl, and
PL/Perl are known to be trusted; the language
PL/TclU
- should not be marked trusted.
+ The optional key word TRUSTED tells whether
+ ordinary database users that have no superuser privileges should
+ be allowed to use this language to create functions and trigger
+ procedures. Since PL functions are executed inside the database
+ server, the TRUSTED flag should only be given
+ for languages that do not allow access to database server
+ internals or the file system. The languages
+
PL/Python are known to be trusted;
+ the languages
PL/TclU and
+
PL/PerlU are designed to provide
+ unlimited functionality and should not be
+ marked trusted.
- In a default
PostgreSQL installation, the
- handler for the
PL/pgSQL language is built and installed into the
- library
directory. If Tcl/Tk support is configured
- in, the handlers for PL/Tcl and PL/TclU are also built and installed in
- the same location. Likewise, the PL/Perl handler is built and installed
- if Perl support is configured. The createlang
- script automates the two CREATE steps described above.
+ In a default
PostgreSQL installation,
+ the handler for the
PL/pgSQL language
+ is built and installed into the library
+ directory. If Tcl/Tk support is configured in, the handlers for
+ PL/Tcl and PL/TclU are also built and installed in the same
+ location. Likewise, the PL/Perl and PL/PerlU handlers are built
+ and installed if Perl support is configured, and PL/Python is
+ installed if Python support is configured. The
+
createlang script automates
+ linkend="xplang-install-cr1"> and
+ linkend="xplang-install-cr2"> described above.
- <procedure>
-
Example
+ <example>
+
Manual Installation of PL/pgSQL
-
- The following command tells the database where to find the
+ The following command tells the database server where to find the
shared object for the
PL/pgSQL language's call handler function.
'$libdir/plpgsql' LANGUAGE C;
-
-
The command
should be invoked for functions and trigger procedures where the
language attribute is plpgsql.
-
-
+
extending
+
+ This chapter needs to be updated for the version-1 function manager
+ interface.
+
+
- As previously mentioned, there are two kinds of types
- in
PostgreSQL: base types (defined in a programming language)
- and composite types.
- Examples in this section up to interfacing indexes can
- be found in complex.sql and complex.c. Composite examples
- are in funcs.sql.
+ As previously mentioned, there are two kinds of types in
+
PostgreSQL: base types (defined in a
+ programming language) and composite types. This chapter describes
+ how to define new base types.
-
-
User-Defined Types
+ The examples in this section can be found in
+ complex.sql and complex.c
+ in the tutorial directory. Composite examples are in
+ funcs.sql.
+
-
-
Functions Needed for a User-Defined Type
- A user-defined type must always have input and output
- functions. These functions determine how the type
- appears in strings (for input by the user and output to
- the user) and how the type is organized in memory. The
- input function takes a null-delimited character string
- as its input and returns the internal (in memory)
- representation of the type. The output function takes the
- internal representation of the type and returns a null
- delimited character string.
- Suppose we want to define a complex type which represents
- complex numbers. Naturally, we choose to represent a
- complex in memory as the following
C structure:
-
+
+
+
+
+ A user-defined type must always have input and output functions.
+ These functions determine how the type appears in strings (for input
+ by the user and output to the user) and how the type is organized in
+ memory. The input function takes a null-terminated character string
+ as its input and returns the internal (in memory) representation of
+ the type. The output function takes the internal representation of
+ the type and returns a null-terminated character string.
+
+
+ Suppose we want to define a complex type which represents complex
+ numbers. Naturally, we would choose to represent a complex in memory
+ as the following
C structure:
+
typedef struct Complex {
double x;
double y;
} Complex;
-
-
- and a string of the form (x,y) as the external string
- representation.
- These functions are usually not hard to write, especially
- the output function. However, there are a number of points
- to remember:
-
-
-
-
When defining your external (string) representation,
- remember that you must eventually write a
- complete and robust parser for that representation
- as your input function!
-
+
+
+ and a string of the form (x,y) as the external string
+ representation.
+
+
+ The functions are usually not hard to write, especially the output
+ function. However, there are a number of points to remember:
+
+
+
+ When defining your external (string) representation, remember
+ that you must eventually write a complete and robust parser for
+ that representation as your input function!
+
+
+ For instance:
+
Complex *
complex_in(char *str)
{
result->y = y;
return (result);
}
-
+
+
- The output function can simply be:
+ The output function can simply be:
char *
complex_out(Complex *complex)
{
sprintf(result, "(%g,%g)", complex->x, complex->y);
return(result);
}
-
-
-
-
-
- You should try to make the input and output
- functions inverses of each other. If you do
- not, you will have severe problems when you need
- to dump your data into a file and then read it
- back in (say, into someone else's database on
- another computer). This is a particularly common
- problem when floating-point numbers are
- involved.
-
-
-
+
+
+
+
+
- To define the
complex type, we need to create the two
- user-defined functions complex_in and complex_out
- before creating the type:
+ You should try to make the input and output functions inverses of
+ each other. If you do not, you will have severe problems when
+ you need to dump your data into a file and then read it back in
+ (say, into someone else's database on another computer). This is
+ a particularly common problem when floating-point numbers are
+ involved.
+
+
+
+
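The round-trip concern can be made concrete with a small standalone sketch. It reuses the Complex structure and the (x,y) text form from this section, but the helper names complex_parse and complex_format, their signatures, and the choice of 17 significant digits are assumptions for illustration only, not the tutorial's complex_in and complex_out.

```c
#include <assert.h>
#include <stdio.h>

typedef struct Complex {
    double x;
    double y;
} Complex;

/*
 * Hypothetical helper: parse "(x,y)" into *out.
 * Returns 1 on success, 0 if the text is malformed or has trailing garbage.
 */
int
complex_parse(const char *str, Complex *out)
{
    double x, y;
    int    consumed = 0;

    /* %n is only stored if everything before it, including ")", matched. */
    if (sscanf(str, " ( %lf , %lf ) %n", &x, &y, &consumed) != 2)
        return 0;
    if (consumed == 0 || str[consumed] != '\0')
        return 0;
    out->x = x;
    out->y = y;
    return 1;
}

/*
 * Hypothetical helper: format *c as "(x,y)" with the given number of
 * significant digits. Returns the length written, or -1 on error or
 * truncation.
 */
int
complex_format(const Complex *c, int digits, char *buf, size_t size)
{
    int n = snprintf(buf, size, "(%.*g,%.*g)", digits, c->x, digits, c->y);

    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

With 17 significant digits an IEEE 754 double survives the trip through text unchanged, whereas the default precision of %g (6 digits) turns a value such as 0.1 + 0.2 into 0.3 and so reads back as a different number.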
+ To define the complex type, we need to create the two
+ user-defined functions complex_in and
+ complex_out before creating the type:
+
CREATE FUNCTION complex_in(opaque)
RETURNS complex
AS 'PGROOT/tutorial/complex'
RETURNS opaque
AS 'PGROOT/tutorial/complex'
LANGUAGE C;
+
+
+ Finally, we can declare the data type:
CREATE TYPE complex (
internallength = 16,
input = complex_in,
output = complex_out
);
-
-
-
- As discussed earlier,
PostgreSQL fully supports arrays of
- base types. Additionally,
PostgreSQL supports arrays of
- user-defined types as well. When you define a type,
-
PostgreSQL automatically provides support for arrays of
- that type. For historical reasons, the array type has
- the same name as the user-defined type with the
- underscore character _ prepended.
- Composite types do not need any function defined on
- them, since the system already understands what they
- look like inside.
-
-
+
+
-
-
Large Objects
-
- If the values of your datatype might exceed a few hundred bytes in
- size (in internal form), you should be careful to mark them TOASTable.
- To do this, the internal representation must follow the standard
- layout for variable-length data: the first four bytes must be an int32
- containing the total length in bytes of the datum (including itself).
- Then, all your functions that accept values of the type must be careful
- to call pg_detoast_datum() on the supplied values --- after checking
- that the value is not NULL, if your function is not strict. Finally,
- select the appropriate storage option when giving the CREATE TYPE
- command.
-
-
-
-
+
+
+ As discussed earlier,
PostgreSQL fully
+ supports arrays of base types. Additionally,
+
PostgreSQL supports arrays of
+ user-defined types as well. When you define a type,
+
PostgreSQL automatically provides support
+ for arrays of that type. For historical reasons, the array type has
+ the same name as the user-defined type with the underscore character
+ _ prepended.
+
+
+ Composite types do not need any function defined on them, since the
+ system already understands what they look like inside.
+
+
+
+ and user-defined types
+
+ If the values of your data type might exceed a few hundred bytes in
+ size (in internal form), you should be careful to mark them
+ TOAST-able. To do this, the internal representation must follow the
+ standard layout for variable-length data: the first four bytes must
+ be an int32 containing the total length in bytes of the
+ datum (including itself). Then, all your functions that accept
+ values of the type must be careful to call
+ pg_detoast_datum() on the supplied values ---
+ after checking that the value is not NULL, if your function is not
+ strict. Finally, select the appropriate storage option when giving
+ the CREATE TYPE command.
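The length-word layout described above can be sketched in plain C as follows. This is only an illustration of the byte layout: VarDatum, vardatum_make, and vardatum_payload_len are hypothetical names, and real server code would use the definitions from the PostgreSQL headers and would call pg_detoast_datum() before looking inside a value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a variable-length datum; not the server's type. */
typedef struct VarDatum {
    int32_t length;     /* total size in bytes, including this field */
    char    data[];     /* payload follows the length word */
} VarDatum;

/* Allocate a datum holding a copy of the payload; returns NULL on failure. */
VarDatum *
vardatum_make(const void *payload, size_t payload_len)
{
    size_t    header = offsetof(VarDatum, data);
    VarDatum *d;

    assert(payload != NULL || payload_len == 0);
    if (payload_len > (size_t) INT32_MAX - header)
        return NULL;            /* total length would not fit in an int32 */

    d = malloc(header + payload_len);
    if (d == NULL)
        return NULL;

    d->length = (int32_t) (header + payload_len);
    if (payload_len > 0)
        memcpy(d->data, payload, payload_len);
    return d;
}

/* Payload size is the stored total minus the length word itself. */
size_t
vardatum_payload_len(const VarDatum *d)
{
    return (size_t) d->length - offsetof(VarDatum, data);
}
```

Storing the total length, rather than the payload length, lets generic code copy or skip a value without knowing anything about the type beyond its first four bytes.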
+
+