+ linkend="xfunc-plhandler">). Version-1 code is also more
+ portable than version-0, because it does not break restrictions
+ on function call protocol in the C standard. For more details
+ see src/backend/utils/fmgr/README in the
+ source distribution.
+
+
+
+
+
Writing Code
+
+ Before we turn to the more advanced topics, we should discuss
+ some coding rules for PostgreSQL C-language functions. While it
+ may be possible to load functions written in languages other than
+ C into
PostgreSQL , this is usually
+ difficult (when it is possible at all) because other languages,
+ such as C++, FORTRAN, or Pascal, often do not follow the same
+ calling convention as C. That is, other languages do not pass
+ argument and return values between functions in the same way.
+ For this reason, we will assume that your C-language functions
+ are actually written in C.
+
+
+ The basic rules for writing and building C functions are as follows:
+
+
+
+ Use pg_config --includedir-server
+ to find out where the
PostgreSQL> server header
+ files are installed on your system (or the system that your
+ users will be running on). This option is new with
+ PostgreSQL 7.2. For PostgreSQL 7.1 you should use the option
+ --includedir . (pg_config
+ will exit with a non-zero status if it encounters an unknown
+ option.) For releases prior to 7.1 you will have to guess,
+ but since that was before the current calling conventions were
+ introduced, it is unlikely that you want to support those
+ releases.
+
+
+
+
+ When allocating memory, use the PostgreSQL
+ functions palloc and pfree
+ instead of the corresponding C library functions
+ malloc and free .
+ The memory allocated by palloc will be
+ freed automatically at the end of each transaction, preventing
+ memory leaks.
+
+
+
+
+ Always zero the bytes of your structures using
+ memset or bzero .
+ Several routines (such as the hash access method, hash joins,
+ and the sort algorithm) compute functions of the raw bits
+ contained in your structure. Even if you initialize all
+ fields of your structure, there may be several bytes of
+ alignment padding (holes in the structure) that may contain
+ garbage values. (The short sketch after this list illustrates
+ this rule and the palloc rule above.)
+
+
+
+
+ Most of the internal
PostgreSQL
+ types are declared in postgres.h , while
+ the function manager interfaces
+ (PG_FUNCTION_ARGS , etc.) are in
+ fmgr.h , so you will need to include at
+ least these two files. For portability reasons it's best to
+ include postgres.h first>,
+ before any other system or user header files. Including
+ postgres.h will also include
+ elog.h and palloc.h
+ for you.
+
+
+
+
+ Symbol names defined within object files must not conflict
+ with each other or with symbols defined in the
+
PostgreSQL server executable. You
+ will have to rename your functions or variables if you get
+ error messages to this effect.
+
+
+
+
+ Compiling and linking your code so that it can be dynamically
+ loaded into
PostgreSQL always
+ requires special flags. See for a
+ detailed explanation of how to do it for your particular
+ operating system.
+
+
+
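+
+ As a quick illustration of the palloc and
+ memset rules above, here is a minimal sketch
+ (not taken from the tutorial sources; the structure name is made up):
+
+#include "postgres.h"
+#include <string.h>
+
+typedef struct
+{
+    int32   count;
+    double  total;
+} working_state;        /* hypothetical structure */
+
+working_state *
+make_state(void)
+{
+    /* allocate with palloc, never malloc */
+    working_state *state = (working_state *) palloc(sizeof(working_state));
+
+    /* zero all bytes, including any alignment padding */
+    memset(state, 0, sizeof(working_state));
+    return state;
+}
+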
+&dfunc;
+
-
Composite Type s in C-Language Functions
+
+ Composite-Type Arguments in C-Language Functions
Composite types do not have a fixed layout like C
part of an inheritance hierarchy may have different
fields than other members of the same inheritance hierarchy.
Therefore,
PostgreSQL provides
- a procedural interface for accessing fields of composite types
- from C. As
PostgreSQL processes
- a set of rows, each row will be passed into your
- function as an opaque structure of type TUPLE .
+ a function interface for accessing fields of composite types
+ from C.
+
+
Suppose we want to write a function to answer the query
SELECT name, c_overpaid(emp, 1500) AS overpaid
-FROM emp
-WHERE name = 'Bill' OR name = 'Sam';
+ FROM emp
+ WHERE name = 'Bill' OR name = 'Sam';
- In the query above, we can define c_overpaid> as:
+ Using the version-0 calling conventions, we can define
+ c_overpaid> as:
#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */
bool
-c_overpaid(TupleTableSlot *t, /* the current row of EMP */
+c_overpaid(TupleTableSlot *t, /* the current row of emp */
int32 limit)
{
bool isnull;
salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
if (isnull)
- return (false) ;
+ return false ;
return salary > limit;
}
+
+
+ In version-1 coding, the above would look like this:
-/* In version-1 coding, the above would look like this: */
+#include "postgres.h"
+#include "executor/executor.h" /* for GetAttributeByName() */
PG_FUNCTION_INFO_V1(c_overpaid);
salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
if (isnull)
PG_RETURN_BOOL(false);
- /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary */
+ /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
PG_RETURN_BOOL(salary > limit);
}
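+
+ Pieced together, a complete version-1 definition might look like
+ this (the argument-fetching lines are an assumption based on the
+ version-0 example above; everything else follows the fragments
+ just shown):
+
+#include "postgres.h"
+#include "executor/executor.h"  /* for GetAttributeByName() */
+
+PG_FUNCTION_INFO_V1(c_overpaid);
+
+Datum
+c_overpaid(PG_FUNCTION_ARGS)
+{
+    TupleTableSlot *t = (TupleTableSlot *) PG_GETARG_POINTER(0);
+    int32           limit = PG_GETARG_INT32(1);
+    bool            isnull;
+    int32           salary;
+
+    salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
+    if (isnull)
+        PG_RETURN_BOOL(false);
+    /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
+
+    PG_RETURN_BOOL(salary > limit);
+}
+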
GetAttributeByName is the
PostgreSQL system function that
- returns attributes out of the current row. It has
+ returns attributes out of the specified row. It has
three arguments: the argument of type TupleTableSlot* passed into
the function, the name of the desired attribute, and a
return parameter that tells whether the attribute
- The
following command lets PostgreSQL
- know about the c_overpaid function :
+ The following command declares the function
+ c_overpaid in SQL :
-CREATE FUNCTION c_overpaid(emp, int4 )
-RETURNS bool
-AS 'PGROOT /tutorial/funcs'
-LANGUAGE C;
+CREATE FUNCTION c_overpaid(emp, integer )
+ RETURNS boolean
+ AS 'DIRECTORY /funcs', 'c_overpaid'
+ LANGUAGE C;
-
Table Function API
-
- The Table Function API assists in the creation of user-defined
- C language table functions ().
- Table functions are functions that produce a set of rows, made up of
- either base (scalar) data types, or composite (multi-column) data types.
- The API is split into two main components: support for returning
- composite data types, and support for returning multiple rows
- (set-returning functions or
SRF>s).
-
+
Returning Rows (Composite Types) from C-Language Functions
- The Table Function API relies on macros and functions to suppress most
- of the complexity of building composite data types and returning multiple
- results. A table function must follow the version-1 calling convention
- described above. In addition , the source file must include:
+ To return a row or composite-type value from a C-language
+ function, you can use a special API that provides macros and
+ functions to hide most of the complexity of building composite
+ data types. To use this API , the source file must include:
#include "funcapi.h"
-
-
Returning Rows (Composite Types)
-
- The Table Function API support for returning composite data types
- (or rows) starts with the AttInMetadata>
- structure. This structure holds arrays of individual attribute
- information needed to create a row from raw C strings. It also
- saves a pointer to the TupleDesc>. The information
- carried here is derived from the TupleDesc>, but it
- is stored here to avoid redundant CPU cycles on each call to a
- table function. In the case of a function returning a set, the
- AttInMetadata> structure should be computed
- once during the first call and saved for re-use in later calls.
+ The support for returning composite data types (or rows) starts
+ with the AttInMetadata> structure. This structure
+ holds arrays of individual attribute information needed to create
+ a row from raw C strings. The information contained in the
+ structure is derived from a TupleDesc> structure,
+ but it is stored to avoid redundant computations on each call to
+ a set-returning function (see next section). In the case of a
+ function returning a set, the AttInMetadata>
+ structure should be computed once during the first call and saved
+ for reuse in later calls. AttInMetadata> also
+ saves a pointer to the original TupleDesc>.
typedef struct AttInMetadata
{
TupleDesc RelationNameGetTupleDesc(const char *relname)
- to get a TupleDesc> based on a specifi ed relation, or
+ to get a TupleDesc> for a nam ed relation, or
TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
to get a TupleDesc> based on a type OID. This can
- be used to get a TupleDesc> for a base (scalar) or
- composite (relation) type. Then
+ be used to get a TupleDesc> for a base or
+ composite type. Then
AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
initialized based on the given
TupleDesc>. AttInMetadata> can be
used in conjunction with C strings to produce a properly formed
- tuple. The metadata is stored here to avoid redundant work across
- multiple calls.
+ row value (internally called tuple).
to initialize this tuple slot, or obtain one through other (user provided)
means. The tuple slot is needed to create a Datum> for return by the
- function. The same slot can (and should) be re- used on each call.
+ function. The same slot can (and should) be reused on each call.
HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
can be used to build a HeapTuple> given user data
- in C string form. <quote>values> is an array of C strings, one for
- each attribute of the return tuple . Each C string should be in
+ in C string form. <literal>values> is an array of C strings, one for
+ each attribute of the return row . Each C string should be in
the form expected by the input function of the attribute data
type. In order to return a null value for one of the attributes,
the corresponding pointer in the
values> array
should be set to NULL>. This function will need to
- be called again for each tuple you return.
+ be called again for each row you return.
BuildTupleFromCStrings> is only convenient if your
function naturally computes the values to be returned as text
strings. If your code naturally computes the values as a set of
- Datum s, you should instead use the underlying
- heap_formtuple> routine to convert the
- Datum s directly into a tuple. You will still need
+ Datum> values, you should instead use the underlying
+ function heap_formtuple> to convert the
+ Datum values directly into a tuple. You will still need
the TupleDesc> and a TupleTableSlot>,
but not AttInMetadata>.
- Once you have built a tuple to return from your function, the tuple mus t
- be converted into a Datum>. Use
+ Once you have built a tuple to return from your function, i t
+ must be converted into a Datum>. Use
TupleGetDatum(TupleTableSlot *slot, HeapTuple tuple)
- An example appears below .
+ An example appears in the next section .
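+
+ As a compact sketch of the sequence just described (the complete,
+ runnable example is in the next section; the type name
+ __mytype and the attribute values are made up, and the slot is
+ obtained here with TupleDescGetSlot, the slot-creation function of
+ this API):
+
+TupleDesc       tupdesc;
+TupleTableSlot *slot;
+AttInMetadata  *attinmeta;
+HeapTuple       tuple;
+char          **values;
+Datum           result;
+
+/* get a tuple descriptor describing the result row */
+tupdesc = RelationNameGetTupleDesc("__mytype");
+
+/* derive a slot and the attribute input metadata from the descriptor */
+slot = TupleDescGetSlot(tupdesc);
+attinmeta = TupleDescGetAttInMetadata(tupdesc);
+
+/* supply the attribute values as C strings (a NULL pointer means SQL null) */
+values = (char **) palloc(2 * sizeof(char *));
+values[0] = "1";
+values[1] = "example";
+
+/* build the tuple and convert it into a Datum suitable for returning */
+tuple = BuildTupleFromCStrings(attinmeta, values);
+result = TupleGetDatum(slot, tuple);
+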
-
+
+
+
+
Returning Sets from C-Language Functions
-
-
Returning Sets
+ There is also a special API that provides support for returning
+ sets (multiple rows) from a C-language function. A set-returning
+ function must follow the version-1 calling conventions. Also,
+ source files must include funcapi.h , as
+ above.
+
- A set-returning function (
SRF>) is normally called
+ A set-returning function (
SRF>) is called
once for each item it returns. The
SRF> must
therefore save enough state to remember what it was doing and
- return the next item on each call. The Table Function API
- provides the FuncCallContext> structure to help
- control this process. fcinfo->flinfo->fn_extra>
+ return the next item on each call.
+ The structure FuncCallContext> is provided to help
+ control this process. Within a function, fcinfo->flinfo->fn_extra>
is used to hold a pointer to FuncCallContext>
across calls.
typedef struct
{
/*
- * Number of times we've been called before.
+ * Number of times we've been called before
*
* call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
* incremented for you every time SRF_RETURN_NEXT() is called.
/*
* OPTIONAL maximum number of calls
*
- * max_calls is here for convenience ONLY and setting it is OPTIONAL .
+ * max_calls is here for convenience only and setting it is optional .
* If not set, you must provide alternative means to know when the
* function is done.
*/
/*
* OPTIONAL pointer to result slot
*
- * slot is for use when returning tuples (i.e. composite data types)
- * and is not needed when returning base (i.e. scalar) data types.
+ * slot is for use when returning tuples (i.e., composite data types)
+ * and is not needed when returning base data types.
*/
TupleTableSlot *slot;
/*
- * OPTIONAL pointer to misc user provided context info
+ * OPTIONAL pointer to miscellaneous user-provided context information
*
- * user_fctx is for use as a pointer to your own struct to retain
- * arbitrary context information between calls for your function.
+ * user_fctx is for use as a pointer to your own data to retain
+ * arbitrary context information between calls of your function.
*/
void *user_fctx;
/*
- * OPTIONAL pointer to struct containing arrays of attribute type input
- * metainfo
+ * OPTIONAL pointer to struct containing attribute type input metadata
*
- * attinmeta is for use when returning tuples (i.e. composite data types)
- * and is not needed when returning base (i.e. scalar) data types. It
- * is ONLY needed if you intend to use BuildTupleFromCStrings() to create
+ * attinmeta is for use when returning tuples (i.e., composite data types)
+ * and is not needed when returning base data types. It
+ * is only needed if you intend to use BuildTupleFromCStrings() to create
* the return tuple.
*/
AttInMetadata *attinmeta;
/*
- * memory context used for structures which must live for multiple calls
+ * memory context used for structures that must live for multiple calls
*
* multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
* by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
- * context for any memory that is to be re- used across multiple calls
+ * context for any memory that is to be reused across multiple calls
* of the SRF.
*/
MemoryContext multi_call_memory_ctx;
} FuncCallContext;
+
+
An
SRF> uses several functions and macros that
automatically manipulate the FuncCallContext>
structure (and expect to find it via fn_extra>). Use
SRF_RETURN_NEXT(funcctx, result)
- to return it to the caller. (The result> must be a
+ to return it to the caller. (result> must be of type
Datum>, either a single value or a tuple prepared as
- described earlier .) Finally, when your function is finished
+ described above .) Finally, when your function is finished
returning data, use
SRF_RETURN_DONE(funcctx)
The memory context that is current when the
SRF> is called is
a transient context that will be cleared between calls. This means
- that you do not need to pfree> everything
- you palloc>; it will go away anyway. However, if you want to allocate
+ that you do not need to call pfree> on everything
+ you allocated using palloc>; it will go away anyway. However, if you want to allocate
any data structures to live across calls, you need to put them somewhere
else. The memory context referenced by
multi_call_memory_ctx> is a suitable location for any
A complete pseudo-code example looks like the following:
Datum
-my_Set_Returning_F unction(PG_FUNCTION_ARGS)
+my_set_returning_f unction(PG_FUNCTION_ARGS)
{
FuncCallContext *funcctx;
Datum result;
MemoryContext oldcontext;
- [user defined declarations]
+ further declarations as needed
if (SRF_IS_FIRSTCALL())
{
funcctx = SRF_FIRSTCALL_INIT();
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
- /* o ne-time setup code appears here: */
- [user defined code]
- [if returning composite]
- [build TupleDesc, and perhaps AttInMetadata]
- [obtain slot]
+ /* One-time setup code appears here: */
+ user code
+ if returning composite
+ build TupleDesc, and perhaps AttInMetadata
+ obtain slot
funcctx->slot = slot;
- [endif returning composite]
- [user defined code]
+ endif returning composite
+ user code
MemoryContextSwitchTo(oldcontext);
}
- /* e ach-time setup code appears here: */
- [user defined code]
+ /* Each-time setup code appears here: */
+ user code
funcctx = SRF_PERCALL_SETUP();
- [user defined code]
+ user code
/* this is just one way we might test whether we are done: */
if (funcctx->call_cntr < funcctx->max_calls)
{
- /* h ere we want to return another item: */
- [user defined code]
- [obtain result Datum]
+ /* Here we want to return another item: */
+ user code
+ obtain result Datum
SRF_RETURN_NEXT(funcctx, result);
}
else
{
- /* here we are done returning items, and just need to clean up: */
- [user defined code]
+ /* Here we are done returning items and just need to clean up: */
+ user code
SRF_RETURN_DONE(funcctx);
}
}
A complete example of a simple
SRF> returning a composite type looks like:
PG_FUNCTION_INFO_V1(testpassbyval);
+
Datum
testpassbyval(PG_FUNCTION_ARGS)
{
int call_cntr;
int max_calls;
TupleDesc tupdesc;
- TupleTableSlot *slot;
+ TupleTableSlot *slot;
AttInMetadata *attinmeta;
/* stuff done only on the first call of the function */
/* total number of tuples to be returned */
funcctx->max_calls = PG_GETARG_UINT32(0);
- /*
- * Build a tuple description for a __testpassbyval tuple
- */
+ /* Build a tuple description for a __testpassbyval tuple */
tupdesc = RelationNameGetTupleDesc("__testpassbyval");
/* allocate a slot for a tuple with this tupdesc */
funcctx->slot = slot;
/*
- * G enerate attribute metadata needed later to produce tuples from raw
+ * generate attribute metadata needed later to produce tuples from raw
* C strings
*/
attinmeta = TupleDescGetAttInMetadata(tupdesc);
/*
* Prepare a values array for storage in our slot.
* This should be an array of C strings which will
- * be processed later by the appropriate "in" functions.
+ * be processed later by the type input functions.
*/
values = (char **) palloc(3 * sizeof(char *));
values[0] = (char *) palloc(16 * sizeof(char));
/* make the tuple into a datum */
result = TupleGetDatum(slot, tuple);
- /* Clean up (this is not actu ally necessary) */
+ /* clean up (this is not really necessary) */
pfree(values[0]);
pfree(values[1]);
pfree(values[2]);
pfree(values);
- SRF_RETURN_NEXT(funcctx, result);
+ SRF_RETURN_NEXT(funcctx, result);
}
else /* do when there is no more left */
{
- SRF_RETURN_DONE(funcctx);
+ SRF_RETURN_DONE(funcctx);
}
}
- with supporting SQL code of
+
+ The SQL code to declare this function is:
-CREATE TYPE __testpassbyval AS (f1 int4, f2 int4, f3 int4 );
+CREATE TYPE __testpassbyval AS (f1 integer, f2 integer, f3 integer );
-CREATE OR REPLACE FUNCTION testpassbyval(int4, int4) RETURNS setof __testpassbyval
- AS 'MODULE_PATHNAME','testpassbyval' LANGUAGE 'c' IMMUTABLE STRICT;
+CREATE OR REPLACE FUNCTION testpassbyval(integer, integer) RETURNS SETOF __testpassbyval
+ AS 'filename', 'testpassbyval'
+ LANGUAGE C IMMUTABLE STRICT;
- See contrib/tablefunc> for more examples of table functions.
-
-
-
-
-
-
-
-
Writing Code
-
- We now turn to the more difficult task of writing
- programming language functions. Be warned: this section
- of the manual will not make you a programmer. You must
- have a good understanding of
C
- (including the use of pointers)
- before trying to write
C functions for
- use with
PostgreSQL . While it may
- be possible to load functions written in languages other
- than
C into
PostgreSQL ,
- this is often difficult (when it is possible at all)
- because other languages, such as
FORTRAN
- and
Pascal often do not follow the same
- calling convention
- languages do not pass argument and return values
- between functions in the same way. For this reason, we
- will assume that your programming language functions
-
-
- The basic rules for building
C functions
- are as follows:
-
-
-
- Use
pg_config --includedir-server pg_config>> to find
- out where the
PostgreSQL> server header files are installed on
- your system (or the system that your users will be running
- on). This option is new with
PostgreSQL> 7.2.
- 7.1 you should use the option --includedir .
- (pg_config will exit with a non-zero status
- if it encounters an unknown option.) For releases prior to
- 7.1 you will have to guess, but since that was before the
- current calling conventions were introduced, it is unlikely
- that you want to support those releases.
-
-
-
-
- When allocating memory, use the
- palloc and pfree
- instead of the corresponding
C library
- routines malloc and
- free . The memory allocated by
- palloc will be freed automatically at the
- end of each transaction, preventing memory leaks.
-
-
-
-
- Always zero the bytes of your structures using
- memset or bzero .
- Several routines (such as the hash access method, hash join
- and the sort algorithm) compute functions of the raw bits
- contained in your structure. Even if you initialize all
- fields of your structure, there may be several bytes of
- alignment padding (holes in the structure) that may contain
- garbage values.
-
-
-
-
- Most of the internal
PostgreSQL types
- are declared in postgres.h , while the function
- manager interfaces (PG_FUNCTION_ARGS , etc.)
- are in fmgr.h , so you will need to
- include at least these two files. For portability reasons it's best
- to include postgres.h first>,
- before any other system or user header files.
- Including postgres.h will also include
- elog.h and palloc.h
- for you.
-
-
-
-
- Symbol names defined within object files must not conflict
- with each other or with symbols defined in the
-
PostgreSQL server executable. You
- will have to rename your functions or variables if you get
- error messages to this effect.
-
-
-
-
- Compiling and linking your object code so that
- it can be dynamically loaded into
- always requires special flags.
- See
- for a detailed explanation of how to do it for
- your particular operating system.
-
-
-
+ The directory contrib/tablefunc> in the source
+ distribution contains more examples of set-returning functions.
-
-&dfunc;
-
- A function may also have the same name as an attribute. In the case
- that there is an ambiguity between a function on a complex type and
- an attribute of the complex type, the attribute will always be used.
+ A function may also have the same name as an attribute. (Recall
+ that attribute(table) is equivalent to
+ table.attribute .) In the case that there is an
+ ambiguity between a function on a complex type and an attribute of
+ the complex type, the attribute will always be used.
- When overloading C language functions, there is an additional
+ When overloading C- language functions, there is an additional
constraint: The C name of each function in the family of
overloaded functions must be different from the C names of all
other functions, either internal or dynamically loaded. If this
The names of the C functions here reflect one of many possible conventions.
-
- Prior to
PostgreSQL 7.0, this
- alternative syntax did not exist. There is a trick to get around
- the problem, by defining a set of C functions with different names
- and then define a set of identically-named SQL function wrappers
- that take the appropriate argument types and call the matching C
- function.
-
-
-
-
-
Table Functions
-
-
- Table functions are functions that produce a set of rows, made up of
- either base (scalar) data types, or composite (multi-column) data types.
- They are used like a table, view, or subselect in the FROM>
- clause of a query. Columns returned by table functions may be included in
- SELECT>, JOIN>, or WHERE> clauses in the
- same manner as a table, view, or subselect column.
-
-
- If a table function returns a base data type, the single result column
- is named for the function. If the function returns a composite type, the
- result columns get the same names as the individual attributes of the type.
-
-
- A table function may be aliased in the FROM> clause, but it also
- may be left unaliased. If a function is used in the FROM clause with no
- alias, the function name is used as the relation name.
-
-
- Table functions work wherever tables do in SELECT> statements.
- For example
-CREATE TABLE foo (fooid int, foosubid int, fooname text);
-
-CREATE FUNCTION getfoo(int) RETURNS setof foo AS '
- SELECT * FROM foo WHERE fooid = $1;
-' LANGUAGE SQL;
-
-SELECT * FROM getfoo(1) AS t1;
-
-SELECT * FROM foo
-WHERE foosubid in (select foosubid from getfoo(foo.fooid) z
- where z.fooid = foo.fooid);
-
-CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
-SELECT * FROM vw_getfoo;
-
- are all valid statements.
-
-
- In some cases it is useful to define table functions that can return
- different column sets depending on how they are invoked. To support this,
- the table function can be declared as returning the pseudo-type
- record>. When such a function is used in a query, the expected
- row structure must be specified in the query itself, so that the system
- can know how to parse and plan the query. Consider this example:
-SELECT *
-FROM dblink('dbname=template1', 'select proname, prosrc from pg_proc')
- AS t1(proname name, prosrc text)
-WHERE proname LIKE 'bytea%';
-
- The dblink> function executes a remote query (see
- contrib/dblink>). It is declared to return record>
- since it might be used for any kind of query. The actual column set
- must be specified in the calling query so that the parser knows, for
- example, what *> should expand to.
-
-
The call handler for a procedural language is a
- normal
function, which must be written in a
- compiled language such as C and registered with
-
PostgreSQL as taking no arguments and
- returning the language_handler type.
- This special pseudo-type identifies the handler as a call handler
- and prevents it from being called directly in querie s.
+ normal
function that must be written in a compiled
+ language such as C, using the version-1 interface, and registered
+ with
PostgreSQL as taking no arguments
+ and returning the type language_handler . This
+ special pseudotype identifies the function as a call handler and
+ prevents it from being called directly in SQL command s.
-
- In
PostgreSQL 7.1 and later, call
- handlers must adhere to the version 1
function
- manager interface, not the old-style interface.
-
-
-
The call handler is called in the same way as any other function:
It receives a pointer to a
is expected to return a Datum result (and possibly
set the isnull field of the
FunctionCallInfoData structure, if it wishes
- to return an SQL NULL result). The difference between a call
+ to return an SQL null result). The difference between a call
handler and an ordinary callee function is that the
flinfo->fn_oid field of the
FunctionCallInfoData structure will contain
- It's up to the call handler to fetch the
- pg_proc entry and to analyze the argument
- and return types of the called procedure. The AS clause from the
- CREATE FUNCTION of the procedure will be found
- in the prosrc attribute of the
- pg_proc table entry . This may be the source
+ It's up to the call handler to fetch the entry of the function from the system table
+ pg_proc and to analyze the argument
+ and return types of the called function. The AS> clause from the
+ CREATE FUNCTION of the function will be found
+ in the prosrc column of the
+ pg_proc row . This may be the source
text in the procedural language itself (like for PL/Tcl), a
path name to a file, or anything else that tells the call handler
what to do in detail.
A call handler can avoid repeated lookups of information about the
called function by using the
flinfo->fn_extra field. This will
- initially be NULL , but can be set by the call handler to point at
- information about the PL function. On subsequent calls, if
- flinfo->fn_extra is already non-NULL
+ initially be NULL> , but can be set by the call handler to point at
+ information about the called function. On subsequent calls, if
+ flinfo->fn_extra is already non-NULL>
then it can be used and the information lookup step skipped. The
- call handler must be careful that
+ call handler must make sure that
flinfo->fn_extra is made to point at
memory that will live at least until the end of the current query,
since an FmgrInfo data structure could be
flinfo->fn_mcxt ; such data will
normally have the same lifespan as the
FmgrInfo itself. But the handler could
- also choose to use a longer-lived context so that it can cache
+ also choose to use a longer-lived memory context so that it can cache
function definition information across queries.
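+
+ A rough sketch of that caching pattern follows; the structure
+ my_func_info and its contents are placeholders invented for
+ illustration, not part of the real API:
+
+typedef struct my_func_info     /* whatever the handler needs to remember */
+{
+    Oid     fn_oid;
+    /* ... parsed source text, argument and result type info, etc. ... */
+} my_func_info;
+
+/* ... inside the handler, once per call: */
+    my_func_info *info = (my_func_info *) fcinfo->flinfo->fn_extra;
+
+    if (info == NULL)
+    {
+        /* first call through this FmgrInfo: look up the function and cache it */
+        info = (my_func_info *)
+            MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, sizeof(my_func_info));
+        info->fn_oid = fcinfo->flinfo->fn_oid;
+        /* ... fetch and analyze the pg_proc entry here ... */
+        fcinfo->flinfo->fn_extra = info;
+    }
+    /* on subsequent calls, info can be used directly */
+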
- When a PL function is invoked as a trigger, no explicit arguments
- are passed, but the
+ When a procedural-language function is invoked as a trigger, no arguments
+ are passed in the usual way , but the
FunctionCallInfoData 's
context field points at a
- TriggerData node, rather than being NULL
+ TriggerData structure, rather than being NULL>
as it is in a plain function call. A language handler should
- provide mechanisms for PL functions to get at the trigger
+ provide mechanisms for procedural-language functions to get at the trigger
information.
- This is a template for a PL handler written in C:
+ This is a template for a procedural-language handler written in C:
#include "postgres.h"
#include "executor/spi.h"
retval = ...
}
- else {
+ else
+ {
/*
* Called as a function
*/
return retval;
}
-
-
Only a few thousand lines of code have to be added instead of the
- dots to complete the call handler. See
- for information on how to compile it into a loadable module.
+ dots to complete the call handler.
- The following commands then register the sample procedural
- language:
+ After having compiled the handler function into a loadable module
+ (see ), the following commands then
+ register the sample procedural language:
-CREATE FUNCTION plsample_call_handler () RETURNS language_handler
- AS '/usr/local/pgsql/lib/plsample '
+CREATE FUNCTION plsample_call_handler() RETURNS language_handler
+ AS 'filename '
LANGUAGE C;
CREATE LANGUAGE plsample
HANDLER plsample_call_handler;
-
-
-
Extending SQL : Operators
-
-
-
Introduction
-
-
PostgreSQL supports left unary,
- right unary, and binary
- operators. Operators can be overloaded; that is,
- the same operator name can be used for different operators
- that have different numbers and types of operands. If
- there is an ambiguous situation and the system cannot
- determine the correct operator to use, it will return
- an error. You may have to type-cast the left and/or
- right operands to help it understand which operator you
- meant to use.
-
+
+
User-defined Operators
Every operator is syntactic sugar
for a call to an
the operator. However, an operator is not merely
syntactic sugar, because it carries additional information
that helps the query planner optimize queries that use the
- operator. Much of this chapter will be devoted to explaining
+ operator. The next section will be devoted to explaining
that additional information.
-
-
-
Example
+
PostgreSQL supports left unary, right
+ unary, and binary operators. Operators can be overloaded; that is,
+ the same operator name can be used for different operators that
+ have different numbers and types of operands. When a query is
+ executed, the system determines the operator to call from the
+ number and types of the provided operands.
+
Here is an example of creating an operator for adding two complex
CREATE FUNCTION complex_add(complex, complex)
RETURNS complex
- AS 'PGROOT /tutorial/complex'
+ AS 'filename ', 'complex_add'
LANGUAGE C;
CREATE OPERATOR + (
- Now we can do :
+ Now we could execute a query like this :
SELECT (a + b) AS c FROM test_complex;
CREATE OPERATOR . The commutator>
clause shown in the example is an optional hint to the query
optimizer. Further details about commutator> and other
- optimizer hints appear below .
+ optimizer hints appear in the next section .
Operator Optimization Information
-
-
Author
- Written by Tom Lane.
-
-
-
A
PostgreSQL operator definition can include
several optional clauses that tell the system useful things about how
appropriate, because they can make for considerable speedups in execution
of queries that use the operator. But if you provide them, you must be
sure that they are right! Incorrect use of an optimization clause can
- result in backend crashes, subtly wrong output, or other Bad Things.
+ result in server process crashes, subtly wrong output, or other Bad Things.
You can always leave out an optimization clause if you are not sure
about it; the only consequence is that queries might run slower than
they need to.
-
COMMUTATOR
+
COMMUTATOR>
The COMMUTATOR> clause, if provided, names an operator that is the
The other, more straightforward way is just to include COMMUTATOR> clauses
in both definitions. When
PostgreSQL processes
- the first definition and realizes that COMMUTATOR> refers to a non- existent
+ the first definition and realizes that COMMUTATOR> refers to a nonexistent
operator, the system will make a dummy entry for that operator in the
system catalog. This dummy entry will have valid data only
for the operator name, left and right operand types, and result type,
dummy entry. Later, when you define the second operator, the system
updates the dummy entry with the additional information from the second
definition. If you try to use the dummy operator before it's been filled
- in, you'll just get an error message. (Note: This procedure did not work
- reliably in
PostgreSQL versions before 6.5,
- but it is now the recommended way to do things.)
+ in, you'll just get an error message.
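+
+ For example, a pair of comparison operators on the
+ complex type used earlier could declare each other as
+ commutators like this (complex_lt and complex_gt are hypothetical
+ support functions):
+
+CREATE OPERATOR < (
+    leftarg = complex,
+    rightarg = complex,
+    procedure = complex_lt,
+    commutator = >
+);
+
+CREATE OPERATOR > (
+    leftarg = complex,
+    rightarg = complex,
+    procedure = complex_gt,
+    commutator = <
+);
+
+ Whichever of the two is created first will cause a dummy entry to
+ be made for the other, as described above; creating the second
+ operator then fills in that entry.
+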
-
NEGATOR
+
NEGATOR>
The NEGATOR> clause, if provided, names an operator that is the
An operator's negator must have the same left and/or right operand types
- as the operator itself , so just as with COMMUTATOR>, only the operator
+ as the operator to be defined , so just as with COMMUTATOR>, only the operator
name need be given in the NEGATOR> clause.
Providing a negator is very helpful to the query optimizer since
it allows expressions like NOT (x = y)> to be simplified into
- x <> y . This comes up more often than you might think, because
+ x <> y> . This comes up more often than you might think, because
NOT> operations can be inserted as a consequence of other rearrangements.
-
RESTRICT
+
RESTRICT>
The RESTRICT> clause, if provided, names a restriction selectivity
- estimation function for the operator (n ote that this is a function
- name, not an operator name). RESTRICT> clauses only make sense for
+ estimation function for the operator. (N ote that this is a function
+ name, not an operator name.) RESTRICT> clauses only make sense for
binary operators that return boolean>. The idea behind a restriction
selectivity estimator is to guess what fraction of the rows in a
table will satisfy a WHERE -clause condition of the form
You can use scalarltsel> and scalargtsel> for comparisons on data types that
have some sensible means of being converted into numeric scalars for
range comparisons. If possible, add the data type to those understood
- by the routine convert_to_scalar() in src/backend/utils/adt/selfuncs.c .
- (Eventually, this routine should be replaced by per-data-type functions
+ by the function convert_to_scalar() in src/backend/utils/adt/selfuncs.c .
+ (Eventually, this function should be replaced by per-data-type functions
identified through a column of the pg_type> system catalog; but that hasn't happened
yet.) If you do not do this, things will still work, but the optimizer's
estimates won't be as good as they could be.
- There are additional selectivity functions designed for geometric
+ There are additional selectivity estimation functions designed for geometric
operators in src/backend/utils/adt/geo_selfuncs.c : areasel , positionsel ,
and contsel . At this writing these are just stubs, but you may want
to use them (or even better, improve them) anyway.
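+
+ As an illustration, an equality operator for the hypothetical
+ complex type could attach the standard estimators like this
+ (complex_eq is a placeholder function name; the JOIN clause is
+ described in the next section):
+
+CREATE OPERATOR = (
+    leftarg = complex,
+    rightarg = complex,
+    procedure = complex_eq,
+    commutator = =,
+    negator = <>,
+    restrict = eqsel,
+    join = eqjoinsel
+);
+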
-
JOIN
+
JOIN>
The JOIN> clause, if provided, names a join selectivity
- estimation function for the operator (n ote that this is a function
- name, not an operator name). JOIN> clauses only make sense for
+ estimation function for the operator. (N ote that this is a function
+ name, not an operator name.) JOIN> clauses only make sense for
binary operators that return boolean . The idea behind a join
selectivity estimator is to guess what fraction of the rows in a
pair of tables will satisfy a WHERE>-clause condition of the form
-
HASHES
+
HASHES>
The HASHES clause, if present, tells the system that
it is permissible to use the hash join method for a join based on this
- operator. HASHES> only makes sense for binary operators that
- return boolean>, and in practice the operator had better be
+ operator. HASHES> only makes sense for a binary operator that
+ returns boolean>, and in practice the operator had better be
equality for some data type.
In fact, logical equality is not good enough either; the operator
- had better represent pure bitwise equality, because the hash function
- will be computed on the memory representation of the values regardless
- of what the bits mean. For example, equality of
- time intervals is not bitwise equality; the interval equality operator
- considers two time intervals equal if they have the same
- duration, whether or not their endpoints are identical. What this means
- is that a join using = between interval fields would yield different
- results if implemented as a hash join than if implemented another way,
- because a large fraction of the pairs that should match will hash to
- different values and will never be compared by the hash join. But
- if the optimizer chose to use a different kind of join, all the pairs
- that the equality operator says are equal will be found.
- We don't want that kind of inconsistency, so we don't mark interval
- equality as hashable.
+ had better represent pure bitwise equality, because the hash
+ function will be computed on the memory representation of the
+ values regardless of what the bits mean. For example, the
+ polygon operator ~= , which checks whether two
+ polygons are the same, is not bitwise equality, because two
+ polygons can be considered the same even if their vertices are
+ specified in a different order. What this means is that a join
+ using ~= between polygon fields would yield
+ different results if implemented as a hash join than if
+ implemented another way, because a large fraction of the pairs
+ that should match will hash to different values and will never be
+ compared by the hash join. But if the optimizer chooses to use a
+ different kind of join, all the pairs that the operator
+ ~= says are the same will be found. We don't
+ want that kind of inconsistency, so we don't mark the polygon
+ operator ~= as hashable.
There are also machine-dependent ways in which a hash join might fail
to do the right thing. For example, if your data type
is a structure in which there may be uninteresting pad bits, it's unsafe
- to mark the equality operator HASHES>. (Unless, perhaps, you write
- your other operators to ensure that the unused bits are always zero .)
+ to mark the equality operator HASHES>. (Unless you write
+ your other operators and functions to ensure that the unused bits
+ are always zero, which is the recommended strategy.)
Another example is that the floating-point data types are unsafe for hash
- joins. On machines that meet the
IEEE> floating-point standard, minus
- zero and plus zero are different values (different bit patterns) but
+ joins. On machines that meet the
IEEE> floating-point standard, negative
+ zero and positive zero are different values (different bit patterns) but
they are defined to compare equal. So, if the equality operator on floating-point data types were marked
- HASHES>, a minus zero and a plus zero would probably not be matched up
+ HASHES>, a negative zero and a positive zero would probably not be matched up
by a hash join, but they would be matched up by any other join process.
The MERGES clause, if present, tells the system that
- it is permissible to use the merge join method for a join based on this
- operator. MERGES> only makes sense for binary operators that
- return boolean>, and in practice the operator must represent
+ it is permissible to use the merge- join method for a join based on this
+ operator. MERGES> only makes sense for a binary operator that
+ returns boolean>, and in practice the operator must represent
equality for some data type or pair of data types.
data types had better be the same (or at least bitwise equivalent),
it is possible to merge-join two
distinct data types so long as they are logically compatible. For
- example, the int2 -versus-int4 equality operator
+ example, the smallint -versus-integer equality operator
is merge-joinable.
We only need sorting operators that will bring both data types into a
logically compatible sequence.
Execution of a merge join requires that the system be able to identify
four operators related to the merge-join equality operator: less-than
- comparison for the left input data type, less-than comparison for the
- right input data type, less-than comparison between the two data types, and
+ comparison for the left operand data type, less-than comparison for the
+ right operand data type, less-than comparison between the two data types, and
greater-than comparison between the two data types. (These are actually
four distinct operators if the merge-joinable operator has two different
- input data types; but when the input types are the same the three
+ operand data types; but when the operand types are the same the three
less-than operators are all the same operator.)
It is possible to
specify these operators individually by name, as the SORT1>,
- The input data types of the four comparison operators can be deduced
- from the input types of the merge-joinable operator, so just as with
+ The operand data types of the four comparison operators can be deduced
+ from the operand types of the merge-joinable operator, so just as with
COMMUTATOR>, only the operator names need be given in these
clauses. Unless you are using peculiar choices of operator names,
it's sufficient to write MERGES> and let the system fill in
A merge-joinable equality operator must have a merge-joinable
- commutator (itself if the two data types are the same, or a related
+ commutator (itself if the two operand data types are the same, or a related
equality operator if they are different).
<> and >> respectively.
-
-
-
+
+
+
User-Defined Types
- This chapter needs to be updated for the version-1 function manager
+ This section needs to be updated for the version-1 function manager
interface.
- As previously mentioned, there are two kinds of types in
-
PostgreSQL : base types (defined in a
- programming language) and composite types. This chapter describes
- how to define new base types.
+ As described above, there are two kinds of data types in
+
PostgreSQL : base types and composite
+ types. This section describes how to define new base types.
The examples in this section can be found in
complex.sql and complex.c
- in the tutorial directory. Composite examples are in
- funcs.sql .
+ in the tutorial directory.
These functions determine how the type appears in strings (for input
by the user and output to the user) and how the type is organized in
memory. The input function takes a null-terminated character string
- as its inpu t and returns the internal (in memory) representation of
+ as its argument and returns the internal (in memory) representation of
the type. The output function takes the internal representation of
- the type and returns a null-terminated character string.
+ the type as argument and returns a null-terminated character string.
- Suppose we want to define a complex type which represents complex
- numbers. Naturally, we would choose to represent a complex in memory
-
as the following C structure:
+ Suppose we want to define a type complex> that represents
+ complex numbers. A natural way to represent a complex number in
+ memory would be the following C structure:
typedef struct Complex {
} Complex;
- and a string of the form (x,y) as the external string
- representation .
+ As the external string representation of the type, we choose a
+ string of the form (x,y) .
- The functions are usually not hard to write, especially the output
- function. However, there are a number of points to remember:
-
-
-
- When defining your external (string) representation, remember
- that you must eventually write a complete and robust parser for
- that representation as your input function!
-
-
- For instance:
+ The input and output functions are usually not hard to write,
+ especially the output function. But when defining the external
+ string representation of the type, remember that you must eventually
+ write a complete and robust parser for that representation as your
+ input function. For instance:
Complex *
{
double x, y;
Complex *result;
- if (sscanf(str, " ( %lf , %lf )", &x, &y) != 2) {
+
+ if (sscanf(str, " ( %lf , %lf )", &x, &y) != 2)
+ {
elog(ERROR, "complex_in: error in parsing %s", str);
return NULL;
}
- result = (Complex *)palloc(sizeof(Complex));
+ result = (Complex *) palloc(sizeof(Complex));
result->x = x;
result->y = y;
- return (result) ;
+ return result ;
}
-
- The output function can simply be:
+ The output function can simply be:
char *
complex_out(Complex *complex)
{
char *result;
+
if (complex == NULL)
return(NULL);
result = (char *) palloc(60);
sprintf(result, "(%g,%g)", complex->x, complex->y);
- return(result) ;
+ return result ;
}
+
-
-
-
-
- You should try to make the input and output functions inverses of
- each other. If you do not, you will have severe problems when
- you need to dump your data into a file and then read it back in
- (say, into someone else's database on another computer). This is
- a particularly common problem when floating-point numbers are
- involved.
-
-
-
+ You should try to make the input and output functions inverses of
+ each other. If you do not, you will have severe problems when you
+ need to dump your data into a file and then read it back in. This
+ is a particularly common problem when floating-point numbers are
+ involved.
CREATE FUNCTION complex_in(cstring)
RETURNS complex
- AS 'PGROOT /tutorial/complex'
+ AS 'filename '
LANGUAGE C;
CREATE FUNCTION complex_out(complex)
RETURNS cstring
- AS 'PGROOT /tutorial/complex'
+ AS 'filename '
LANGUAGE C;
+
+ Notice that the declarations of the input and output functions must
+ reference the not-yet-defined type. This is allowed, but will draw
+ warning messages that may be ignored.
output = complex_out
);
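+
+ Pieced together, the complete type declaration might look like
+ this (the internal length of 16 assumes the two double
+ fields shown earlier):
+
+CREATE TYPE complex (
+    internallength = 16,
+    input = complex_in,
+    output = complex_out
+);
+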
-
- Notice that the declarations of the input and output functions must
- reference the not-yet-defined type. This is allowed, but will draw
- warning messages that may be ignored.
-
-
- As discussed earlier,
PostgreSQL fully
- supports arrays of base types. Additionally,
-
PostgreSQL supports arrays of
- user-defined types as well. When you define a type,
+ When you define a new base type,
PostgreSQL automatically provides support
- for arrays of that type. For historical reasons, the array type has
- the same name as the user-defined type with the underscore character
- _> prepended.
+ for arrays of that type.
+ For historical reasons, the array type
+ has the same name as the base type with the underscore character
+ (_>) prepended.
- Composite types do not need any function defined on them, since the
- system already understands what they look like inside.
+ If the values of your data type might exceed a few hundred bytes in
+ size (in internal form), you should mark them
+ TOAST-able. To do this, the internal
+ representation must follow the standard layout for variable-length
+ data: the first four bytes must be an int32 containing
+ the total length in bytes of the datum (including itself). Also,
+ when running the CREATE TYPE command, specify the
+ internal length as variable> and select the appropriate
+ storage option.
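+
+ A minimal sketch of such a layout (the type name is invented, and
+ len and buf stand for data you already have; VARHDRSZ is the
+ server-header constant for the size of the length word):
+
+typedef struct mytype
+{
+    int32   length;         /* total length in bytes, including this word */
+    char    data[1];        /* actual data begins here */
+} mytype;
+
+    /* allocate a value carrying len bytes of payload */
+    mytype *result = (mytype *) palloc(VARHDRSZ + len);
+
+    result->length = VARHDRSZ + len;
+    memcpy(result->data, buf, len);
+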
-
- and user-defined types
-
- If the values of your data type might exceed a few hundred bytes in
- size (in internal form), you should be careful to mark them
- TOAST-able. To do this, the internal representation must follow the
- standard layout for variable-length data: the first four bytes must
- be an int32 containing the total length in bytes of the
- datum (including itself). Then, all your functions that accept
- values of the type must be careful to call
- pg_detoast_datum() on the supplied values ---
- after checking that the value is not NULL, if your function is not
- strict. Finally, select the appropriate storage option when giving
- the CREATE TYPE command.
+ For further details see the description of the CREATE
+ TYPE command in .
-chapter >
+sect1 >