In many cases a user will not need
to understand the details of the type conversion mechanism.
However, the implicit conversions done by
Postgres
-can affect the apparent results of a query, and these results
+can affect the results of a query. When necessary, these results
can be tailored by a user or programmer
using explicit type coercion.
This chapter introduces the
Postgres
- type conversion mechanisms and conventions.
+type conversion mechanisms and conventions.
Refer to the relevant sections in the User's Guide and Programmer's Guide
-for more information on specific data types and allowed functions and operators.
+for more information on specific data types and allowed functions and
+operators.
-The
Postgres scanner/parser decodes lexical elements
-into only five fundamental categories: integers, floats, strings, names, and keywords.
-Most extended types are first tokenized into strings. The
SQL
-language definition allows specifying type names with strings, and this mechanism
-to start the parser down the correct path. For example, the query
+The
Postgres scanner/parser decodes lexical
+elements into only five fundamental categories: integers, floats, strings,
+names, and keywords. Most extended types are first tokenized into
+strings. The
SQL language definition allows specifying type
+names with strings, and this mechanism can be used in
+
Postgres to start the parser down the correct
+path. For example, the query
tgl=> SELECT text 'Origin' AS "Label", point '(0,0)' AS "Value";
has two strings, of type text and point.
-If a type is not specified, then the placeholder type unknown
-is assigned initially, to be resolved in later stages as described below.
+If a type is not specified for a string, then the placeholder type
+unknown is assigned initially, to be resolved in later
+stages as described below.
-Much of the
Postgres type system is built around a rich set of
-functions. Function calls have one or more arguments which, for any specific query,
-must be matched to the functions available in the system catalog.
+Much of the
Postgres type system is built around a
+rich set of functions. Function calls have one or more arguments which, for
+any specific query, must be matched to the functions available in the system
+catalog. Since
Postgres permits function
+overloading, the function name alone does not uniquely identify the function
+to be called --- the parser must select the right function based on the data
+types of the supplied arguments.
-
SQL INSERT statements place the results of query into a table. The expressions
-in the query must be matched up with, and perhaps converted to, the target columns of the insert.
+
SQL INSERT and UPDATE statements place the results of
+expressions into a table. The expressions in the query must be matched up
+with, and perhaps converted to, the types of the target columns.
-UNION queries
+UNION and CASE constructs
-Since all select results from a UNION SELECT statement must appear in a single set of columns, the types
+Since all select results from a UNION SELECT statement must appear in a single
+set of columns, the types of the results
of each SELECT clause must be matched up and converted to a uniform set.
+Similarly, the result expressions of a CASE construct must be coerced to
+a common type so that the CASE expression as a whole has a known output type.
The
Postgres parser uses the convention that all
type conversion functions take a single argument of the source type and are
-named with the same name as the target type. Any function meeting this
+named with the same name as the target type. Any function meeting these
criteria is considered to be a valid conversion function, and may be used
by the parser as such. This simple assumption gives the parser the power
to explore type conversion possibilities without hardcoding, allowing
An additional heuristic is provided in the parser to allow better guesses
at proper behavior for
SQL standard types. There are
-five categories of types defined: boolean, string, numeric, geometric,
+several basic type categories defined: boolean,
+numeric, string, bitstring, datetime, timespan, geometric, network,
and user-defined. Each category, with the exception of user-defined, has
-a "preferred type" which is used to resolve ambiguities in candidates.
-Each "user-defined" type is its own "preferred type", so ambiguous
-expressions (those with multiple candidate parsing solutions)
-with only one user-defined type can resolve to a single best choice, while those with
-multiple user-defined types will remain ambiguous and throw an error.
-
-
-Ambiguous expressions which have candidate solutions within only one type category are
-likely to resolve, while ambiguous expressions with candidates spanning multiple
-categories are likely to throw an error and ask for clarification from the user.
+a preferred type which is preferentially selected
+when there is ambiguity.
+In the user-defined category, each type is its own preferred type.
+Ambiguous expressions (those with multiple candidate parsing solutions)
+can often be resolved when there are multiple possible built-in types, but
+they will raise an error when there are multiple choices for user-defined
+types.
Operators
-
-
Conversion Procedure
-
-
Operator Evaluation
-
+
Operator Type Resolution
-If one argument of a binary operator is unknown,
-then assume it is the same type as the other argument.
-
-
-
-Reverse the arguments, and look for an exact match with an operator which
-points to itself as being commutative.
-If found, then reverse the arguments in the parse tree and use this operator.
+If one argument of a binary operator is of type unknown,
+then assume it is the same type as the other argument for this check.
+Other cases involving unknown will never find a match at
+this step.
Look for the best match.
-optional">
+required">
-Make a list of all operators of the same name.
+Make a list of all operators of the same name for which the input types
+match or can be coerced to match. (unknown literals are
+assumed to be coercible to anything for this purpose.) If there is only
+one, use it; else continue to the next step.
-If only one operator is in the list, use it if the input type can be coerced,
-and throw an error if the type cannot be coerced.
+Run through all candidates and keep those with the most exact matches
+on input types. Keep all candidates if none have any exact matches.
+If only one candidate remains, use it; else continue to the next step.
+
+
+Run through all candidates and keep those with the most exact or
+binary-compatible matches on input types. Keep all candidates if none have
+any exact or binary-compatible matches.
+If only one candidate remains, use it; else continue to the next step.
-Keep all operators with the most explicit matches for types. Keep all if there
-are no explicit matches and move to the next step.
-If only one candidate remains, use it if the type can be coerced.
+Run through all candidates and keep those which accept preferred types at
+the most positions where type coercion will be required.
+Keep all candidates if none accept preferred types.
+If only one candidate remains, use it; else continue to the next step.
-If any input arguments are "unknown", categorize the input candidates as
-boolean, numeric, string, geometric, or user-defined. If there is a mix of
-categories, or more than one user-defined type, throw an error because
-the correct choice cannot be deduced without more clues.
-If only one category is present, then assign the "preferred type"
-to the input column which had been previously "unknown".
+If any input arguments are "unknown", check the type categories accepted
+at those argument positions by the remaining candidates. At each position,
+select "string"
+category if any candidate accepts that category (this bias towards string
+is appropriate since an unknown-type literal does look like a string).
+Otherwise, if all the remaining candidates accept the same type category,
+select that category; otherwise raise an error because
+the correct choice cannot be deduced without more clues. Also note whether
+any of the candidates accept a preferred type within the selected category.
+Now discard operator candidates that do not accept the selected type category;
+furthermore, if any candidate accepts a preferred type at a given argument
+position, discard candidates that accept non-preferred types for that
+argument.
-Choose the candidate with the most exact type matches, and which matches
-the "preferred type" for each column category from the previous step.
-If there is still more than one candidate, or if there are none,
-then throw an error.
+If only one candidate remains, use it. If no candidate or more than one
+candidate remains,
+then raise an error.
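
The filtering steps above can be sketched in a few lines of Python. This is only an illustrative model, not the parser's actual code: the `CATEGORY`, `PREFERRED`, `BINARY_COMPATIBLE`, and `COERCIBLE` tables are small hypothetical stand-ins for the system catalogs, and any non-exact match is treated as requiring coercion, which is a simplification.

```python
# Hypothetical miniature "catalog"; the real one is far larger.
CATEGORY = {"int4": "numeric", "float8": "numeric",
            "text": "string", "varchar": "string", "varbit": "bitstring"}
PREFERRED = {"numeric": "float8", "string": "text", "bitstring": "varbit"}
BINARY_COMPATIBLE = {("varchar", "text"), ("text", "varchar")}
COERCIBLE = {("int4", "float8"), ("int4", "text")}  # assumed conversion functions


def can_coerce(src, dst):
    """unknown is assumed coercible to anything at the candidate-listing step."""
    return (src == "unknown" or src == dst
            or (src, dst) in BINARY_COMPATIBLE or (src, dst) in COERCIBLE)


def keep_best(candidates, score):
    """Keep the top scorers; keep every candidate if the best score is zero."""
    best = max(score(c) for c in candidates)
    return [c for c in candidates if score(c) == best] if best else candidates


def resolve(candidates, args):
    """candidates: tuples of declared input types; args: actual input types."""
    args = tuple(args)
    # Exact-match check: a lone unknown argument of a binary operator
    # borrows the other argument's type for this check only.
    probe = args
    if len(args) == 2 and args.count("unknown") == 1:
        other = args[0] if args[1] == "unknown" else args[1]
        probe = (other, other)
    if "unknown" not in probe and probe in candidates:
        return probe
    # List the candidates whose inputs match or can be coerced to match.
    pool = [c for c in candidates if len(c) == len(args)
            and all(can_coerce(a, p) for a, p in zip(args, c))]
    if not pool:
        raise LookupError("no candidate accepts these input types")
    if len(pool) == 1:
        return pool[0]
    scores = [
        # most exact matches
        lambda c: sum(a == p for a, p in zip(args, c)),
        # most exact or binary-compatible matches
        lambda c: sum(a == p or (a, p) in BINARY_COMPATIBLE
                      for a, p in zip(args, c)),
        # most preferred types at positions that need coercion
        lambda c: sum(a != p and PREFERRED[CATEGORY[p]] == p
                      for a, p in zip(args, c)),
    ]
    for score in scores:
        pool = keep_best(pool, score)
        if len(pool) == 1:
            return pool[0]
    # Category selection at each unknown position, biased towards string.
    for i, a in enumerate(args):
        if a != "unknown":
            continue
        cats = {CATEGORY[c[i]] for c in pool}
        if "string" in cats:
            chosen = "string"
        elif len(cats) == 1:
            chosen = next(iter(cats))
        else:
            raise LookupError("ambiguous: candidates span several categories")
        pool = [c for c in pool if CATEGORY[c[i]] == chosen]
        if any(c[i] == PREFERRED[chosen] for c in pool):
            pool = [c for c in pool if c[i] == PREFERRED[chosen]]
    if len(pool) == 1:
        return pool[0]
    raise LookupError("could not choose a single best candidate")
```

With candidates `("text", "text")`, `("varchar", "varchar")`, and `("varbit", "varbit")` and two unknown arguments, `resolve` returns `("text", "text")`: the preferred-type step drops the varchar candidate, and the string bias then drops the bitstring one.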
-
Examples
In this case there is no initial hint for which type to use, since no types
are specified in the query. So, the parser looks for all candidate operators
-and finds that all arguments for all the candidates are string types. It chooses
-the "preferred type" for strings, text, for this query.
-
-
-
-If a user defines a new type and defines an operator "||" to work
-with it, then this query would no longer succeed as written. The parser would
-now have candidate types from two categories, and could not decide which to use.
+and finds that there are candidates accepting both string-category and
+bitstring-category inputs. Since string category is preferred when available,
+that category is selected, and then the
+"preferred type" for strings, text, is used as the specific
+type to resolve the unknown literals to.
-
Functions
-
Function Evaluation
+
Function Call Type Resolution
Check for an exact match in the pg_proc system catalog.
+(Cases involving unknown will never find a match at
+this step.)
-Make a list of all functions of the same name with the same number of arguments.
-
+Make a list of all functions of the same name with the same number of
+arguments for which the input types
+match or can be coerced to match. (unknown literals are
+assumed to be coercible to anything for this purpose.) If there is only
+one, use it; else continue to the next step.
+
+
-If only one function is in the list, use it if the input types can be coerced,
-and throw an error if the types cannot be coerced.
-
+Run through all candidates and keep those with the most exact matches
+on input types. Keep all candidates if none have any exact matches.
+If only one candidate remains, use it; else continue to the next step.
+
-Keep all functions with the most explicit matches for types. Keep all if there
-are no explicit matches and move to the next step.
-If only one candidate remains, use it if the type can be coerced.
-
+Run through all candidates and keep those with the most exact or
+binary-compatible matches on input types. Keep all candidates if none have
+any exact or binary-compatible matches.
+If only one candidate remains, use it; else continue to the next step.
+
+
-If any input arguments are "unknown", categorize the input candidate arguments as
-boolean, numeric, string, geometric, or user-defined. If there is a mix of
-categories, or more than one user-defined type, throw an error because
-the correct choice cannot be deduced without more clues.
-If only one category is present, then assign the "preferred type"
-to the input column which had been previously "unknown".
-
+Run through all candidates and keep those which accept preferred types at
+the most positions where type coercion will be required.
+Keep all candidates if none accept preferred types.
+If only one candidate remains, use it; else continue to the next step.
+
+
-Choose the candidate with the most exact type matches, and which matches
-the "preferred type" for each column category from the previous step.
-If there is still more than one candidate, or if there are none,
-then throw an error.
-
+If any input arguments are "unknown", check the type categories accepted
+at those argument positions by the remaining candidates. At each position,
+select "string"
+category if any candidate accepts that category (this bias towards string
+is appropriate since an unknown-type literal does look like a string).
+Otherwise, if all the remaining candidates accept the same type category,
+select that category; otherwise raise an error because
+the correct choice cannot be deduced without more clues. Also note whether
+any of the candidates accept a preferred type within the selected category.
+Now discard function candidates that do not accept the selected type category;
+furthermore, if any candidate accepts a preferred type at a given argument
+position, discard candidates that accept non-preferred types for that
+argument.
+
+
+
+If only one candidate remains, use it. If no candidate or more than one
+candidate remains,
+then raise an error.
+
+
+
Examples
-There are some heuristics in the parser to optimize the relationship between the
-char, varchar, and text types.
-For this case, substr is called directly with the varchar string
-rather than inserting an explicit conversion call.
+Actually, the parser is aware that text and varchar
+are "binary-compatible", meaning that one can be passed to a function that
+accepts the other without doing any physical conversion. Therefore, no
+explicit type conversion call is really inserted in this case.
34
(1 row)
+This succeeds because there is a conversion function text(int4) in the
+system catalog.
Query Targets
-
Target Evaluation
+
Query Target Type Resolution
-Try to coerce the expression directly to the target type if necessary.
+Otherwise, try to coerce the expression to the target type. This will succeed
+if the two types are known binary-compatible, or if there is a conversion
+function. If the expression is an unknown-type literal, the contents of
+the literal string will be fed to the input conversion routine for the target
+type.
If the target is a fixed-length type (e.g. char or varchar
-declared with a length) then try to find a sizing function of the same name
-as the type taking two arguments, the first the type name and the second an
-integer length.
+declared with a length) then try to find a sizing function for the target
+type. A sizing function is a function of the same name as the type,
+taking two arguments of which the first is that type and the second is an
+integer, and returning the same type. If one is found, it is applied,
+passing the column's declared length as the second parameter.
v
------
abcd
-(1 row)
+(1 row)
+
+What's really happened here is that the two unknown literals are resolved
+to text by default, allowing the || operator to be
+resolved as text concatenation. Then the text result of the operator
+is coerced to varchar to match the target column type. (But, since the
+parser knows that text and varchar are binary-compatible, this coercion
+is implicit and does not insert any real function call.) Finally, the
+sizing function varchar(varchar,int4) is found in the system
+catalogs and applied to the operator's result and the stored column length.
+This type-specific function performs the desired truncation.
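
The final sizing step can be modelled as follows. This is a rough sketch rather than the real implementation: `varchar_sizing` is a hypothetical stand-in for the sizing function varchar(varchar,int4), `SIZING_FUNCTIONS` stands in for the catalog lookup by type name, and the input strings are made up for illustration.

```python
def varchar_sizing(value: str, length: int) -> str:
    """Stand-in for varchar(varchar, int4): truncate to the declared length."""
    return value[:length]


# Hypothetical lookup table: sizing functions are found by the target type's name.
SIZING_FUNCTIONS = {"varchar": varchar_sizing}


def store(value: str, target_type: str, declared_length=None) -> str:
    """Apply the target type's sizing function when the column declares a length."""
    sizing = SIZING_FUNCTIONS.get(target_type)
    if declared_length is not None and sizing is not None:
        return sizing(value, declared_length)
    # No declared length, or no sizing function: the value is stored unchanged.
    return value
```

For instance, `store("abc" + "def", "varchar", 4)` returns `"abcd"`, while a target with no declared length receives the value untouched.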
-
-
UNION Queries
+
UNION and CASE Constructs
-The UNION construct is somewhat different in that it must match up
-possibly dissimilar types to become a single result set.
+The UNION and CASE constructs must match up possibly dissimilar types to
+become a single result set. The resolution algorithm is applied separately to
+each output column of a UNION. CASE uses the identical algorithm to match
+up its result expressions.
-
UNION Evaluation
+
UNION and CASE Type Resolution
+
+
+If all inputs are of type unknown, resolve as type
+text (the preferred type for string category).
+Otherwise, ignore the unknown inputs while choosing the type.
+
+
+
+If the non-unknown inputs are not all of the same type category, raise an
+error.
+
-Check for identical types for all results.
+If one or more non-unknown inputs are of a preferred type in that category,
+resolve as that type.
-Coerce each result from the UNION clauses to match the type of the
-first SELECT clause or the target column.
+Otherwise, resolve as the type of the first non-unknown input.
+
+
+
+Coerce all inputs to the selected type.
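
The resolution steps above can be written as a short Python sketch. The `CATEGORY` and `PREFERRED` tables are small assumed stand-ins for the catalog data (float8 and text are taken to be the preferred numeric and string types), so treat this as an illustration rather than the parser's code.

```python
# Hypothetical category table; the real catalog is far larger.
CATEGORY = {"int4": "numeric", "float4": "numeric", "float8": "numeric",
            "text": "string", "varchar": "string"}
# Preferred type per category, as assumed for this sketch.
PREFERRED = {"numeric": "float8", "string": "text"}


def resolve_common_type(input_types):
    """Pick the single output type for one UNION column or one CASE."""
    known = [t for t in input_types if t != "unknown"]
    # All inputs unknown: resolve as text.
    if not known:
        return "text"
    # Non-unknown inputs must share one type category.
    categories = {CATEGORY[t] for t in known}
    if len(categories) > 1:
        raise TypeError("inputs span several type categories")
    category = categories.pop()
    # A preferred type among the inputs wins.
    if PREFERRED[category] in known:
        return PREFERRED[category]
    # Otherwise fall back on the first non-unknown input.
    return known[0]
```

Under these assumptions, `resolve_common_type(["text", "unknown"])` returns `"text"`, and `resolve_common_type(["int4", "float4"])` returns `"int4"` because float4 is not a preferred type and the first input wins.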
b
(2 rows)
+Here, the unknown-type literal 'b' will be resolved as type text.
Transposed UNION
-The types of the union are forced to match the types of
+Here the output type of the union is forced to match the type of
the first/top clause in the union:
tgl=> SELECT 1 AS "All integers"
-tgl-> UNION SELECT '2.2'::float4
-tgl-> UNION SELECT 3.3;
+tgl-> UNION SELECT '2.2'::float4;
All integers
--------------
1
2
- 3
-(3 rows)
+(2 rows)
-An alternate parser strategy could be to choose the "best" type of the bunch, but
-this is more difficult because of the nice recursion technique used in the
-parser. However, the "best" type is used when selecting into
-a table:
-
-tgl=> CREATE TABLE ff (f float);
-CREATE
-tgl=> INSERT INTO ff
-tgl-> SELECT 1
-tgl-> UNION SELECT '2.2'::float4
-tgl-> UNION SELECT 3.3;
-INSERT 0 3
-tgl=> SELECT f AS "Floating point" from ff;
- Floating point
-------------------
- 1
- 2.20000004768372
- 3.3
-(3 rows)
-
+Since float4 is not a preferred type, the parser sees no reason to select it
+over int4, and instead falls back on the use-the-first-alternative rule.
+This example demonstrates that the preferred-type mechanism doesn't encode
+as much information as we'd like. Future versions of
+
Postgres may support a more general notion of
+type preferences.