A collation object provided by libc maps to a
combination of LC_COLLATE and LC_CTYPE
- settings. (As
+ settings, as accepted by the setlocale() system library call . (As
the name would suggest, the main purpose of a collation is to set
LC_COLLATE , which controls the sort order. But
it is rarely necessary in practice to have an
ICU collations
- Collations provided by ICU are created with names in BCP 47 language tag
+ With ICU, it is not sensible to enumerate all possible locale names. ICU
+ uses a particular naming system for locales, but there are many more ways
+ to name a locale than there are actually distinct locales.
+ initdb uses the ICU APIs to extract a set of distinct
+ locales to populate the initial set of collations. Collations provided by
+ ICU are created in the SQL environment with names in BCP 47 language tag
format, with a private use
extension -x-icu appended, to distinguish them from
- libc locales. So de-x-icu would be an example name.
+ libc locales.
- With ICU, it is not sensible to enumerate all possible locale names. ICU
- uses a particular naming system for locales, but there are many more ways
- to name a locale than there are actually distinct locales. (In fact, any
- string will be accepted as a locale name.)
- information on ICU locale naming. initdb uses the ICU
- APIs to extract a set of distinct locales to populate the initial set of
- collations. Here are some example collations that might be created:
+ Here are some example collations that might be created:
will draw an error along the lines of collation "de-x-icu" for
encoding "WIN874" does not exist>.
+
+
+
+
+
Creating New Collation Objects
+
+ If the standard and predefined collations are not sufficient, users can
+ create their own collation objects using the SQL
+ command .
+
+
+ The standard and predefined collations are in the
+ schema pg_catalog , like all predefined objects.
+ User-defined collations should be created in user schemas. This also
+ ensures that they are saved by pg_dump .
+
+
+
+
libc collations
+
+ New libc collations can be created like this:
+CREATE COLLATION german (provider = libc, locale = 'de_DE');
+
+ The exact values that are acceptable for the locale
+ clause in this command depend on the operating system. On Unix-like
+ systems, the command locale -a will show a list.
+
+
+ Since the predefined libc collations already include all collations
+ defined in the operating system when the database instance is
+ initialized, it is not often necessary to manually create new ones.
+ Reasons might be if a different naming system is desired (in which case
+ see also ) or if the operating system has
+ been upgraded to provide new locale definitions (in which case see
+ also pg_import_system_collations() ).
+
+
+
+
+
ICU collations
ICU allows collations to be customized beyond the basic language+country
set that is preloaded by initdb . Users are encouraged
to define their own collation objects that make use of these facilities to
- suit the sorting behavior to their requirements. Here are some examples:
+ suit the sorting behavior to their requirements.
+ information on ICU locale naming. The set of acceptable names and
+ attributes depends on the particular ICU version.
+
+
+ Here are some examples:
- CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk')
+ CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');
+ CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');
German collation with phone book collation type
+ The first example selects the ICU locale using a language
+ tag per BCP 47. The second example uses the traditional
+ ICU-specific locale syntax. The first style is preferred going
+ forward, but it is not supported by older ICU versions.
+
+ Note that you can name the collation objects in the SQL environment
+ anything you want. In this example, we follow the naming style that
+ the predefined collations use, which in turn also follow BCP 47, but
+ that is not required for user-defined collations.
+
- CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji')
+ CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');
+ CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');
Root collation with Emoji collation type, per Unicode Technical Standard #51
+ Observe how in the traditional ICU locale naming system, the root
+ locale is selected by an empty string.
+
- CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')
+ CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
+ CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');
Sort digits after Latin letters. (The default is digits before letters.)
- CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper')
+ CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');
+ CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');
Sort upper-case letters before lower-case letters. (The default is
- CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')
+ CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit');
+ CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=latn-digit');
Combines both of the above options.
- CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true')
+ CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
+ CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
Numeric ordering, sorts sequences of digits by their numeric value,
repository.
Explorer can be used to check the details of a particular locale
- definition.
+ definition. The examples using the k* subtags require
+ at least ICU version 54.
strings that compare equal according to the collation but are not
byte-wise equal will be sorted according to their byte values.
+
+
+ By design, ICU will accept almost any string as a locale name and match
+ it to the closet locale it can provide, using the fallback procedure
+ described in its documentation. Thus, there will be no direct feedback
+ if a collation specification is composed using features that the given
+ ICU installation does not actually support. It is therefore recommended
+ to create application-level test cases to check that the collation
+ definitions satisfy one's requirements.
+
+
-
- 3>
+ 4 id="collation-copy">
Copying Collations
CREATE COLLATION french FROM "fr-x-icu";
-
- The standard and predefined collations are in the
- schema pg_catalog , like all predefined objects.
- User-defined collations should be created in user schemas. This also
- ensures that they are saved by pg_dump .
-
+