-
+
Localization
cultural preferences regarding alphabets, sorting, number
formatting, etc.
PostgreSQL uses the standard ISO
C and POSIX-like locale facilities provided by the server operating
- system. For additional information refer the documentation of your
+ system. For additional information refer to the documentation of your
system.
Overview
- Locale support is not build into
PostgreSQL by
+ Locale support is not built into
PostgreSQL by
default; to enable it, supply the
to the configure script:
Occasionally it is useful to mix rules from several locales, e.g.,
- use U.S. rules but Spanish messages. To do that a set of
+ use U.S. collation rules but Spanish messages. To do that, a set of
environment variables exists that override the default of
LANG for a particular category:
- Once you have chosen a set of localization rules this way you must
- keep them fixed for any particular database cluster. That means
- that the locales that were active when you ran initdb
- must be kept the same when you start the postmaster. Otherwise,
- the changed sort order can corrupt indexes or make your data
- disappear mysteriously. It is currently not possible to change the
- locales after database initialization or to use more than one set
- of locales for a given database cluster.
+ Note that the locale behavior is determined by the environment
+ variables seen by the server, not by the environment of any client.
+ Therefore, be careful to set these variables before starting the
+ postmaster.
+
+
+ The LC_COLLATE and LC_CTYPE variables affect the
+ sort order of indexes. Therefore, these values must be kept fixed
+ for any particular database cluster, or indexes on text columns will
+ become corrupt. Postgres enforces this
+ by recording the values of LC_COLLATE and LC_CTYPE
+ that are seen by initdb. The server automatically adopts
+ those two values when it is started; only the other LC_
+ categories can be set from the environment at server startup.
+ In short, only one collation order can be used in a database cluster,
+ and it is chosen at initdb time.
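As a rough sketch of the point above, the locale variables could be set in the shell that launches the server, since it is the server's environment that matters. The locale names and the data directory path below are assumptions for illustration only; the postmaster invocation is left commented out.

```shell
# Assumed locale names; run `locale -a` to see what your system provides.
LANG=en_US
LC_MESSAGES=es_ES
export LANG LC_MESSAGES

# Show what a process started from this shell would inherit.
echo "LANG=$LANG LC_MESSAGES=$LC_MESSAGES"

# Start the server from this same shell so it inherits the settings
# (hypothetical data directory):
# postmaster -D /usr/local/pgsql/data
```

Setting LC_COLLATE or LC_CTYPE here would have no effect on the server, because those two values are taken from what initdb recorded.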
The only severe drawback of using the locale support in
PostgreSQL is its speed. So use locale only if you
- actually need it.
+ actually need it. It should be noted in particular that selecting
+ a non-C locale disables index optimizations for LIKE and
+ ~ operators, which can make a huge difference in the
+ speed of searches that use those operators.
MB also fixes some problems concerning 8-bit single byte
- character sets including ISO8859. (I would not say all of problems
+ character sets including ISO8859. (I would not say all problems
have been fixed. I just confirmed that the regression test ran fine
and a few French characters could be used with the patch. Please let
me know if you find any problem while using 8-bit characters.)
Enabling MB
- Run configure with a multibyte option:
+ Run configure with the multibyte option:
% ./configure --enable-multibyte[=encoding_system]
% initdb -E EUC_JP
- sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
+ sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
Note that you can use "--encoding" instead of "-E" if you prefer
to type longer option strings.
If no -E or --encoding option is given, the encoding
- specified at the compile time is used.
+ specified at configure time is used.
% createdb -E EUC_KR korean
- will create a database named "korean" with EUC_KR encoding. The
- another way to accomplish this is to use a SQL command:
+ will create a database named "korean" with EUC_KR encoding.
+ Another way to accomplish this is to use a SQL command:
CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
-
- Using PGCLIENTENCODING.
-
- If an environment variable PGCLIENTENCODING is defined in the
- frontend, an automatic encoding translation is done by the backend.
-
-
-
Using SET CLIENT_ENCODING TO.
- Setting the frontend side encoding can be done a SQL command:
+ The frontend-side encoding can be set with this SQL command:
SET CLIENT_ENCODING TO 'encoding';
SET NAMES 'encoding';
- To query the current the frontend encoding:
+ To query the current frontend encoding:
SHOW CLIENT_ENCODING;
+
+
+ Using PGCLIENTENCODING.
+
+ If the environment variable PGCLIENTENCODING is defined
+ in the client's environment, that client encoding is automatically
+ selected when a backend connection is made. (This can subsequently
+ be overridden using any of the other methods mentioned above.)
+
+
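A minimal sketch of the environment-variable approach, assuming a Bourne-style shell; the encoding name LATIN1 and the database name korean are taken from the examples in this section and are illustrative only. The client invocation is left commented out.

```shell
# Encoding and database names are illustrative, not required values.
PGCLIENTENCODING=LATIN1
export PGCLIENTENCODING

# Show what a client started from this shell would inherit.
echo "PGCLIENTENCODING=$PGCLIENTENCODING"

# A client started from this shell would then select that encoding
# when it connects:
# psql korean
```

The setting applies only to clients started from this environment, and it can still be changed within a session with SET CLIENT_ENCODING.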
Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,
then some Japanese characters could not be translated into LATIN1. In
- this case, a letter cannot be represented in the LATIN1 character set,
+ this case, a letter that cannot be represented in the LATIN1 character set
would be transformed as:
References
- These are good sources to start learning various kind of encoding
+ These are good sources to start learning about various kinds of encoding
systems.