+++ /dev/null
-
- PostgreSQL Charsets README
- Josef Balatka,
- Draft v0.1, Tue Jul 20 15:49:07 CEST 1999
-
- This document is a brief overview of the national charsets support
- that PostgreSQL ver. 6.5 has implemented. Various compilation options
- and setup tips are mentioned here to be helpful in the particular use.
-
- ---------------------------------------------------------------------------
-
- Table of Contents
-
- 1. Locale awareness
-
- 2. Single-byte charsets recoding
-
- 3. Multi-byte support/recoding
-
- 4. Credits
-
- ---------------------------------------------------------------------------
-
- 1. Locale awareness
-
- PostgreSQL server supports both locale aware and locale not aware
- (default) operational modes. You can determine this mode during the
- configuration stage of the installation with --enable-locale option.
-
- If you don't use --enable-locale, the multi-language code will not be
- compiled and PostgreSQL will behave as an ASCII compliant application.
- This mode is useful for its speed but only provided that you don't
- have to consider national specific chars.
-
- With --enable-locale you will get a locale aware server using LC_*
- environment variables to determine how to process national specifics.
- In this case strcoll(3) and similar functions are used internally
- so speed is somewhat lower.
-
- Notice here that --enable-locale is sufficient when all your clients
- use the same single-byte encoding as the database server does.
-
- When your clients use an encoding different from the server's, you
- must additionally use the --enable-recode or --with-mb= option on
- the server side, or a client that does the recoding itself (e.g.
- there exists a PostgreSQL ODBC driver for Win32 with various Cyrillic
- encoding capability). Option --with-mb= is necessary for the
- multi-byte charsets support.
-
-
- 2. Single-byte charsets recoding
-
- You can set up this feature with --enable-recode option. This option
- is described as 'enable Cyrillic recode support' which doesn't express
- all its power. It can be used for *any* single-byte charset recoding.
-
- This method uses charset.conf file located in the $PGDATA directory.
- It's a typical configuration text file where spaces and newlines
- separate items and records and # specifies comments. Three keywords
- with the following syntax are recognized here:
-
- BaseCharset
- RecodeTable
- HostCharset
-
- BaseCharset defines encoding of the database server. All charset
- names are only used for mapping inside the charset.conf so you can
- freely use typing-friendly names.
-
- RecodeTable records specify translation table between server and client.
- The file name is relative to the $PGDATA directory. Table file format
- is very simple. There are no keywords and characters are represented by
- a pair of decimal or hexadecimal (0x prefixed) values on single lines:
-
-
-
- HostCharset records define IP address and charset. You can use a single
- IP address, an IP mask range starting from the given address or an IP
- interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40)
-
- The charset.conf is always processed up to the end, so you can easily
- specify exceptions from the previous rules. In the src/data you will
- find charset.conf example and a few recoding tables.
-
- As this solution is based on the client's IP address / charset mapping
- there are obviously some restrictions as well. You can't use different
- encoding on the same host at the same time. It's also inconvenient when
- you boot your client hosts into several operating systems.
- Nevertheless, when these restrictions are not limiting and you don't
- need multi-byte chars, then it's a simple and effective solution.
-
-
- 3. Multi-byte support/recoding
-
- It's a new generation of charset encoding in PostgreSQL designed as a
- more complex solution supporting both single-byte and multi-byte chars.
- You can set up this feature with --with-mb= option.
-
- There is no IP mapping file and recoding is controlled through the new
- SQL statements. Recoding tables are included in the code. Many national
- charsets are already supported and further will follow.
-
- See doc/README.mb, doc/README.mb.jp to get detailed instruction on how
- to use the multibyte support. In the file doc/README.locale there is
- a particular instruction on usage of the multibyte support with Cyrillic.
-
-
- 4. Credits
-
- I'd like to thank the PostgreSQL development team and all contributors
- for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
- Tatsuo Ishii for opening the door into the multi-language world.
-
+++ /dev/null
-===========
-1999 Jul 21
-===========
-
- Josef Balatka asked us not to remove RECODE and sent me a
-Czech ISO-8859-2 -> WIN-1250 translation table.
- RECODE no longer contains just Cyrillic RECODE and will stay in
-PostgreSQL.
-
- He also created some bits of documentation, mostly concerning RECODE -
-see README.Charsets.
-
-
-===========
-1999 Apr 14
-===========
-
- Tatsuo Ishii updated Multibyte support extending it
-to Cyrillic language. Now PostgreSQL supports KOI8-R, WIN-1251, ISO8859-5
-and CP866 (ALT) encodings.
-
- Short instruction on using this feature follows. Longer discussion of
-Multibyte support is in README.mb.
-
- WARNING! Now with Multibyte support, Cyrillic RECODE is declared obsolete
-and will be removed from Postgres. If you are using RECODE, consider
-switching to Multibyte support.
-
- Instructions on how to prepare Postgres for Cyrillic Multibyte support.
- ----------------------------------------------------------------------
-
- First, you need to back up all your databases. I recommend backing up the
-entire Postgres directory, including binaries and libraries, so that you can
-easily restore if something goes wrong.
-
- Dump your data: pg_dumpall > dump.db
-
- Stop postmaster.
-
- Configure, compile and install Postgres. (I'll mostly talk about KOI8-R
-encoding, this is just to make examples a little more clear; you can use
-any supported encoding.)
-
- cd src
- ./configure --enable-locale --with-mb=KOI8
- make
- make install
-
- Make sure you've backed up your databases. Doublecheck your backup. I
-really mean it - make regular backups and test your backups sometimes by
-doing a trial restore.
-
- Remove your data directory (better, rename or move it).
-
- Run initdb saying your primary encoding: initdb -e KOI8. If you omit
-encoding, primary encoding from configure will be taken.
-
- Start postmaster.
-
- Create databases: createdb -e KOI8. Again, you can omit encoding -
-default encoding will be used. You are not forced to use the same encoding
-for all your databases - you can create different databases with different
-encodings.
-
- Load your data from the dump you've created: psql < dump.db
-
- That's all! Now you are ready to enjoy the full power of Multibyte
-support.
-
- To use Multibyte support you do not need to do anything special - just
-execute your queries. If the client program does not set an encoding, it will get
-the data in the database encoding. But a client may ask Postgres to do automatic
-server-to-client and client-to-server conversions. There are two ways
-a client program can declare its encoding:
- 1) client explicitly executes the query SET CLIENT_ENCODING TO 'win';
- 2) client started with environment variable set. Examples -
-using sh syntax:
- PGCLIENTENCODING='win'; export PGCLIENTENCODING
-using csh syntax:
- setenv PGCLIENTENCODING 'win'
-
- Setting PGCLIENTENCODING even if you use the same client encoding as the
-database avoids the overhead of asking for the database encoding while
-initiating the connection, so it is a good idea to set it in any case.
-
- Now you may run test suite and see Multibyte support in action. Go to
-.../src/test/locale and run
- make clean all test-koi2win
-
-
-===========
-1998 Nov 20
-===========
-
- I extended locale support, originally written by Oleg Bartunov.
-Now ORDER BY (if PostgreSQL is configured with
---enable-locale) uses strcoll() for all text fields: char(n), varchar(n),
-text.
-
- I included a test suite in .../src/test/locale. I didn't include this in
-the regression test because not many people require locale support. Read
-.../src/test/locale/README for details on the test suite.
-
-
-Oleg.
- Describes the available language and character set support in
-
-
+
+ Localization
+
+ Describes the available localization features from the point of
+ view of the administrator.
+
+
- Postgres supports non-ASCII character
- sets with two approaches:
+ Postgres supports localization with
+ three approaches:
- Using locale features in underlying
- system libraries. This allows single-byte character sets to be
- configured with a locale-specific collation order, provided that
- the underlying system supports the required locale. This
- technique supports only one character set per server, and can
- not support multi-byte character sets.
+ Using the locale features of the operating system to provide
+ locale-specific collation order, number formatting, and other
+ aspects.
Using explicit multiple-byte character sets defined in the
- Postgres server. These character sets
- are also known to some client libraries. The number of character
- sets is fixed at the time the server is compiled, and internal
- operations such as string comparisons require expansion of each
- character into a 32-bit word.
+ Postgres server to support languages
+ that require more characters than will fit into a single byte,
+ and to provide character set recoding between client and server.
+ The number of supported character sets is fixed at the time the
+ server is compiled, and internal operations such as string
+ comparisons require expansion of each character into a 32-bit
+ word.
+
+
+
+
+ Single byte character recoding provides a more light-weight
+ solution for users of multiple, yet single-byte character sets.
+
+
+
Locale Support
+
+ Locale support refers to an application respecting
+ cultural preferences regarding alphabets, sorting, number
+ formatting, etc. PostgreSQL uses the standard ISO
+ C and POSIX-like locale facilities provided by the server operating
+ system. For additional information refer to the documentation of your
+ system.
+
+
+
+ Overview
+
+ Locale support is not built into PostgreSQL by
+ default; to enable it, supply the --enable-locale option
+ to the configure script:
+
+
+$ ./configure --enable-locale
+
+
+ Locale support only affects the server; all clients are compatible
+ with servers with or without locale support.
+
+
+ The information about which particular cultural rules to use is
+ determined by standard environment variables. If you are getting
+ localized behavior from other programs you probably have them set
+ up already. The simplest way to set the localization information
+ is the LANG variable, for example:
+export LANG=sv_SE
+
+ This sets the locale to Swedish (sv) as spoken in
+ Sweden (SE). Other possibilities might be
+ en_US (U.S. English) and fr_CA (Canada,
+ French). If more than one character set can be useful for a locale
+ then the specifications look like this:
+ cs_CZ.ISO8859-2. What locales are available under what
+ names on your system depends on what was provided by the operating
+ system vendor and what was installed.
+
+
+ Occasionally it is useful to mix rules from several locales, e.g.,
+ use U.S. rules but Spanish messages. To do that, a set of
+ environment variables exists that override the default of
+ LANG for a particular category:
+
+
+
+
+ LC_COLLATE   String sort order
+ LC_CTYPE     Character classification (What is a letter? What is the upper-case equivalent of this letter?)
+ LC_MESSAGES  Language of messages
+ LC_MONETARY  Formatting of currency amounts
+ LC_NUMERIC   Formatting of numbers
+ LC_TIME      Formatting of dates and times
+
+
+
+
+
+ LC_MESSAGES only affects the messages that come from the
+ operating system, not PostgreSQL.
+
+
+ If you want the system to behave as if it had no locale support,
+ use the special locale C or POSIX, or
+ simply unset all locale-related variables.
+
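To get a feel for what the C locale means for sort order, the following Python sketch sorts a few strings through the C library's collation functions using the standard `locale` module. It is only an illustration of the operating system facility described here, not PostgreSQL code; the helper name `sort_with_locale` and the sample words are made up for this example.

```python
import locale


def sort_with_locale(words, locale_name="C"):
    """Sort words using the collation rules of the given locale.

    Raises locale.Error if the locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm transforms each string so that plain comparison of the
        # results follows the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore old setting


words = ["banana", "Apple", "apple", "Banana"]
c_sorted = sort_with_locale(words, "C")
print(c_sorted)
```

In the C locale the comparison is by character code, so all upper-case words sort before all lower-case ones. Passing another installed locale name (for example one of those mentioned above) would typically interleave them instead.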
+
+ Once you have chosen a set of localization rules this way you must
+ keep them fixed for any particular database cluster. That means
+ that the locales that were active when you ran initdb
+ must be kept the same when you start the postmaster. Otherwise,
+ the changed sort order can corrupt indexes or make your data
+ disappear mysteriously. It is currently not possible to change the
+ locales after database initialization or to use more than one set
+ of locales for a given database cluster.
+
+
+
+
+ Benefits
+
+ Locale support influences in particular the following features:
+
+
+
+ Sort order in ORDER BY queries.
+
+
+
+
+ The to_char family of functions
+
+
+
+
+ The LIKE and ~ operators for pattern
+ matching
+
+
+
+
+
+ The only severe drawback of using the locale support in
+ PostgreSQL is its lower speed. So use locale only if you
+ actually need it.
+
+
+
+
+ Problems
+
+ If locale support doesn't work in spite of the explanation above,
+ check that the locale support in your operating system is okay.
+ To check whether a given locale is installed and functional you
+ can use Perl, for example. Perl also has support
+ for locales and if a locale is broken perl -v will
+ complain with something like this:
+
+$ export LC_CTYPE='not_exist'
+
+perl: warning: Setting locale failed.
+perl: warning: Please check that your locale settings:
+LC_ALL = (unset),
+LC_CTYPE = "not_exist",
+LANG = (unset)
+are supported and installed on your system.
+perl: warning: Falling back to the standard locale ("C").
+
+
+
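The same kind of check can be sketched in Python, whose standard `locale` module raises `locale.Error` when the C library rejects a locale name. This is an alternative illustration, not something the PostgreSQL distribution provides; the function name `locale_is_usable` is hypothetical.

```python
import locale


def locale_is_usable(name, category=locale.LC_CTYPE):
    """Return True if the operating system accepts the given locale name."""
    previous = locale.setlocale(category)  # remember current setting
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        # The C library does not know this locale (not installed or broken).
        return False
    finally:
        locale.setlocale(category, previous)  # restore old setting


for candidate in ("C", "not_exist"):
    print(candidate, locale_is_usable(candidate))
```

The C locale is always available, so it should be reported as usable, while a made-up name such as `not_exist` should be rejected.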
+
+ Check that your locale files are in the right location. Possible
+ locations include: /usr/lib/locale (Linux,
+ Solaris), /usr/share/locale (Linux),
+ /usr/lib/nls/loc (DUX 4.0). Check the locale
+ man page of your system if you are not sure.
+
+
+ The directory src/test/locale contains a test suite
+ for PostgreSQL's locale support.
+
+
+
+
+
-Multi-byte Support
+Multibyte Support
Author
- Multi-byte (MB) support is intended to allow
+ Multibyte (MB) support is intended to allow
multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
Mule internal code. With MB enabled you can use multi-byte
-
+
+
+
+ Single-byte character set recoding
+
+
+ You can set up this feature with the --enable-recode option
+ to configure. This option was formerly described as
+ Cyrillic recode support, which doesn't express all its
+ power. It can be used for any single-byte character
+ set recoding.
+
+
+ This method uses a charset.conf file located in
+ the database directory (PGDATA). It's a typical
+ configuration text file where spaces and newlines separate items
+ and records and # specifies comments. Three keywords with the
+ following syntax are recognized here:
+
+BaseCharset server_charset
+RecodeTable from_charset to_charset file_name
+HostCharset host_spec host_charset
+
+
+
+ BaseCharset defines the encoding of the database server.
+ All character set names are only used for mapping inside of
+ charset.conf so you can freely use typing-friendly
+ names.
+
+
+ RecodeTable records specify translation tables between
+ server and client. The file name is relative to the
+ PGDATA directory. The table file format is very
+ simple. There are no keywords and characters are represented by a
+ pair of decimal or hexadecimal (0x prefixed) values on single
+ lines:
+
+char_value translated_char_value
+
+
+
+ HostCharset records define the client character set by IP
+ address. You can use a single IP address, an IP mask range starting
+ from the given address or an IP interval (e.g., 127.0.0.1,
+ 192.168.1.100/24, 192.168.1.20-192.168.1.40).
+
+
+ The charset.conf file is always processed up to the
+ end, so you can easily specify exceptions from the previous
+ rules. In src/data you will find an example charset.conf and a few
+ recoding tables.
+
+
+ As this solution is based on the client's IP address and character
+ set mapping there are obviously some restrictions as well. You
+ cannot use different encodings on the same host at the same
+ time. It is also inconvenient when you boot your client hosts into
+ several operating systems. Nevertheless, when these restrictions are
+ not limiting and you do not need multi-byte characters, then it is a
+ simple and effective solution.
+
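The recode table format described above is simple enough to illustrate with a short Python sketch. It reads value pairs into a 256-entry translation table and applies it to a byte string. This is not the server's implementation; the function name `parse_recode_table` and the sample entries are assumptions for illustration, and allowing # comments inside a table file is borrowed from the charset.conf description rather than confirmed for table files.

```python
def parse_value(field):
    """Parse a decimal or 0x-prefixed hexadecimal character code."""
    if field.lower().startswith("0x"):
        return int(field, 16)
    return int(field, 10)


def parse_recode_table(text):
    """Build a 256-entry byte translation table from recode-table text.

    Each non-empty line holds a pair of values: a character code and the
    code it is translated to. Codes not listed map to themselves.
    """
    table = bytearray(range(256))  # identity mapping by default
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # assumed comment syntax
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected two values, got: {line!r}")
        source, target = parse_value(fields[0]), parse_value(fields[1])
        if not (0 <= source <= 255 and 0 <= target <= 255):
            raise ValueError(f"value out of single-byte range: {line!r}")
        table[source] = target
    return bytes(table)


# A few ISO-8859-2 -> WIN-1250 entries, mixing hex and decimal notation.
SAMPLE_TABLE = """
0xA9 0x8A   # S with caron
0xB9 0x9A   # s with caron
174  142    # Z with caron (0xAE -> 0x8E)
190  158    # z with caron (0xBE -> 0x9E)
"""

table = parse_recode_table(SAMPLE_TABLE)
server_bytes = "Šiška žába".encode("iso8859_2")
client_bytes = server_bytes.translate(table)
print(client_bytes == "Šiška žába".encode("cp1250"))
```

A one-directional table like this only covers server-to-client traffic; the opposite direction would need the inverse mapping, which is why RecodeTable records name both a source and a target character set.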
+
+
+
+
PostgreSQL Installation Instructions
--enable-recode
- Enables character set recode support. See
- doc/README.Charsets for details on this feature.
+ Enables single-byte character set recode support. See
+ the Administrator's Guide about this feature.
Allows the use of multibyte character encodings. This is
primarily for languages like Japanese, Korean, and Chinese.
- Read doc/README.mb for details.
+ Read the Administrator's Guide
+ for details.