+
+
unaccent
- unaccent removes accents (diacritic signs) from a lexeme.
- It's a filtering dictionary, that means its output is
- always passed to the next dictionary (if any), contrary to the standard
- behavior. Currently, it supports most important accents from European
- languages.
+ unaccent is a text search dictionary that removes accents
+ (diacritic signs) from lexemes.
+ It's a filtering dictionary, which means its output is
+ always passed to the next dictionary (if any), unlike the normal
+ behavior of dictionaries. This allows accent-insensitive processing
+ for full text search.
- Limitation: Current implementation of unaccent
- dictionary cannot be used as a normalizing dictionary for
- thesaurus dictionary.
+ The current implementation of unaccent cannot be used as a
+ normalizing dictionary for the thesaurus dictionary.
-
+
Configuration
- A unaccent dictionary accepts the following options:
+ An unaccent dictionary accepts the following options:
- Each line represents pair: character_with_accent character_without_accent
+ Each line represents a pair, consisting of a character with an accent
+ followed by a character without an accent. The first is translated into
+ the second. For example,
À A
Á A
-Â A
+Â A
Ã A
-Ä A
-Å A
-Æ A
+Ä A
+Å A
+Æ A
- Look at unaccent.rules, which is installed in
- $SHAREDIR/tsearch_data/, for an example.
+ A more complete example, which is directly useful for most European
+ languages, can be found in unaccent.rules, which is installed
+ in $SHAREDIR/tsearch_data/ when the unaccent
+ module is installed.
Usage
- Running the installation script creates a text search template
- unaccent and a dictionary unaccent
+ Running the installation script unaccent.sql creates a text
+ search template unaccent and a dictionary unaccent
based on it, with default parameters. You can alter the
parameters, for example
-=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
+mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
or create new dictionaries based on the template.
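As a minimal sketch of the second option, a separate dictionary could be created from the template and tested in the same way as the default one. The dictionary name my_unaccent is hypothetical, and the example assumes that a rules file named my_rules.rules has already been placed in $SHAREDIR/tsearch_data/.

```sql
-- Sketch only: my_unaccent is a hypothetical dictionary name.
-- Assumes a rules file my_rules.rules exists in $SHAREDIR/tsearch_data/.
CREATE TEXT SEARCH DICTIONARY my_unaccent (
    TEMPLATE = unaccent,
    RULES = 'my_rules'
);

-- Check what the new dictionary does with an accented word.
SELECT ts_lexize('my_unaccent', 'Hôtel');
```

Creating a separate dictionary leaves the default unaccent dictionary untouched, which may be preferable when only some text search configurations need the custom rules.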
- To test the dictionary, you can try
-
+ To test the dictionary, you can try:
-=# select ts_lexize('unaccent','Hôtel');
- ts_lexize
+mydb=# select ts_lexize('unaccent','Hôtel');
+ ts_lexize
-----------
{Hotel}
(1 row)
-
+
- Filtering dictionary are useful for correct work of
- ts_headline function.
+ Here is an example showing how to insert the
+ unaccent dictionary into a text search configuration:
-=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
-=# ALTER TEXT SEARCH CONFIGURATION fr
+mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
+mydb=# ALTER TEXT SEARCH CONFIGURATION fr
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, french_stem;
-=# select to_tsvector('fr','Hôtels de la Mer');
- to_tsvector
+mydb=# select to_tsvector('fr','Hôtels de la Mer');
+ to_tsvector
-------------------
'hotel':1 'mer':4
(1 row)
-=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
- ?column?
+mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
+ ?column?
----------
t
(1 row)
-=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
- ts_headline
+
+mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
+ ts_headline
------------------------
- <b>Hôtel</b>de la Mer
+ <b>Hôtel</b> de la Mer
(1 row)
-
-
Function
+
Functions
- unaccent function removes accents (diacritic signs) from
- argument string. Basically, it's a wrapper around
- unaccent dictionary.
+ The unaccent() function removes accents (diacritic signs) from
+ a given string. Basically, it's a wrapper around the
+ unaccent dictionary, but it can be used outside normal
+ text search contexts.
-unaccent(dictionary, string)
-returns text
+unaccent([dictionary, ] string) returns text
+ For example:
-SELECT unaccent('unaccent', 'Hôtel');
-SELECT unaccent('Hôtel');
+SELECT unaccent('unaccent', 'Hôtel');
+SELECT unaccent('Hôtel');
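One possible use outside text search is accent-insensitive matching on ordinary text columns. The sketch below assumes the module is installed with its default rules. The table places and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE places (name text);
INSERT INTO places VALUES ('Hôtel de la Mer'), ('Hotel du Lac');

-- Accent-insensitive prefix match: both the column value and the
-- pattern are passed through unaccent() before being compared.
SELECT name
FROM places
WHERE unaccent(name) ILIKE unaccent('hôtel%');
```

With the default rules, both rows would be expected to match, because the accented and unaccented spellings reduce to the same string before the comparison.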