-
+
Full Text Search
Migration from Pre-8.3 Text Search
- This area needs lots of work. Here is a quick list of known issues:
+ Applications that used the contrib/tsearch2 add-on module
+ for text searching will need some adjustments to work with the
+ built-in features:
+
- The old contrib/tsearch2 objects must be removed from
- the pg_dump output from a pre-8.3 database. While many of them won't
- load for lack of a tsearch2.so library, some do and cause problems.
- We have a working perl script for doing this with a custom- or tar-format
- backup, but there is a proposal to incorporate the functionality directly
- into pg_restore. Neither approach will help for pg_dumpall output.
+ Some functions have been renamed or had small adjustments in their
+ argument lists, and all of them are now in the pg_catalog
+ schema, whereas in a previous installation they would have been in
+ public or another non-system schema. There is a new
+ version of contrib/tsearch2 (see )
+ that provides a compatibility layer to solve most problems in this
+ area.
- The old dump may include schema-qualified references to the old
- contrib/tsearch2 objects; for example public.tsvector
- columns in table definitions. These will fail since the objects
- are now in the pg_catalog schema. Given current pg_dump behavior
- this will happen only for tables that are in a different schema
- from the tsearch2 objects, which makes it more likely to bite
- people who carefully put their tsearch2 objects in a
- non-public schema.
-
-
- Question: will restore-time failures of this type happen for
- any objects other than the tsvector and tsquery datatypes?
-
-
- The basic alternatives for fixing this seem to involve creating
- a dummy linkage, such as a public.tsvector domain linking to the
- base pg_catalog.tsvector type (which only helps for the datatypes);
- or stripping the schema references out of the dump. We could
- just recommend that users do this manually, or try to provide
- some tools to help.
-
-
-
-
- We have renamed the built-in tsvector update triggers, and changed
- their arguments too. This will result in CREATE TRIGGER commands
- failing during load, which can be ignored, but users will need to
- re-issue them with suitable argument adjustment. We probably
- can't automate that for them. Also, the old tsearch2 trigger
- function offered an option to invoke functions, which was removed
- as being a security hole. Users who were relying on that will need to
- write custom trigger functions as a substitute. I think all we
- can do here is document what to do to fix it.
+ The old contrib/tsearch2 functions and other objects
+ must be suppressed when loading pg_dump
+ output from a pre-8.3 database. While many of them won't load anyway,
+ a few will and then cause problems. One simple way to deal with this
+ is to load the new contrib/tsearch2 module before restoring
+ the dump; then it will block the old objects from being loaded.
- We have renamed a number of other functions besides the triggers,
- compared to the tsearch2 versions. This seems unlikely to cause
- any problems during dump/reload but it will require adjustments in
- the bodies of stored procedures and in client application code.
- Again, not much to do except document it.
+ Text search configuration setup is completely different now.
+ Instead of manually inserting rows into configuration tables,
+ search is configured through the specialized SQL commands shown
+ earlier in this chapter. There is currently no automated
+ support for converting an existing custom configuration to 8.3;
+ such configurations must be recreated by hand with these commands.
- Configuration setup is completely different now. Can we provide
- any automated assistance for translating an old custom setup?
- It probably can't be 100% automatic in any case, so maybe documentation
- is the best we can do here too. Aside from the inside-the-database
- differences, outside-the-database configuration files now have
- prescribed location and extensions, which was not true before.
-
-
+ Most types of dictionaries rely on some outside-the-database
+ configuration files. These are largely compatible with pre-8.3
+ usage, but note the following differences:
-
- Relocation of configuration from add-on tables into core system catalogs
- will break client queries that looked at the add-on tables.
-
-
+
+
+ Configuration files now must be placed in a single specified
+ directory ($SHAREDIR/tsearch_data), and must have
+ a specific extension depending on the type of file, as noted
+ previously in the descriptions of the various dictionary types.
+ This restriction was added to forestall security problems.
+
+
-
- Thesaurus files now use ? for stop words.
-
-
+
+ Configuration files must be encoded in UTF-8,
+ regardless of what database encoding is used.
+
+
-
- What else?
+
+ In thesaurus configuration files, stop words must be marked with
+ ?.
+
+
+
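The two file-level changes above (UTF-8 encoding, and ? marking stop words in thesaurus files) are mechanical enough to script. Below is a minimal Python sketch of such a conversion. It is not part of PostgreSQL or of contrib/tsearch2, and the function names are hypothetical. It assumes that you know the encoding the old file was actually written in (passed as old_encoding), that you supply the stop-word list yourself (for example, from the stop-word file used by the thesaurus's subdictionary), and that each old thesaurus line already has the "sample words : indexed words" layout and only lacks the ? markers.

```python
# Hypothetical migration helpers; not shipped with PostgreSQL or contrib/tsearch2.
from __future__ import annotations

from pathlib import Path


def reencode_to_utf8(data: bytes, old_encoding: str) -> bytes:
    # Assumption: old_encoding is the encoding the pre-8.3 file was really
    # written in (e.g. the old database encoding). Decoding raises
    # UnicodeDecodeError on invalid bytes, but some encodings (such as
    # Latin-1) accept any byte sequence, so a wrong guess can go unnoticed.
    return data.decode(old_encoding).encode("utf-8")


def mark_thesaurus_stop_words(line: str, stop_words: set[str]) -> str:
    # Assumed line layout: "sample words : indexed words".
    # Stop words on the sample side are replaced by the "?" placeholder;
    # lines without a colon (e.g. blank lines) are returned unchanged.
    # stop_words is assumed to contain lowercase words.
    if ":" not in line:
        return line
    sample, indexed = line.split(":", 1)
    marked = ["?" if word.lower() in stop_words else word for word in sample.split()]
    return " ".join(marked) + " : " + indexed.strip()


def convert_thesaurus_file(src: Path, dst: Path, old_encoding: str,
                           stop_words: set[str]) -> None:
    # Re-encode to UTF-8 first, then rewrite each line; dst is written as UTF-8.
    text = reencode_to_utf8(src.read_bytes(), old_encoding).decode("utf-8")
    lines = [mark_thesaurus_stop_words(line, stop_words) for line in text.splitlines()]
    dst.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Example line, assuming "a" and "the" are stop words.
    print(mark_thesaurus_stop_words("a one the two : swsw", {"a", "the"}))
```

Run as a script, this prints `? one ? two : swsw`. The converted file would still need to be placed in $SHAREDIR/tsearch_data with the extension required for its file type; the sketch deliberately leaves that step, and any review of the result, to the administrator.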