Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig, www.postgresql-support.de
PostgreSQL Indexing
Dublin, 2013
Hans-Jürgen Schönig
Scope of this session:
- What a basic index does
- The PostgreSQL optimizer (cost model)
- Classical B-tree Indexes
- Partial / functional indexes
- Different types of indexes
- Full-Text-Search
- Fuzzy matching
- Writing your own indexing strategy
1. Basic indexing:
- Generating test data:
- for the purpose of this session we need a
table consisting of two columns:
test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) VALUES ('hans');
INSERT 0 1
test=# INSERT INTO t_test (name) VALUES ('paul');
INSERT 0 1
1. Basic indexing:
- A lot more test data ...
- Let us create some more test data
by repeating the process
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2
...
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2097152
1. Basic indexing:
- Reading some data:
- Let us see how PostgreSQL executes a simple query:
test=# SELECT count(*) FROM t_test;
count
---------
4194304
(1 row)
Time: 431.192 ms
test=# explain analyze SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(actual time=0.013..531.448 rows=4194304 loops=1)
Total runtime: 977.917 ms
(3 rows)
Time: 1045.065 ms
1. Basic indexing:
- Reading some data:
- Let us add a filter:
test=# SELECT count(*) FROM t_test WHERE id = 421234;
count
-------
1
(1 row)
Time: 476.965 ms
test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0)
(actual time=53.405..495.126 rows=1 loops=1)
Filter: (id = 421234)
Rows Removed by Filter: 4194303
Total runtime: 495.175 ms
(5 rows)
Time: 520.659 ms
1. Basic indexing:
- Sequentially reading data:
- In case you like reading the phone book sequentially
we are basically done.
- Sequentially reading the phone book is technically ok
=> but socially not accepted
- Defining an index is the desired solution
1. Basic indexing:
- Creating an index
test=# \h CREATE INDEX
Command: CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ]
ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ]
[ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
- By the end of this training all of these clauses
will have been covered
1. Basic indexing:
- A typical index:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
Time: 7357.663 ms
- This gives us a standard btree index
- PostgreSQL provides “High-Concurrency B-Trees”
(Lehman-Yao, 1981)
- Many sessions can modify the index at the same time
- Highly efficient B+ tree
1. Basic indexing:
- How a btree works:
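[Figure: B-tree layout. An 8k root node sits above sorted 8k index pages linked by forward chaining; index entries reference line pointers (linp), which lead to rows in the table.]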
1. Basic indexing:
- Indexing is beneficial
test=# explain analyze SELECT count(*)
FROM t_test
WHERE id = 421234;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate (cost=8.73..8.74 rows=1 width=0)
(actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0)
(actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (id = 421234)
Heap Fetches: 1
Total runtime: 0.057 ms
(5 rows)
Time: 0.395 ms
- A lot faster :).
1. Basic indexing:
- Still slow ...
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 787.407 ms
- This is still slow. Let us create an index ...
test=# CREATE INDEX idx_name ON t_test (name);
CREATE INDEX
1. Basic indexing:
- The benefit is exactly zero:
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 782.443 ms
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
- The index won't be used
- Too many identical values (“not selective”)
1. Basic indexing:
- The cost is far from zero:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
177 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_id'));
pg_size_pretty
----------------
90 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
90 MB
(1 row)
- Indexes need a fair amount of space
1. Basic indexing:
- Input values DO make a difference:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.74..7.75 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- PostgreSQL will decide depending on the input value
=> cost based optimization
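A rough sketch of why the row estimate depends on the input value, written in Python rather than taken from the PostgreSQL source. It assumes the estimate is the table's row count times the value's frequency in statistics shaped like most_common_vals / most_common_freqs. The frequencies are hypothetical, back-computed from rows=2099808 in the plan above, and the fallback of 1 row for an unlisted value is a simplification.

```python
TABLE_ROWS = 4194304

# Hypothetical statistics for the "name" column. The 'hans' frequency is
# back-computed from rows=2099808 in the plan; 'paul' gets the remainder.
most_common = {
    "hans": 2099808 / TABLE_ROWS,
    "paul": 1 - 2099808 / TABLE_ROWS,
}

def estimate_rows(value):
    """Estimated number of rows matching name = value."""
    freq = most_common.get(value)
    if freq is None:
        # Simplification for this sketch: an unlisted value is treated
        # as (almost) absent, so the estimate bottoms out at one row.
        return 1
    return max(1, round(TABLE_ROWS * freq))

for name in ("hans", "paul", "hans2"):
    print(name, estimate_rows(name))
```

With an estimate close to half the table, reading everything sequentially is the cheaper plan; with an estimate of one row, the index scan wins.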
1. Basic indexing:
- Partial indexes:
- In our example the index is only used for
rare or non-existing values
- What is the point of indexing the frequent values
when those index entries are never used?
=> a more selective strategy is needed
1. Basic indexing:
- Partial indexes:
test=# DROP INDEX idx_name;
DROP INDEX
test=# CREATE INDEX idx_name ON t_test (name)
WHERE name NOT IN ('hans', 'paul');
CREATE INDEX
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
8192 bytes
(1 row)
- A partial index reduces space consumption
- Benefit is still the same
1. Basic indexing:
- Equal benefit – lower cost:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.28..7.29 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- The plans are the same as before; the index scan is even
estimated slightly cheaper (7.28 vs. 7.74)
1. Basic indexing:
- What about functions?
test=# CREATE INDEX idx_cos ON t_test ( cos(id) );
CREATE INDEX
Time: 16867.228 ms
test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17;
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=23960.99..23961.00 rows=1 width=0)
-> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0)
Recheck Cond: (cos((id)::double precision) = 17::double precision)
-> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0)
Index Cond: (cos((id)::double precision) = 17::double precision)
(5 rows)
- PostgreSQL provides functional indexes
- VERY nice to avoid additional columns
- Gives a lot of extra flexibility
1. Basic indexing:
- Type of functions allowed
- Functions must be deterministic
=> “immutable”
=> Functions can be written in almost any language
=> The function runs for every indexed row, so its
speed matters a lot
2. The PostgreSQL cost model
- How does PostgreSQL decide on
index vs. no index?
- PostgreSQL uses statistics to estimate the number of
rows coming back
- Each operation is assigned a cost
=> costs are just numbers used to compare
different options inside the planner
- Cost parameters can be changed at runtime
or globally
=> be careful, it can backfire
2. The PostgreSQL cost model
- pg_stats is your friend:
test=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Modifiers
-------------------------------+-----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
2. The PostgreSQL cost model
- Updating statistics
- System statistics are updated by ANALYZE:
test=# \h ANALYZE
Command: ANALYZE
Description: collect statistics about a database
Syntax:
ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
- In most setups autovacuum is in charge
of updating pg_statistic
- In most cases statistics are not an issue
2. The PostgreSQL cost model
- How does PostgreSQL estimate costs?
- seq_page_cost = 1
- random_page_cost = 4
- cpu_tuple_cost = 0.01
- cpu_operator_cost = 0.0025
- cpu_index_tuple_cost = 0.005
2. The PostgreSQL cost model
- Let us do the math (1):
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(2 rows)
- total costs are at 75100.81
- costs are composed of I/O and CPU costs
2. The PostgreSQL cost model
- Let us do the math (2):
test=# SELECT pg_relation_size('t_test') / 8192;
?column?
----------
22672
(1 row)
- our table consists of 22672 blocks
- each block is 8kb in size
2. The PostgreSQL cost model
- Let us do the math (3):
The seq scan:
I/O cost: 22672 * seq_page_cost = 22672
CPU cost: 4194304 * cpu_tuple_cost = 41943.04
=> 64615.04 for the seq scan
The aggregate:
4194304 * cpu_operator_cost = 10485.76
Total costs => 64615.04 + 10485.76 = 75100.80, plus cpu_tuple_cost (0.01) = 75100.81
(we have to emit the one result tuple)
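The arithmetic on this slide can be replayed in a few lines of Python. This is a sketch of the calculation only, not PostgreSQL's planner code; the function names are made up for illustration. The final 0.01 is charged as one cpu_tuple_cost for the single result row, which is what makes the total come out at 75100.81.

```python
# Planner constants as listed on the cost parameter slide (defaults).
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST, filter_ops=0):
    """One page fetch per block, plus per-row CPU work.
    filter_ops is the number of operator calls in the WHERE clause."""
    io_cost = pages * seq_page_cost
    cpu_cost = rows * (CPU_TUPLE_COST + filter_ops * CPU_OPERATOR_COST)
    return io_cost + cpu_cost

def count_star_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST):
    """count(*) over a seq scan: one operator call per input row,
    plus cpu_tuple_cost for the single result row."""
    scan = seq_scan_cost(pages, rows, seq_page_cost)
    return scan + rows * CPU_OPERATOR_COST + CPU_TUPLE_COST

PAGES, ROWS = 22672, 4194304
print(round(seq_scan_cost(PAGES, ROWS), 2))                      # 64615.04
print(round(count_star_cost(PAGES, ROWS), 2))                    # 75100.81
print(round(seq_scan_cost(PAGES, ROWS, filter_ops=1), 2))        # 75100.8
print(round(count_star_cost(PAGES, ROWS, seq_page_cost=10), 2))  # 279148.81
```

The third figure matches the filtered seq scan (id = 421234) from the basic indexing part, and the last one matches the plan obtained after SET seq_page_cost TO 10.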
2. The PostgreSQL cost model
- Inflation at work:
test=# SET seq_page_cost TO 10;
SET
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=279148.80..279148.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0)
(2 rows)
- Costs can be changed at runtime to fine tune
index usage
=> only do this if you are fully aware of what
you are doing. It can have unintended side
effects
2. The PostgreSQL cost model
- Spinning disks vs. SSDs
- Traditional disks are fast at sequential reads
and pretty bad at random
I/O
- SSDs largely remove this penalty.
=> consider lowering random_page_cost
2. The PostgreSQL cost model
- Abusing tablespaces:
test=# ALTER TABLESPACE pg_default
SET (random_page_cost = 1);
ALTER TABLESPACE
- Allows different cost settings for various
disk subsystems
- It also allows you to separate “cached” and “uncached”
data -> ugly but useful
2. The PostgreSQL cost model
- Correlation and disk layout
test=# CREATE TABLE t_random AS SELECT *
FROM t_test
ORDER BY random();
SELECT 4194304
test=# CREATE INDEX idx_random ON t_random(id);
CREATE INDEX
test=# ANALYZE t_random;
ANALYZE
- The PostgreSQL optimizer considers the
physical order of rows on disk
- High correlation makes index scans far
more likely, because the optimizer lowers its
estimates for I/O costs.
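To make "correlation" concrete, here is a small Python sketch that compares physical position with column value for an ordered and a shuffled list. It uses a plain Pearson correlation as a rough stand-in for pg_stats.correlation; the exact statistic PostgreSQL computes from its sample may differ, and the list size is arbitrary.

```python
import math
import random

def correlation(values):
    """Pearson correlation between physical position (0, 1, 2, ...) and value."""
    n = len(values)
    mean_pos = (n - 1) / 2
    mean_val = sum(values) / n
    cov = sum((pos - mean_pos) * (val - mean_val) for pos, val in enumerate(values))
    var_pos = sum((pos - mean_pos) ** 2 for pos in range(n))
    var_val = sum((val - mean_val) ** 2 for val in values)
    return cov / math.sqrt(var_pos * var_val)

ordered = list(range(1, 10001))      # like t_test: ids in insertion order
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)  # like t_random: ORDER BY random()

print(round(correlation(ordered), 3))   # close to 1: neighbours share blocks
print(round(correlation(shuffled), 3))  # close to 0: every row is a random read
```

When the value is close to 1 (or -1), rows with adjacent ids sit in the same or adjacent blocks, so an index range scan touches few blocks. Near 0, almost every matching row costs a separate random block access.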
2. The PostgreSQL cost model
- Correlation and disk layout
test=# explain SELECT count(*) FROM t_test WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=75.35..75.36 rows=1 width=0)
-> Index Only Scan using idx_id on t_test
(cost=0.00..72.72 rows=1049 width=0)
Index Cond: (id < 1000)
(3 rows)
test=# explain SELECT count(*) FROM t_random WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=950.31..950.32 rows=1 width=0)
-> Index Only Scan using idx_random on t_random
(cost=0.00..947.94 rows=947 width=0)
Index Cond: (id < 1000)
(3 rows)
2. The PostgreSQL cost model
- Implications:
- This is why different plans can pop up
EVEN if the data is the same
- There is no fixed amount of data making
PostgreSQL switch from index to
sequential scan
- High correlation can improve performance
=> consider clustering the table
test=# \h CLUSTER
Command: CLUSTER
Description: cluster a table according to an index
Syntax:
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]
3. Indexing many columns
- Using OR / AND:
- PostgreSQL can use more than one index per
table per query
- PostgreSQL provides multi-column indexes
- What you might see is a so-called “Bitmap Scan”
=> don't mix it up with Oracle Bitmap Indexes
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423;
QUERY PLAN
---------------------------------------------------------------------------
Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9)
Recheck Cond: ((id = 2343) OR (id = 423423))
-> BitmapOr (cost=9.44..9.44 rows=2 width=0)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 2343)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 423423)
(7 rows)
- PostgreSQL will scan the index twice
- PostgreSQL will look for blocks in the underlying table
- The condition has to be re-evaluated
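The three steps above can be sketched in Python as a toy model. It assumes a page-level bitmap (one bit per heap block), which is why the condition is rechecked for every row on a fetched page. ROWS_PER_PAGE, the heap layout, and the function names are hypothetical and only serve the illustration.

```python
ROWS_PER_PAGE = 4  # hypothetical; a real 8k page holds far more rows

# Toy heap (page number -> rows) and toy index on id (id -> page number).
heap = {}
index = {}
for row_id in range(1, 21):
    page = (row_id - 1) // ROWS_PER_PAGE
    heap.setdefault(page, []).append((row_id, "hans" if row_id % 2 else "paul"))
    index[row_id] = page

def bitmap_index_scan(wanted_id):
    """Return the set of heap pages that may contain wanted_id."""
    return {index[wanted_id]} if wanted_id in index else set()

def bitmap_heap_scan(pages, predicate):
    """Visit each candidate page once, in physical order, and recheck every row."""
    return [row for page in sorted(pages) for row in heap[page] if predicate(row)]

# id = 3 OR id = 18: two index scans, combined with a set union (BitmapOr).
pages = bitmap_index_scan(3) | bitmap_index_scan(18)
rows = bitmap_heap_scan(pages, lambda row: row[0] == 3 or row[0] == 18)
print(sorted(pages), rows)
```

Because the pages are visited in physical order, each block is read at most once even if several index entries point into it.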
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9)
Index Cond: (name = 'josef'::text)
Filter: (id = 2343)
(3 rows)
- PostgreSQL does not always use two indexes
when you have 2 quals
- The more selective index might be enough
3. Indexing many columns
- Multicolumn indexes:
test=# DROP INDEX idx_id;
DROP INDEX
test=# CREATE INDEX idx_combined ON t_test (id, name);
CREATE INDEX
test=# explain SELECT * FROM t_test WHERE id = 10;
QUERY PLAN
--------------------------------------------------------------------------------
Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9)
Index Cond: (id = 10)
(2 rows)
- PostgreSQL can use a subset of the indexed columns IF they are
the leading column(s) of the index
- Imagine a phone book; it is just like a combined index
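The phone book analogy can be sketched in Python with a sorted list of tuples standing in for the combined index on (id, name). The entries and function names are hypothetical; the point is only that a search on the leading column can use binary search, while a search on the second column alone cannot.

```python
from bisect import bisect_left

# Toy combined index on (id, name): entries sorted by id first, then name.
idx_combined = sorted([
    (10, "hans"), (10, "paul"), (11, "hans"), (12, "paul"), (12, "zoe"),
])

def lookup_by_id(wanted_id):
    """Leading column: all matches are adjacent, so two binary searches suffice."""
    lo = bisect_left(idx_combined, (wanted_id,))
    hi = bisect_left(idx_combined, (wanted_id + 1,))
    return idx_combined[lo:hi]

def lookup_by_name(wanted_name):
    """Second column only: matches are scattered, so every entry is inspected."""
    return [entry for entry in idx_combined if entry[1] == wanted_name]

print(lookup_by_id(10))
print(lookup_by_name("paul"))
```

Looking up a surname in a phone book works like lookup_by_id; looking up everybody with a given first name works like lookup_by_name.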
3. Indexing many columns
- Many indexes or combined indexes?
- It depends on what you want to query
- If you always use the first conditions in the index
a combined index might be a good idea
- Many indexes are more flexible but maybe not perfect
- Sometimes a mixed-strategy can be useful
4. Indexes to provide order
- B-trees can be used for more than searching
- B-trees keep their entries sorted.
- Sorted order helps to avoid repeated sorting.
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit (cost=0.00..0.31 rows=10 width=9)
-> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9)
(2 rows)
5. Dealing with upper / lowercase
- Upper and lower case searches are common:
- For case-insensitive searches, don't use
a functional index
- Consider using “citext”
test=# CREATE EXTENSION citext;
CREATE EXTENSION
test=# SELECT 'ABC'::citext = 'abc'::citext;
?column?
----------
t
(1 row)
6. Different types of indexes
- PostgreSQL supports more than just btrees
- B-Trees are fine if you are interested in things
which can be sorted
- Try to sort polygons => there is no useful sort order to search by
- Geometric data and Full-Text-Search need
different algorithms
NOTE: This is not about which index is faster.
This is about the correct ALGORITHM
6. Different types of indexes
- Index types provided by PostgreSQL
- B-Trees
- GiST: Generalized Search Tree
- GIN: Generalized Inverted Index
- SP-GiST: Space-Partitioned GiST
- Hash
6. Different types of indexes
- Indexes and algorithms
- B-Trees: numbers, text, dates, etc.
- GiST: Generalized Search Tree
- GIN: Generalized Inverted Index
- SP-GiST: Space-Partitioned GiST
- Hash
7. Gist indexes
- Gist operates on different principles
than btree
- it supports “contains”, “left of”, “overlaps”, etc.
- “contains”, etc. are good for
=> Full Text Search
=> Geometric operations (PostGIS, etc.)
=> Finding genome sequences
=> Handling ranges (time, etc.)
=> Fuzzy search
- Gist allows KNN-search
7. Gist indexes
- How it works internally ...
7. GIN indexes
- GIN is a so-called inverted index
- Used for Full Text Search
- If 1 million documents contain the word
“house”, do you really want to have “house” inside
the index 1 million times?
=> B-tree for words
=> A document list for each word
=> Classical approach to text search
- FTS is not about “=”, it is about “contains”
=> forget btree
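A minimal Python sketch of the inverted-index idea: each distinct word is stored once, together with the list of documents that contain it. The documents and names below are hypothetical, and a plain dict stands in for the tree of words; this shows the concept, not GIN's on-disk layout.

```python
from collections import defaultdict

docs = {  # hypothetical documents: id -> text
    1: "a house with a garden",
    2: "the house is red",
    3: "a red car",
}

# One entry per distinct word, each with a sorted list of document ids.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for word in sorted(set(docs[doc_id].split())):
        inverted[word].append(doc_id)

def contains_all(*words):
    """Documents containing every given word: intersect the document lists."""
    lists = [set(inverted.get(word, [])) for word in words]
    return sorted(set.intersection(*lists)) if lists else []

print(inverted["house"])
print(contains_all("house", "red"))
```

The word "house" appears once in the index no matter how many documents contain it; only its document list grows.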
7. GIN indexes
- GIN internal workings:
7. SP-Gist indexes
- SP-Gist is a space partitioned index
- Can be used for a variety of algorithms, which use
space partitioning
=> quad trees
=> suffix trees
=> k-d trees
7. SP-Gist indexes
- Quad trees: A prototype example ...
- We want to insert ... (6, 4) and (2, 8)
8. Full Text Search
- Stemming:
- Before searching, it makes sense to perform
“stemming”
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car');
to_tsvector
-----------------------------------------
'better':5 'car':3,11 'mani':2 'one':10
(1 row)
8. Full Text Search
- Stemming is language dependent:
- Stemming works nicely for “roman” languages
=> it is hard to do this for Chinese and similar languages
test=# SELECT to_tsvector('english', 'i am'),
to_tsvector('german', 'i am'),
to_tsvector('dutch', 'i am');
to_tsvector | to_tsvector | to_tsvector
-------------+-------------+--------------
| 'i':1 | 'am':2 'i':1
(1 row)
8. Full Text Search
- “contains” is your friend:
- The @@ operator matches a tsquery (the search string) against a so-called
tsvector:
test=# SELECT to_tsvector('english', 'having many cars is better
than to have just one car')
@@ to_tsquery('english', 'car');
?column?
----------
t
(1 row)
8. Full Text Search
- Indexing is easy:
- All you need is a functional index
- Alternatively the stemmed content can be
“materialized” in a separate column
CREATE INDEX idx_fti ON t_test
USING gist (to_tsvector('german', name));
8. Full Text Search
- ts_vector and ts_query magic
- PostgreSQL allows you to use “and” (&)
and “or” (|)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', 'car & truck');
?column?
----------
f
(1 row)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', '(car | truck) & many');
?column?
----------
t
(1 row)
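The two results can be reproduced with a tiny Python evaluator. It is only a sketch: the lexeme set is copied from the to_tsvector output shown earlier in this section, the query is written as a hand-built tree instead of being parsed, and the query terms are assumed to be stemmed already ("many" becomes "mani"), as to_tsquery would do.

```python
# Lexemes from to_tsvector('english', 'having many cars is better than
# to have just one car'), as shown in the stemming example.
lexemes = {"better", "car", "mani", "one"}

def matches(query, lexemes):
    """Evaluate a query tree: a string is a lexeme, a tuple is (op, left, right)."""
    if isinstance(query, str):
        return query in lexemes
    op, left, right = query
    if op == "&":
        return matches(left, lexemes) and matches(right, lexemes)
    if op == "|":
        return matches(left, lexemes) or matches(right, lexemes)
    raise ValueError(f"unknown operator: {op}")

print(matches(("&", "car", "truck"), lexemes))                 # car & truck
print(matches(("&", ("|", "car", "truck"), "mani"), lexemes))  # (car | truck) & many
```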
8. Full Text Search
- A stupid question: What is a “word”?
- PostgreSQL is NOT limited to textual search
- Remember, it is all about “contains” ...
- Create your own parser:
test=# \h CREATE TEXT SEARCH PARSER
Command: CREATE TEXT SEARCH PARSER
Description: define a new text search parser
Syntax:
CREATE TEXT SEARCH PARSER name (
START = start_function ,
GETTOKEN = gettoken_function ,
END = end_function ,
LEXTYPES = lextypes_function
[, HEADLINE = headline_function ]
)
8. Full Text Search
- Even more flexibility (2):
test=# \h CREATE TEXT SEARCH CONFIGURATION
Command: CREATE TEXT SEARCH CONFIGURATION
Description: define a new text search configuration
Syntax:
CREATE TEXT SEARCH CONFIGURATION name (
PARSER = parser_name |
COPY = source_config
)
8. Full Text Search
- Even more flexibility:
test=# \h CREATE TEXT SEARCH DICTIONARY
Command: CREATE TEXT SEARCH DICTIONARY
Description: define a new text search dictionary
Syntax:
CREATE TEXT SEARCH DICTIONARY name (
TEMPLATE = template
[, option = value [, ... ]]
)
test=# \h CREATE TEXT SEARCH TEMPLATE
Command: CREATE TEXT SEARCH TEMPLATE
Description: define a new text search template
Syntax:
CREATE TEXT SEARCH TEMPLATE name (
[ INIT = init_function , ]
LEXIZE = lexize_function
)
9. Operator classes
- What does it take to organize a btree?
Operator | Strategy number
---------+-----------------
<        | 1
<=       | 2
=        | 3
>=       | 4
>        | 5
9. Operator classes
- Why care?
- The way numbers are treated is pretty “common”
- How about sorting this one?
“2305 09 04 78”
“4353 07 06 77”
=> it seems the sort order is correct as shown
=> it isn't – it is an Austrian social security number
=> 1977 was before 1978 and not the other way round
9. Operator classes
- Defining indexing strategies
- We can write our own operators
- Those operators can be assigned to an operator
class, which will tell the index how to “behave”
“2305 09 04 78”
“4353 07 06 77”
=> it seems the sort order is correct as shown
=> it isn't – it is an Austrian social security number
=> 1977 was before 1978 and not the other way round
9. Operator classes
- Writing an operator (1):
test=# CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
BEGIN
RETURN substring($1, 9, 2) ||
substring($1, 7, 2) ||
substring($1, 5, 2) ||
substring($1, 1, 4);
END; $$
LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# SELECT normalize_si('2305090478');
normalize_si
--------------
7804092305
(1 row)
9. Operator classes
- Writing an operator (2):
test=# CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
BEGIN
RETURN normalize_si($1) < normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# CREATE OPERATOR <# (
PROCEDURE=si_lt,
LEFTARG=text,
RIGHTARG=text);
CREATE OPERATOR
test=# SELECT '2305090478'::text <# '4353070677'::text;
?column?
----------
f
(1 row)
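For readers who want to check the logic outside the database, here is a Python port of normalize_si and si_lt. It is a sketch that mirrors the substring positions of the PL/pgSQL code above (SQL positions start at 1, Python slices at 0); the sorting example at the end is an addition for illustration.

```python
def normalize_si(si: str) -> str:
    """Reorder 'NNNNDDMMYY' into 'YYMMDDNNNN' so that plain text
    comparison sorts by birth date first."""
    return si[8:10] + si[6:8] + si[4:6] + si[0:4]

def si_lt(left: str, right: str) -> bool:
    """Counterpart of the <# operator: compare the normalized forms."""
    return normalize_si(left) < normalize_si(right)

print(normalize_si("2305090478"))         # 7804092305, as in the psql session
print(si_lt("2305090478", "4353070677"))  # False: 1978 is not before 1977

# Sorting with the normalized form as key puts the 1977 number first.
print(sorted(["2305090478", "4353070677"], key=normalize_si))
```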
9. Operator classes
- Creating the operator class:
- write operators for all operations needed
- write “support functions” (= “same”, etc.)
- make sure that the most important strategies
have proper operators
test=# \h CREATE OPERATOR CLASS
Command: CREATE OPERATOR CLASS
Description: define a new operator class
Syntax:
CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
USING index_method [ FAMILY family_name ] AS
{ OPERATOR strategy_number operator_name [ ( op_type, op_type ) ]
[ FOR SEARCH | FOR ORDER BY sort_family_name ]
| FUNCTION support_number [ ( op_type [ , op_type ] ) ]
function_name ( argument_type [, ...] )
| STORAGE storage_type
} [, ... ]
10. Available operator classes
- pg_trgm
- Trigrams are perfect to perform fuzzy matching
- Trigrams can be used nicely along with KNN-search
- pg_trgm is available as extension to PostgreSQL
test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
- Problem: “What is the proper way to spell the name of this
village?”
“gramatneusiedl” vs. “grammatneusiedel”?
10. Available operator classes
- Testing pg_trgm
test=# CREATE TABLE t_search AS
SELECT relname::text
FROM pg_class;
SELECT 303
test=# CREATE INDEX idx_trgm
ON t_search USING gist(relname gist_trgm_ops);
CREATE INDEX
10. Available operator classes
- Testing pg_trgm (2):
test=# SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
relname | ?column?
--------------------------------+----------
pg_class | 0.454545
pg_opclass | 0.538462
pg_class_oid_index | 0.714286
pg_opclass_oid_index | 0.727273
pg_class_relname_nsp_index | 0.793103
pg_opclass_am_name_nsp_index | 0.8
pg_seclabel | 0.823529
pg_am | 0.833333
pg_seclabels | 0.833333
pg_shseclabel | 0.842105
(10 rows)
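The distances in this result can be recomputed by hand. The Python sketch below follows the trigram rules as documented for pg_trgm (lowercase, split into alphanumeric words, pad each word with two leading blanks and one trailing blank) and takes distance as 1 minus the share of common trigrams. It handles ASCII input only, and the return value for two empty inputs is a choice made for this sketch.

```python
import re

def trigrams(text):
    """Set of trigrams in the style of pg_trgm (ASCII-only sketch)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def distance(a, b):
    """1 - similarity, with similarity = shared trigrams / all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 1.0  # no trigrams at all; arbitrary choice for this sketch
    return 1 - len(ta & tb) / len(union)

for name in ("pg_class", "pg_opclass", "pg_class_oid_index"):
    print(name, round(distance("pgclass", name), 6))
```

For "pgclass" and "pg_class" the two sets share 6 of 11 distinct trigrams, which gives 1 - 6/11 = 0.454545, the first row of the result above.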
10. Available operator classes
- KNN in action:
test=# explain SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.14..1.40 rows=10 width=19)
-> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19)
Order By: (relname <-> 'pgclass'::text)
(3 rows)
11. Traditional LIKE
- LIKE can be indexed in some cases:
- The PostgreSQL optimizer can rewrite queries featuring LIKE
in a fancy and efficient way
=> The goal is to find the “next character” in line
and query for a range
- This kind of rewrite only works when the next character
is actually known to PostgreSQL
- Special operator classes might be needed
=> varchar_pattern_ops, text_pattern_ops
11. Traditional LIKE
- An example:
test=# CREATE INDEX idx_relname
ON t_search (relname);
CREATE INDEX
test=# SET enable_seqscan TO off;
SET
test=# explain SELECT relname
FROM t_search
WHERE relname LIKE 'abc%';
QUERY PLAN
----------------------------------------------------------------------------------
Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19)
Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text))
Filter: (relname ~~ 'abc%'::text)
(3 rows)
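The Index Cond in this plan is the LIKE prefix turned into a range. A Python sketch of that rewrite follows; it is an illustration, not the planner's code. It compares strings by code point (comparable to a "C"-style ordering), ignores escape characters, and does not handle a prefix ending in the highest possible character.

```python
def like_prefix_range(pattern):
    """Turn 'abc%' into the half-open range ('abc', 'abd').
    Returns None when the pattern starts with a wildcard."""
    prefix = pattern.split("%", 1)[0].split("_", 1)[0]
    if not prefix:
        return None
    # "Next character": bump the last character of the prefix by one.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

def candidates(names, pattern):
    """Names inside the range; the LIKE filter would still be applied afterwards."""
    bounds = like_prefix_range(pattern)
    if bounds is None:
        return list(names)  # no usable prefix: everything must be checked
    lower, upper = bounds
    return [name for name in names if lower <= name < upper]

print(like_prefix_range("abc%"))
print(candidates(["abb", "abc", "abcdef", "abd", "pg_class"], "abc%"))
```

A pattern such as '%abc' has no fixed prefix, so there is no "next character" and no range to scan.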
12. Indexing MIN / MAX
- An example:
- MIN / MAX works by reading the index from the left (min)
and from the right (max, backward scan)
test=# explain SELECT min(relname), max(relname) FROM t_search;
QUERY PLAN
----------------------------------------------------------------------------------
Result (cost=0.74..0.75 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan using idx_relname on t_search
(cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan Backward using idx_relname on
t_search t_search_1 (cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
(9 rows)
Any questions?
Thank you for your attention

PostgreSQL: Advanced indexing

  • 1. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de PostgreSQL Indexing Dublin, 2013 Hans-Jürgen Schönig
  • 2. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Scope of this session: - What a basic index does - The PostgreSQL optimizer (cost model) - Classical B-tree Indexes - Partial / functional indexes - Different types of indexes - Full-Text-Search - Fuzzy matching - Writing your own indexing strategy
  • 3. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Generating test data: - for the purpose of this session we need a table consisting of two columns: test=# CREATE TABLE t_test (id serial, name text); CREATE TABLE test=# INSERT INTO t_test (name) VALUES ('hans'); INSERT 0 1 test=# INSERT INTO t_test (name) VALUES ('paul'); INSERT 0 1
  • 4. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 5. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 6. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us see, how PostgreSQL executes a simple query: test=# SELECT count(*) FROM t_test; count --------- 4194304 (1 row) Time: 431.192 ms test=# explain analyze SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (actual time=0.013..531.448 rows=4194304 loops=1) Total runtime: 977.917 ms (3 rows) Time: 1045.065 ms
  • 7. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us add a filter: test=# SELECT count(*) FROM t_test WHERE id = 421234; count ------- 1 (1 row) Time: 476.965 ms test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0) (actual time=53.405..495.126 rows=1 loops=1) Filter: (id = 421234) Rows Removed by Filter: 4194303 Total runtime: 495.175 ms (5 rows) Time: 520.659 ms
  • 8. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Sequentially reading data: - In case you like reading the phone book sequentially we are basically done. - Sequentially reading the phone book is technically ok => but socially not accepted - Defining an index is the desired solution
  • 9. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Creating an index test=# h CREATE INDEX Command: CREATE INDEX Description: define a new index Syntax: CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ] - At the end of the day all clauses will be covered by this training
  • 10. 1. Basic indexing: - A typical index: test=# CREATE INDEX idx_id ON t_test (id); CREATE INDEX Time: 7357.663 ms - This gives us a standard btree index - PostgreSQL provides “High-Concurrency B-Trees” (Lehman-Yao, 1981) - Many people can modify the index at the same time - Highly efficient B+ tree
  • 11. 1. Basic indexing: - How a btree works: [Figure: B-tree diagram. Labels: index made of 8k pages, root node, sorted entries, forward chaining; table made of 8k pages, row line pointers (linp)]
  • 12. 1. Basic indexing: - Indexing is beneficial test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=8.73..8.74 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=1) -> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: (id = 421234) Heap Fetches: 1 Total runtime: 0.057 ms (5 rows) Time: 0.395 ms - A lot faster :).
  • 13. 1. Basic indexing: - Still slow ... test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 787.407 ms - This is still slow. Let us create an index ... test=# CREATE INDEX idx_name ON t_test (name); CREATE INDEX
  • 14. 1. Basic indexing: - The benefit is exactly zero: test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 782.443 ms test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) - The index won't be used - Too many identical values (“not selective”)
  • 15. 1. Basic indexing: - The cost is far from zero: test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 177 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_id')); pg_size_pretty ---------------- 90 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 90 MB (1 row) - Indexes need a fair amount of space
  • 16. 1. Basic indexing: - Input values DO make a difference: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.74..7.75 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - PostgreSQL will decide depending on the input value => cost-based optimization
  • 17. 1. Basic indexing: - Partial indexes: - In our example the index is only used in case of rare or non-existing values - What is the point of an index when most of its content is never used? => a more selective strategy is needed
  • 18. 1. Basic indexing: - Partial indexes: test=# DROP INDEX idx_name; DROP INDEX test=# CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul'); CREATE INDEX test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 8192 bytes (1 row) - A partial index reduces space consumption - Benefit is still the same
  • 19. 1. Basic indexing: - Equal benefit – lower cost: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.28..7.29 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - The plans are the same as before (the estimated cost is even slightly lower)!
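A partial index can only be considered when the WHERE clause of the query implies the predicate of the index. The following is a minimal sketch of a common use case; the table t_job, its columns and the index name are hypothetical and not part of the original session.

```sql
-- Hypothetical job queue: most rows are 'done', only a few are 'pending'.
CREATE TABLE t_job (id serial, status text, payload text);

-- Index only the rare rows that are actually searched for.
CREATE INDEX idx_job_pending ON t_job (id) WHERE status = 'pending';

-- The query repeats the index predicate, so the planner may consider
-- idx_job_pending; whether it is chosen still depends on the cost estimate.
EXPLAIN SELECT id FROM t_job WHERE status = 'pending' ORDER BY id LIMIT 10;
```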
  • 20. 1. Basic indexing: - What about functions? test=# CREATE INDEX idx_cos ON t_test ( cos(id) ); CREATE INDEX Time: 16867.228 ms test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=23960.99..23961.00 rows=1 width=0) -> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0) Recheck Cond: (cos((id)::double precision) = 17::double precision) -> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0) Index Cond: (cos((id)::double precision) = 17::double precision) (5 rows) - PostgreSQL provides functional indexes - VERY nice to avoid additional columns - Gives a lot of extra flexibility
  • 21. 1. Basic indexing: - Types of functions allowed: - Functions must be deterministic => “immutable” - Functions can be written in almost any language - This is highly performance sensitive
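As a sketch of the "immutable" requirement: a user-defined function has to be declared IMMUTABLE before it can be used in an index expression. The function clean_name and the index idx_clean_name are made-up names for illustration only.

```sql
-- Hypothetical helper; it must be declared IMMUTABLE to be indexable.
CREATE FUNCTION clean_name(text) RETURNS text AS $$
    SELECT lower(trim($1));
$$ LANGUAGE sql IMMUTABLE;

CREATE INDEX idx_clean_name ON t_test (clean_name(name));

-- The query has to use the very same expression as the index definition.
EXPLAIN SELECT * FROM t_test WHERE clean_name(name) = 'hans2';
```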
  • 22. 2. The PostgreSQL cost model - How does PostgreSQL decide on index vs. no index? - PostgreSQL uses statistics to estimate the number of rows coming back - Each operation is assigned a cost => costs are just numbers used to compare different options inside the planner - Cost parameters can be changed at runtime or globally => be careful, it can work against you
  • 23. 2. The PostgreSQL cost model - pg_stats is your friend: test=# \d pg_stats View "pg_catalog.pg_stats" Column | Type | Modifiers -------------------------------+-----------+----------- schemaname | name | tablename | name | attname | name | inherited | boolean | null_frac | real | avg_width | integer | n_distinct | real | most_common_vals | anyarray | most_common_freqs | real[] | histogram_bounds | anyarray | correlation | real | most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] |
  • 24. 2. The PostgreSQL cost model - Updating statistics - System statistics are updated by ANALYZE: test=# \h ANALYZE Command: ANALYZE Description: collect statistics about a database Syntax: ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ] - In most setups autovacuum is in charge of updating pg_statistic - In most cases statistics are not an issue
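To see what the planner knows about the test table, the statistics can be refreshed manually and inspected through pg_stats. This is a small sketch using only the columns listed above; the actual values depend on the data and on the sample ANALYZE takes.

```sql
-- Refresh the statistics for the test table by hand.
ANALYZE t_test;

-- Inspect the per-column statistics the planner relies on.
SELECT attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs, correlation
FROM   pg_stats
WHERE  tablename = 't_test';
```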
  • 25. 2. The PostgreSQL cost model - How does PostgreSQL estimate costs? - seq_page_cost = 1 - random_page_cost = 4 - cpu_tuple_cost = 0.01 - cpu_operator_cost = 0.0025 - cpu_index_tuple_cost = 0.005
  • 26. 2. The PostgreSQL cost model - Let us do the math (1): test=# explain SELECT count(*) FROM t_test; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (2 rows) - total costs are at 75100.81 - costs are composed of I/O and CPU costs
  • 27. 2. The PostgreSQL cost model - Let us do the math (2): test=# SELECT pg_relation_size('t_test') / 8192; ?column? ---------- 22672 (1 row) - our table consists of 22672 blocks - each block is 8 kB in size
  • 28. 2. The PostgreSQL cost model - Let us do the math (3): The seq scan: I/O cost = 22672 * seq_page_cost = 22672; CPU cost = 4194304 * cpu_tuple_cost = 41943.04 => 22672 + 41943.04 = 64615.04 for the seq scan. The aggregate: 4194304 * cpu_operator_cost = 10485.76. Total costs => 64615.04 + 10485.76 = 75100.80, plus cpu_tuple_cost (0.01, we have to return the one result tuple) = 75100.81
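The arithmetic above can be reproduced from the catalog. The sketch below assumes that pg_class.relpages and pg_class.reltuples are up to date (they are estimates maintained by VACUUM and ANALYZE) and that the cost parameters are read from the current session; the column aliases are made up.

```sql
-- Recompute the seq scan and aggregate cost components for t_test.
SELECT relpages  * current_setting('seq_page_cost')::float8      AS io_cost,
       reltuples * current_setting('cpu_tuple_cost')::float8     AS cpu_cost,
       relpages  * current_setting('seq_page_cost')::float8
         + reltuples * current_setting('cpu_tuple_cost')::float8 AS seq_scan_cost,
       reltuples * current_setting('cpu_operator_cost')::float8  AS aggregate_cost
FROM   pg_class
WHERE  relname = 't_test';
```

With the default parameters and the table shown in the slides, seq_scan_cost plus aggregate_cost should correspond to the startup cost of the Aggregate node.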
  • 29. 2. The PostgreSQL cost model - Inflation at work: test=# SET seq_page_cost TO 10; SET test=# explain SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=279148.80..279148.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0) (2 rows) - Costs can be changed at runtime to fine tune index usage => only do this if you are fully aware of what you are doing. It can have unintended side effects
  • 30. 2. The PostgreSQL cost model - Spinning disks vs. SSDs - Traditional disks are fast at sequential I/O and pretty bad at random I/O - SSDs make random I/O far cheaper => consider lowering random_page_cost
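A cautious way to evaluate a different random_page_cost is to change it for the current session only and compare plans. The value 1.1 and the test query below are assumptions for illustration, not a recommendation from the original slides.

```sql
SHOW random_page_cost;

-- Session-local change; other connections are not affected.
SET random_page_cost = 1.1;   -- assumed value for SSD-like storage

-- Compare this plan with the one produced under the default setting.
EXPLAIN SELECT count(*) FROM t_test WHERE id < 100000;

-- Return to the configured default.
RESET random_page_cost;
```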
  • 31. 2. The PostgreSQL cost model - Abusing tablespaces: test=# ALTER TABLESPACE pg_default SET (random_page_cost = 1); ALTER TABLESPACE - Allows different cost settings for various disk subsystems - It also allows splitting “cached” and “uncached” data -> ugly but useful
  • 32. 2. The PostgreSQL cost model - Correlation and disk layout test=# CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random(); SELECT 4194304 test=# CREATE INDEX idx_random ON t_random(id); CREATE INDEX test=# ANALYZE t_random; ANALYZE - The PostgreSQL optimizer considers the physical order of rows on disk - High correlation makes index scans far more likely, as the optimizer reduces its estimates for I/O costs.
  • 33. 2. The PostgreSQL cost model - Correlation and disk layout test=# explain SELECT count(*) FROM t_test WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=75.35..75.36 rows=1 width=0) -> Index Only Scan using idx_id on t_test (cost=0.00..72.72 rows=1049 width=0) Index Cond: (id < 1000) (3 rows) test=# explain SELECT count(*) FROM t_random WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=950.31..950.32 rows=1 width=0) -> Index Only Scan using idx_random on t_random (cost=0.00..947.94 rows=947 width=0) Index Cond: (id < 1000) (3 rows)
  • 34. 2. The PostgreSQL cost model - Implications: - This is why different plans can pop up EVEN if the data is the same - There is no fixed amount of data making PostgreSQL switch from index to sequential scan - High correlation can improve performance => consider clustering the table test=# \h CLUSTER Command: CLUSTER Description: cluster a table according to an index Syntax: CLUSTER [VERBOSE] table_name [ USING index_name ] CLUSTER [VERBOSE]
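A sketch of how clustering could be applied to the shuffled table from the previous slides. Note that CLUSTER rewrites the table and takes an exclusive lock while doing so, and that the physical order is not maintained for rows written afterwards.

```sql
-- Correlation of the id column before clustering.
SELECT correlation FROM pg_stats
WHERE  tablename = 't_random' AND attname = 'id';

-- Rewrite the table in index order, then refresh the statistics.
CLUSTER t_random USING idx_random;
ANALYZE t_random;

-- Correlation after clustering; compare with the first result.
SELECT correlation FROM pg_stats
WHERE  tablename = 't_random' AND attname = 'id';
```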
  • 35. 3. Indexing many columns - Using OR / AND: - PostgreSQL can use more than one index per table per query - PostgreSQL provides multi-column indexes - What you might see is a so-called “Bitmap Scan” => don't mix it up with Oracle Bitmap Indexes
  • 36. 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423; QUERY PLAN --------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9) Recheck Cond: ((id = 2343) OR (id = 423423)) -> BitmapOr (cost=9.44..9.44 rows=2 width=0) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 2343) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 423423) (7 rows) - PostgreSQL will scan the index twice - PostgreSQL will look for blocks in the underlying table - The condition has to be re-evaluated
  • 37. 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef'; QUERY PLAN ----------------------------------------------------------------------- Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9) Index Cond: (name = 'josef'::text) Filter: (id = 2343) (3 rows) - PostgreSQL does not always use two indexes when you have 2 quals - The more selective index might be enough
  • 38. 3. Indexing many columns - Multicolumn indexes: test=# DROP INDEX idx_id; DROP INDEX test=# CREATE INDEX idx_combined ON t_test (id, name); CREATE INDEX test=# explain SELECT * FROM t_test WHERE id = 10; QUERY PLAN -------------------------------------------------------------------------------- Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9) Index Cond: (id = 10) (2 rows) - PostgreSQL can use a subset of the indexed columns IF they are the leading column(s) of the index - Imagine a phone book; it is just like a combined index
  • 39. 3. Indexing many columns - Many indexes or combined indexes? - It depends on what you want to query - If your queries always filter on the leading column(s) of the index, a combined index might be a good idea - Many indexes are more flexible but maybe not perfect - Sometimes a mixed strategy can be useful
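The phone book analogy can be tried out with the combined index on (id, name) created earlier. This is a sketch only; which plan the planner actually picks depends on the statistics and on the other indexes that exist at the time.

```sql
-- Both columns are constrained: idx_combined can locate the entry directly.
EXPLAIN SELECT * FROM t_test WHERE id = 10 AND name = 'hans';

-- Only the leading column is constrained: still a narrow range in the index.
EXPLAIN SELECT * FROM t_test WHERE id = 10;

-- The leading column is not constrained: idx_combined cannot narrow the
-- search to a small key range, just like looking up a first name in a
-- phone book that is sorted by last name.
EXPLAIN SELECT * FROM t_test WHERE name = 'hans2';
```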
  • 40. 4. Indexes to provide order - B-trees can be used for more than searching - B-trees provide you with order. - Order helps to avoid repeated sorting. test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------- Limit (cost=0.00..0.31 rows=10 width=9) -> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9) (2 rows)
  • 41. 5. Dealing with upper / lowercase - Upper and lower case searches are common: - If you want case-insensitive searches, you don't have to use a functional index - Consider using “citext” test=# CREATE EXTENSION citext; CREATE EXTENSION test=# SELECT 'ABC'::citext = 'abc'::citext; ?column? ---------- t (1 row)
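A minimal sketch of how citext could be used in a table, so that a plain btree index serves case-insensitive lookups. The table t_user, its columns and the index name are hypothetical.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- Hypothetical table: the column itself is case-insensitive.
CREATE TABLE t_user (id serial, email citext);

-- An ordinary btree index; comparisons follow citext semantics.
CREATE INDEX idx_user_email ON t_user (email);

-- No lower() / upper() needed in the query.
EXPLAIN SELECT * FROM t_user WHERE email = 'Hans@Example.com';
```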
  • 42. 6. Different types of indexes - PostgreSQL supports more than just btrees - B-Trees are fine if you are interested in things which can be sorted - Try to sort polygons => there is no useful sort order for finding them - Geometric data and Full-Text-Search need different algorithms NOTE: This is not about which index is faster. This is about the correct ALGORITHM
  • 43. 6. Different types of indexes - Index types provided by PostgreSQL - B-Trees - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 44. 6. Different types of indexes - Indexes and algorithms - B-Trees: numbers, text, dates, etc. - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 45. 7. Gist indexes - Gist operates on different principles than btree - it supports “contains”, “left of”, “overlaps”, etc. - “contains”, etc. are good for => Full Text Search => Geometric operations (PostGIS, etc.) => Finding genome sequences => Handling ranges (time, etc.) => Fuzzy search - Gist allows KNN-search
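Two of the use cases listed above, ranges and KNN search, can be sketched with built-in types. The tables t_booking and t_place and their columns are hypothetical; range types require PostgreSQL 9.2 or later.

```sql
-- "overlaps" on time ranges
CREATE TABLE t_booking (room int, during tsrange);
CREATE INDEX idx_booking ON t_booking USING gist (during);

EXPLAIN SELECT * FROM t_booking
WHERE  during && '[2013-06-01 10:00, 2013-06-01 12:00)'::tsrange;

-- KNN search: the five places closest to a given point
CREATE TABLE t_place (name text, pos point);
CREATE INDEX idx_place ON t_place USING gist (pos);

EXPLAIN SELECT name FROM t_place
ORDER  BY pos <-> point '(3, 4)'
LIMIT  5;
```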
  • 46. 7. Gist indexes - How it works internally ...
  • 47. 7. GIN indexes - GIN is a so-called inverted index - Used for Full Text Search - If you have 1 million documents containing the word “house”, do you really want to have “house” inside the index 1 million times? => B-tree for words => A document list for each word => Classical approach to text search - FTS is not about “=”, it is about “contains” => forget btree
  • 48. 7. GIN indexes - GIN internal workings:
  • 49. 7. SP-Gist indexes - SP-Gist is a space partitioned index - Can be used for a variety of algorithms, which use space partitioning => quad trees => suffix trees => k-d trees
  • 50. 7. SP-Gist indexes - Quad trees: A prototype example ... - We want to insert ... (6, 4) and (2, 8)
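The quad tree example can be tried with the built-in point type, whose default SP-GiST operator class is a quad tree. The table t_point and the search box are made up for illustration; on a table this small the planner would normally prefer a sequential scan.

```sql
CREATE TABLE t_point (p point);
INSERT INTO t_point VALUES (point '(6, 4)'), (point '(2, 8)');

-- SP-GiST index on points (quad tree).
CREATE INDEX idx_point ON t_point USING spgist (p);

-- "contained in box": of the two points, only (2, 8) lies inside this box.
SELECT * FROM t_point WHERE p <@ box '((0, 0), (5, 10))';
```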
  • 51. 8. Full Text Search - Stemming: - Before searching, it makes sense to perform “stemming” test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car'); to_tsvector ----------------------------------------- 'better':5 'car':3,11 'mani':2 'one':10 (1 row)
  • 52. 8. Full Text Search - Stemming is language dependent: - Stemming works nicely for “roman” languages => it is hard to do this for Chinese and so on test=# SELECT to_tsvector('english', 'i am'), to_tsvector('german', 'i am'), to_tsvector('dutch', 'i am'); to_tsvector | to_tsvector | to_tsvector -------------+-------------+-------------- | 'i':1 | 'am':2 'i':1 (1 row)
  • 53. 8. Full Text Search - “contains” is your friend: - The @@ operator matches a tsquery (the search string) against a so-called tsvector: test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car'); ?column? ---------- t (1 row)
  • 55. 8. Full Text Search - Indexing is easy: - All you need is a functional index - Alternatively the stemmed content can be “materialized” in a separate column CREATE INDEX idx_fti ON t_test USING gist (to_tsvector('german', name));
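Both variants mentioned above can be sketched as follows. The column name_tsv and the index idx_fti_col are hypothetical names, and GIN is used for the second index merely to show the alternative index type; a materialized column also needs a trigger or similar mechanism to stay current, which is omitted here.

```sql
-- Functional index: the query must use the same expression,
-- including the text search configuration ('german').
EXPLAIN SELECT * FROM t_test
WHERE  to_tsvector('german', name) @@ to_tsquery('german', 'hans');

-- Materialized variant: store the tsvector in its own column.
ALTER TABLE t_test ADD COLUMN name_tsv tsvector;
UPDATE t_test SET name_tsv = to_tsvector('german', name);
CREATE INDEX idx_fti_col ON t_test USING gin (name_tsv);

EXPLAIN SELECT * FROM t_test
WHERE  name_tsv @@ to_tsquery('german', 'hans');
```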
  • 56. 8. Full Text Search - tsvector and tsquery magic - PostgreSQL allows you to use “and” (&) and “or” (|) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car & truck'); ?column? ---------- f (1 row) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', '(car | truck) & many'); ?column? ---------- t (1 row)
  • 57. 8. Full Text Search - A stupid question: What is a “word”? - PostgreSQL is NOT limited to textual search - Remember, it is all about “contains” ... - Create your own parser: test=# \h CREATE TEXT SEARCH PARSER Command: CREATE TEXT SEARCH PARSER Description: define a new text search parser Syntax: CREATE TEXT SEARCH PARSER name ( START = start_function , GETTOKEN = gettoken_function , END = end_function , LEXTYPES = lextypes_function [, HEADLINE = headline_function ] )
  • 58. 8. Full Text Search - Even more flexibility (2): test=# \h CREATE TEXT SEARCH CONFIGURATION Command: CREATE TEXT SEARCH CONFIGURATION Description: define a new text search configuration Syntax: CREATE TEXT SEARCH CONFIGURATION name ( PARSER = parser_name | COPY = source_config )
  • 59. 8. Full Text Search - Even more flexibility: test=# \h CREATE TEXT SEARCH DICTIONARY Command: CREATE TEXT SEARCH DICTIONARY Description: define a new text search dictionary Syntax: CREATE TEXT SEARCH DICTIONARY name ( TEMPLATE = template [, option = value [, ... ]] ) test=# \h CREATE TEXT SEARCH TEMPLATE Command: CREATE TEXT SEARCH TEMPLATE Description: define a new text search template Syntax: CREATE TEXT SEARCH TEMPLATE name ( [ INIT = init_function , ] LEXIZE = lexize_function )
  • 60. 9. Operator classes - What does it take to organize a btree? Operator => strategy number: “<” => 1, “<=” => 2, “=” => 3, “>=” => 4, “>” => 5
  • 61. 9. Operator classes - Why care? - The way numbers are treated is pretty “common” - How about sorting this one? “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't – it is an Austrian social security number => 1977 was before 1978 and not the other way round
  • 62. 9. Operator classes - Defining indexing strategies - We can write our own operators - Those operators can be assigned to an operator class, which will tell the index how to “behave” “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't – it is an Austrian social security number => 1977 was before 1978 and not the other way round
  • 63. 9. Operator classes - Writing an operator (1): test=# CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$ BEGIN RETURN substring($1, 9, 2) || substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# SELECT normalize_si('2305090478'); normalize_si -------------- 7804092305 (1 row)
  • 64. 9. Operator classes - Writing an operator (2): test=# CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$ BEGIN RETURN normalize_si($1) < normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# CREATE OPERATOR <# ( PROCEDURE=si_lt, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR test=# SELECT '2305090478'::text <# '4353070677'::text; ?column? ---------- f (1 row)
  • 65. 9. Operator classes - Creating the operator class: - write operators for all operations needed - write “support functions” (= “same”, etc.) - make sure that the most important strategies have proper operators test=# \h CREATE OPERATOR CLASS Command: CREATE OPERATOR CLASS Description: define a new operator class Syntax: CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type USING index_method [ FAMILY family_name ] AS { OPERATOR strategy_number operator_name [ ( op_type, op_type ) ] [ FOR SEARCH | FOR ORDER BY sort_family_name ] | FUNCTION support_number [ ( op_type [ , op_type ] ) ] function_name ( argument_type [, ...] ) | STORAGE storage_type } [, ... ]
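Putting the pieces together, a complete btree operator class for the social security number example might look like the sketch below. It builds on normalize_si and the <# operator defined on the previous slides; all other function, operator, class, table and index names are assumptions made for illustration, and creating an operator class typically requires superuser privileges.

```sql
-- Remaining comparison functions (hypothetical names), built on normalize_si.
CREATE FUNCTION si_le(text, text) RETURNS boolean AS
  $$ SELECT normalize_si($1) <= normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_eq(text, text) RETURNS boolean AS
  $$ SELECT normalize_si($1) = normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_ge(text, text) RETURNS boolean AS
  $$ SELECT normalize_si($1) >= normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_gt(text, text) RETURNS boolean AS
  $$ SELECT normalize_si($1) > normalize_si($2) $$ LANGUAGE sql IMMUTABLE;

-- btree support function 1: returns -1, 0 or 1.
CREATE FUNCTION si_cmp(text, text) RETURNS integer AS $$
  SELECT CASE WHEN normalize_si($1) < normalize_si($2) THEN -1
              WHEN normalize_si($1) > normalize_si($2) THEN 1
              ELSE 0 END
$$ LANGUAGE sql IMMUTABLE;

-- One operator per btree strategy; <# already exists from the slides.
CREATE OPERATOR <=# (PROCEDURE = si_le, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR =#  (PROCEDURE = si_eq, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >=# (PROCEDURE = si_ge, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >#  (PROCEDURE = si_gt, LEFTARG = text, RIGHTARG = text);

-- Map the operators to strategy numbers 1..5 and register the comparator.
CREATE OPERATOR CLASS si_ops FOR TYPE text USING btree AS
    OPERATOR 1 <#,
    OPERATOR 2 <=#,
    OPERATOR 3 =#,
    OPERATOR 4 >=#,
    OPERATOR 5 >#,
    FUNCTION 1 si_cmp(text, text);

-- Hypothetical table using the new operator class in its index.
CREATE TABLE t_si (si text);
CREATE INDEX idx_si ON t_si (si si_ops);
EXPLAIN SELECT * FROM t_si WHERE si <# '4353070677';
```

The index is only considered for queries that use the operators of the class (such as <#), not for the regular text comparison operators.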
  • 66. 10. Available operator classes - pg_trgm - Trigrams are perfect to perform fuzzy matching - Trigrams can be used nicely along with KNN-search - pg_trgm is available as an extension to PostgreSQL test=# CREATE EXTENSION pg_trgm; CREATE EXTENSION - Problem: What is the proper way to spell the name of this village? “gramatneusiedl” vs. “grammatneusiedel”?
  • 67. 10. Available operator classes - Testing pg_trgm test=# CREATE TABLE t_search AS SELECT relname::text FROM pg_class; SELECT 303 test=# CREATE INDEX idx_trgm ON t_search USING gist(relname gist_trgm_ops); CREATE INDEX
  • 68. 10. Available operator classes - Testing pg_trgm (2): test=# SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; relname | ?column? --------------------------------+---------- pg_class | 0.454545 pg_opclass | 0.538462 pg_class_oid_index | 0.714286 pg_opclass_oid_index | 0.727273 pg_class_relname_nsp_index | 0.793103 pg_opclass_am_name_nsp_index | 0.8 pg_seclabel | 0.823529 pg_am | 0.833333 pg_seclabels | 0.833333 pg_shseclabel | 0.842105 (10 rows)
  • 69. 10. Available operator classes - KNN in action: test=# explain SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------- Limit (cost=0.14..1.40 rows=10 width=19) -> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19) Order By: (relname <-> 'pgclass'::text) (3 rows)
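To get a feeling for what the distance operator measures, the helper functions shipped with pg_trgm can be called directly. This sketch assumes the extension is installed as shown above; no results are listed because they depend on the pg_trgm version and settings.

```sql
-- The trigrams a string is broken into.
SELECT show_trgm('gramatneusiedl');

-- Similarity is a number between 0 and 1; distance (<->) is 1 - similarity.
SELECT similarity('gramatneusiedl', 'grammatneusiedel');
SELECT 'gramatneusiedl'::text <-> 'grammatneusiedel'::text AS distance;

-- The % operator is true when the similarity exceeds the configured
-- threshold; it is also supported by the gist_trgm_ops index.
SELECT relname FROM t_search WHERE relname % 'pgclass';
```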
  • 70. 11. Traditional LIKE - LIKE can be indexed in some cases: - The PostgreSQL optimizer can rewrite queries featuring LIKE in a fancy and efficient way => The goal is to find the “next character” in line and query for a range - This kind of rewrite only works when the next character is actually known to PostgreSQL - Special operator classes might be needed => varchar_pattern_ops, text_pattern_ops
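When the database does not use the C locale, a plain btree index generally cannot serve prefix searches, and one of the pattern operator classes mentioned above is needed. A sketch, assuming the t_search table from the pg_trgm example; the index name is made up.

```sql
-- Check the collation of the current database.
SELECT datcollate FROM pg_database WHERE datname = current_database();

-- Index that compares strings character by character, ignoring the locale.
CREATE INDEX idx_relname_pattern ON t_search (relname text_pattern_ops);

-- A left-anchored pattern can be turned into a range scan on this index.
EXPLAIN SELECT relname FROM t_search WHERE relname LIKE 'abc%';
```

An index built with text_pattern_ops supports LIKE prefix searches and equality, but not the ordinary <, <=, >, >= comparisons or ORDER BY, so a second regular index may still be needed for those.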
  • 71. 11. Traditional LIKE - An example: test=# CREATE INDEX idx_relname ON t_search (relname); CREATE INDEX test=# SET enable_seqscan TO off; SET test=# explain SELECT relname FROM t_search WHERE relname LIKE 'abc%'; QUERY PLAN ---------------------------------------------------------------------------------- Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19) Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text)) Filter: (relname ~~ 'abc%'::text) (3 rows)
  • 72. 12. Indexing MIN / MAX - An example: - MIN / MAX works by reading the index from the left (forward scan) and from the right (backward scan) test=# explain SELECT min(relname), max(relname) FROM t_search; QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.74..0.75 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan using idx_relname on t_search (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) InitPlan 2 (returns $1) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan Backward using idx_relname on t_search t_search_1 (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) (9 rows)
  • 73. Any questions? Thank you for your attention