+ linkend="sql-prepare" endterm="sql-prepare-title"> to create a
+ prepared INSERT statement. Since you are
+ executing the same command multiple times, it is more efficient to
+ prepare the command once and then use EXECUTE
+ as many times as required.
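As a rough illustration of the prepare-once, execute-many pattern described
above (the statement name, table, and columns here are hypothetical, not part
of the patch):

    PREPARE bulk_insert (integer, text) AS
        INSERT INTO my_table (id, note) VALUES ($1, $2);

    EXECUTE bulk_insert(1, 'first row');
    EXECUTE bulk_insert(2, 'second row');
    -- ...one EXECUTE per remaining row...
    DEALLOCATE bulk_insert;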
- Use COPY FROM
+ Use COPY
+ Use COPY to load
+ all the rows in one command, instead of using a series of
+ INSERT commands. The COPY
+ command is optimized for loading large numbers of rows; it is less
+ flexible than INSERT, but incurs significantly
+ less overhead for large data loads. Since COPY
+ is a single command, there is no need to disable autocommit if you
+ use this method to populate a table.
+
- Use COPY FROM STDIN to load all the rows in one
- command, instead of using a series of INSERT
- commands. This reduces parsing, planning, etc. overhead a great
- deal. If you do this then it is not necessary to turn off
- autocommit, since it is only one command anyway.
+ Note that loading a large number of rows using
+ COPY is almost always faster than using
+ INSERT, even if multiple
+ INSERT commands are batched into a single
+ transaction.
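For instance, the whole series of INSERT commands can be replaced by a single
COPY (the table, columns, and file path below are illustrative only; a
server-side COPY FROM a file also requires appropriate privileges):

    COPY my_table (id, note) FROM '/tmp/my_table.csv' WITH CSV;

    -- or, from a client session, psql's client-side variant:
    -- \copy my_table (id, note) FROM 'my_table.csv' WITH CSV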
If you are augmenting an existing table, you can drop the index,
- load the table, then recreate the index. Of
- course, the database performance for other users may be adversely
- affected during the time that the index is missing. One should also
- think twice before dropping unique indexes, since the error checking
- afforded by the unique constraint will be lost while the index is missing.
+ load the table, and then recreate the index. Of course, the
+ database performance for other users may be adversely affected
+ during the time that the index is missing. One should also think
+ twice before dropping unique indexes, since the error checking
+ afforded by the unique constraint will be lost while the index is
+ missing.
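A minimal sketch of that sequence, assuming a hypothetical table my_table
with an index my_table_id_idx on its id column:

    DROP INDEX my_table_id_idx;

    -- ...bulk load the new rows here, e.g. with COPY...

    CREATE INDEX my_table_id_idx ON my_table (id);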
+
+
Increase checkpoint_segments
+
+ Temporarily increasing the checkpoint_segments
+ configuration variable can also make large data loads faster. This
+ is because loading a large amount of data into PostgreSQL can
+ cause checkpoints to occur more often than the normal checkpoint
+ frequency (specified by the checkpoint_timeout
+ configuration variable). Whenever a checkpoint occurs, all dirty
+ pages must be flushed to disk. By increasing
+ checkpoint_segments temporarily during bulk
+ data loads, the number of checkpoints that are required can be
+ reduced.
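One way to apply this, using placeholder values rather than tuning advice:
raise the setting in postgresql.conf for the duration of the load, reload the
server configuration, and restore the old value afterwards.

    # postgresql.conf -- example value only
    checkpoint_segments = 30    # default is 3

The new value is picked up on a configuration reload (for example, pg_ctl
reload) and does not require a server restart.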
+
+
+
Run ANALYZE Afterwards
- It's a good idea to run ANALYZE or VACUUM
- ANALYZE anytime you've added or updated a lot of data,
- including just after initially populating a table. This ensures that
- the planner has up-to-date statistics about the table. With no statistics
- or obsolete statistics, the planner may make poor choices of query plans,
- leading to bad performance on queries that use your table.
+ Whenever you have significantly altered the distribution of data
+ within a table, running ANALYZE is strongly
+ recommended. This includes bulk loading large amounts of data into
+ the table. Running ANALYZE (or VACUUM ANALYZE)
+ ensures that the planner has up-to-date statistics about the
+ table. With no statistics or obsolete statistics, the planner may
+ make poor decisions during query planning, leading to poor
+ performance on any tables with inaccurate or nonexistent
+ statistics.
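For example, after the load has finished (table name hypothetical):

    ANALYZE my_table;

    -- or, to also reclaim space from dead rows in the same pass:
    VACUUM ANALYZE my_table;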