-
+
Reliability and the Write-Ahead Log
some later time. Such caches can be a reliability hazard because the
memory in the disk controller cache is volatile, and will lose its
contents in a power failure. Better controller cards have
- battery-backed caches, meaning the card has a battery that
+ battery-backed unit (BBU) caches, meaning
+ the card has a battery that
maintains power to the cache in case of system power loss. After power
is restored the data will be written to the disk drives.
And finally, most disk drives have caches. Some are write-through
- while some are write-back, and the
- same concerns about data loss exist for write-back drive caches as
- exist for disk controller caches. Consumer-grade IDE and SATA drives are
- particularly likely to have write-back caches that will not survive a
- power failure, though ATAPI-6 introduced a drive cache
- flush command (FLUSH CACHE EXT) that some file systems use, e.g. ZFS.
- Many solid-state drives (SSD) also have volatile write-back
- caches, and many do not honor cache flush commands by default.
+ while some are write-back, and the same concerns about data loss
+ exist for write-back drive caches as exist for disk controller
+ caches. Consumer-grade IDE and SATA drives are particularly likely
+ to have write-back caches that will not survive a power failure,
+ though ATAPI-6 introduced a drive cache flush command
+ (FLUSH CACHE EXT) that some file systems use, e.g.
+ ZFS, ext4. (The SCSI command
+ SYNCHRONIZE CACHE has long been available.) Many
+ solid-state drives (SSD) also have volatile write-back caches, and
+ many do not honor cache flush commands by default.
+
+
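From an application's point of view, the request that eventually reaches the drive as a cache flush starts with a system call such as fsync(). The following minimal Python sketch assumes a POSIX system. The function name write_durably is hypothetical and is not part of PostgreSQL. The sketch writes a buffer and calls os.fsync(). Whether the drive's own write-back cache is also flushed still depends on the file system issuing FLUSH CACHE EXT or SYNCHRONIZE CACHE, as described above.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then ask the OS to push it to stable storage.

    os.fsync() returns once the kernel reports the data handed to the
    device. Whether the drive's write-back cache is also flushed depends
    on the file system sending a cache flush command to the drive.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                  # os.write() may write fewer bytes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                 # flush file data and metadata
    finally:
        os.close(fd)
```

For example, write_durably("commit.dat", b"commit record\n") creates or truncates commit.dat, which is a placeholder name. Per the Linux fsync(2) documentation, fsync() on the file does not necessarily make a newly created directory entry durable. An explicit fsync() of the containing directory is needed for that.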
To check write caching on Linux use
hdparm -I; it is enabled if there is a * next
to Write cache; hdparm -W to turn off
fsync_writethrough never do write caching.
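The hdparm -I check can be scripted. This sketch assumes the usual hdparm -I layout, in which each line of the Commands/features list carries a * in the Enabled column when that feature is on. The layout may differ between hdparm versions. The helper names hdparm_identify and write_cache_enabled are hypothetical.

```python
import subprocess
from typing import Optional


def hdparm_identify(device: str) -> str:
    """Run `hdparm -I <device>` and return its text output (usually needs root)."""
    result = subprocess.run(["hdparm", "-I", device],
                            capture_output=True, text=True, check=True)
    return result.stdout


def write_cache_enabled(hdparm_output: str) -> Optional[bool]:
    """Report whether the 'Write cache' feature line is marked with '*'.

    Returns True if the line is marked enabled, False if it is listed
    without '*', and None if no 'Write cache' line is present.
    """
    for line in hdparm_output.splitlines():
        text = line.strip()
        enabled = text.startswith("*")
        name = text.lstrip("*").strip()
        if name == "Write cache":
            return enabled
    return None
```

For example, write_cache_enabled(hdparm_identify("/dev/sda")) returns True when the drive reports its write cache as enabled. Here /dev/sda is a placeholder device name.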
+ Many file systems that use write barriers (e.g. ZFS,
+ ext4) internally use FLUSH CACHE EXT or
+ SYNCHRONIZE CACHE commands to flush data to the platters on
+ write-back-enabled drives. Unfortunately, such write barrier file
+ systems behave suboptimally when combined with battery-backed unit
+ (BBU) disk controllers. In such setups, the synchronize
+ command forces all data from the BBU to the disks, eliminating much
+ of the benefit of the BBU. You can run the utility
+ src/tools/fsync in the PostgreSQL source tree to see
+ if you are affected. If you are affected, the performance benefits
+ of the BBU cache can be regained by turning off write barriers in
+ the file system or reconfiguring the disk controller, if that is
+ an option. If write barriers are turned off, make sure the battery
+ remains active; a faulty battery can potentially lead to data loss.
+ Hopefully file system and disk controller designers will eventually
+ address this suboptimal behavior.
+
+
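The idea behind checking whether you are affected can be illustrated with a small timing loop. This is a hypothetical, simplified sketch and not the code of src/tools/fsync. It repeatedly overwrites one 8 kB block, calls os.fsync() after each write, and reports operations per second.

As a rough rule of thumb, a 7200 rpm drive makes 7200 / 60 = 120 revolutions per second. On rotating disks, a rate far above that suggests a cache (the BBU or a drive write-back cache) is acknowledging the flushes. A rate near it suggests each flush is waiting for the platters.

```python
import os
import time

# 8 kB matches PostgreSQL's default block size; any size would do here.
BLOCK = b"\0" * 8192


def fsync_rate(path: str, loops: int = 200) -> float:
    """Return write+fsync operations per second (Unix only: uses os.pwrite).

    Each iteration overwrites the same 8 kB block at offset 0 and then
    waits for os.fsync() to return, so the result is bounded by how fast
    the storage stack acknowledges a flush.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(loops):
            os.pwrite(fd, BLOCK, 0)  # rewrite block 0 in place
            os.fsync(fd)             # wait for the flush to be acknowledged
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return loops / max(elapsed, 1e-9)  # guard against a zero interval
```

For example, you could run fsync_rate("/path/on/data/disk/testfile") with write barriers enabled and again with them disabled, then compare the results. The path shown is a placeholder.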
When the operating system sends a write request to the storage hardware,
there is little it can do to make sure the data has arrived at a truly