BINARY
- Forces all data to be stored or read in binary format rather
+ Causes all data to be stored or read in binary format rather
than as text. You cannot specify the DELIMITER
or NULL options in binary mode.
- The BINARY key word will force all data to be
+ The BINARY key word causes all data to be
stored/read as binary format rather than as text. It is
- somewhat faster than the normal text mode, but a binary format
- file is not portable across machine architectures.
+ somewhat faster than the normal text mode, but a binary-format
+ file is less portable across machine architectures and
- You must have select privilege on any table
+ You must have select privilege on the table
whose values are read by COPY TO, and
- insert privilege on a table into which values
- are being inserted by COPY FROM.
+ insert privilege on the table into which values
+ are inserted by COPY FROM.
End of data can be represented by a single line containing just
backslash-period (\.). An end-of-data marker is
not necessary when reading from a file, since the end of file
- serves perfectly well; but an end marker must be provided when copying
- data to or from a client application.
+ serves perfectly well; it is needed only when copying data to or from
+ client applications using pre-3.0 client protocol.
possible to represent a data carriage return by a backslash and carriage
return, and to represent a data newline by a backslash and newline.
However, these representations might not be accepted in future releases.
+ They are also highly vulnerable to corruption if the COPY file is
+ transferred across different machines (for example, from Unix to Windows
+ or vice versa).
The file format used for COPY BINARY changed in
- PostgreSQL 7.1. The new format consists
+ PostgreSQL 7.4. The new format consists
of a file header, zero or more tuples containing the row data, and
a file trailer.
File Header
- The file header consists of 24 bytes of fixed fields, followed
+ The file header consists of 15 bytes of fixed fields, followed
by a variable-length header extension area. The fixed fields are:
Signature
-12-byte sequence PGBCOPY\n\377\r\n\0 --- note that the zero byte
+11-byte sequence PGCOPY\n\377\r\n\0 --- note that the zero byte
is a required part of the signature. (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer. This signature will be changed by end-of-line-translation
-
- Integer layout field
-
-32-bit integer constant 0x01020304 in source's byte order. Potentially, a reader
-could engage in byte-flipping of subsequent fields if the wrong byte
-order is detected here.
-
-
-
-
Flags field
-32-bit integer bit mask to denote important aspects of the file format. Bits are
-numbered from 0 (LSB) to 31 (MSB) --- note that this field is stored
-with source's endianness, as are all subsequent integer fields. Bits
+32-bit integer bit mask to denote important aspects of the file format. Bits
+are numbered from 0 (LSB) to 31 (MSB). Note that
+this field is stored in network byte order (most significant byte first),
+as are all the integer fields used in the file format. Bits
16-31 are reserved to denote critical file format issues; a reader
should abort if it finds an unexpected bit set in this range. Bits 0-15
are reserved to signal backwards-compatible format issues; a reader
Tuples
Each tuple begins with a 16-bit integer count of the number of fields in the
-tuple. (Presently, all tuples in a table will have the same count, but
-that might not always be true.) Then, repeated for each field in the
-tuple, there is a 16-bit integer typlen word possibly followed by field data.
-The typlen field is interpreted thus:
-
-
-
- Zero
-
- Field is null. No data follows.
-
-
-
-
-
- > 0
-
- Field is a fixed-length data type. Exactly that many
- bytes of data follow the typlen word.
-
-
-
-
-
- -1
-
- Field is a varlena data type. The next four
- bytes are the varlena header, which contains
- the total value length including the header itself.
-
-
-
-
-
- < -1
-
- Reserved for future use.
-
-
-
-
+tuple. (Presently, all tuples in a table will have the same count, but that
+might not always be true.) Then, repeated for each field in the tuple, there
+is a 32-bit length word followed by that many bytes of field data. (The
+length word does not include itself, and can be zero.) As a special case,
+-1 indicates a NULL field value. No value bytes follow in the NULL case.
-For nonnull fields, the reader can check that the typlen matches the
-expected typlen for the destination column. This provides a simple
-but very useful check that the data is as expected.
+There is no alignment padding or any other extra data between fields.
-There is no alignment padding or any other extra data between fields.
-Note also that the format does not distinguish whether a data type is
-pass-by-reference or pass-by-value. Both of these provisions are
-deliberate: they might help improve portability of the files (although
-of course endianness and floating-point-format issues can still keep
-you from moving a binary file across machines).
+Presently, all data values in a COPY BINARY file are
+assumed to be in binary format (format code one). It is anticipated that a
+future extension may add a header field that allows per-column format codes
+to be specified.
If OIDs are included in the file, the OID field immediately follows the
field-count word. It is a normal field except that it's not included
-in the field-count. In particular it has a typlen --- this will allow
+in the field-count. In particular it has a length word --- this will allow
handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
OIDs to be shown as null if that ever proves desirable.
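
The 7.4 tuple layout described above can be illustrated with a minimal Python sketch. It is not part of PostgreSQL; the function name encode_tuple is hypothetical, and the sketch assumes each field value has already been converted to its binary representation (a bytes object, or None for NULL). The optional OID field is not handled.

```python
import struct


def encode_tuple(fields):
    """Encode one tuple in the 7.4 COPY BINARY layout (illustrative only).

    fields: list of bytes objects, or None for a NULL field.
    All integers are written in network byte order.
    """
    # 16-bit count of the number of fields in the tuple.
    out = [struct.pack("!h", len(fields))]
    for value in fields:
        if value is None:
            # Length word of -1 marks NULL; no value bytes follow.
            out.append(struct.pack("!i", -1))
        else:
            # 32-bit length word (not counting itself), then the data.
            out.append(struct.pack("!i", len(value)))
            out.append(value)
    # No alignment padding is added between fields.
    return b"".join(out)
```

For example, encode_tuple([b"AF", None]) yields a 2-byte count, a 4-byte length of 2 followed by the two data bytes, and a 4-byte length word of -1 for the NULL field.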
File Trailer
- The file trailer consists of an 16-bit integer word containing -1. This is
- easily distinguished from a tuple's field-count word.
+ The file trailer consists of a 16-bit integer word containing -1. This
+ is easily distinguished from a tuple's field-count word.
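
As a rough illustration of how the header, tuples, and trailer fit together, the following Python sketch reads a complete 7.4-format stream. The names read_exact and read_copy_binary are hypothetical. The sketch assumes that the flags field is followed by a 32-bit header extension length word (a field not shown in the text above), that no OIDs are included, and that any bit set in the range 16-31 should cause an abort.

```python
import io
import struct

# 11-byte signature, including the required trailing zero byte.
SIGNATURE = b"PGCOPY\n\377\r\n\0"


def read_exact(stream, n):
    """Read exactly n bytes or raise ValueError."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of data")
    return data


def read_copy_binary(stream):
    """Return a list of rows; each field is bytes, or None for NULL."""
    if read_exact(stream, len(SIGNATURE)) != SIGNATURE:
        raise ValueError("bad signature (file may have been munged)")
    (flags,) = struct.unpack("!I", read_exact(stream, 4))
    if flags & 0xFFFF0000:
        # Bits 16-31 denote critical format issues this sketch does not know.
        raise ValueError("unexpected critical flag bit set")
    # Assumption: 32-bit length of the header extension area, which is skipped.
    (ext_len,) = struct.unpack("!I", read_exact(stream, 4))
    read_exact(stream, ext_len)

    rows = []
    while True:
        (count,) = struct.unpack("!h", read_exact(stream, 2))
        if count == -1:
            # File trailer: a 16-bit word containing -1.
            return rows
        row = []
        for _ in range(count):
            (length,) = struct.unpack("!i", read_exact(stream, 4))
            if length < -1:
                raise ValueError("invalid field length")
            row.append(None if length == -1 else read_exact(stream, length))
        rows.append(row)
```

The field values are returned as raw bytes; interpreting them requires knowledge of each column's data type, which the file format itself does not record.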
Here is a sample of data suitable for copying into a table from
- STDIN (so it must have the termination sequence on the
- last line):
+ STDIN:
AF AFGHANISTAN
AL ALBANIA
DZ ALGERIA
ZM ZAMBIA
ZW ZIMBABWE
-\.
Note that the white space on each line is actually a tab character.
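
A client program might build such lines as in the following Python sketch. The function name copy_text_line is hypothetical, the sketch assumes the default tab delimiter and non-null string values, and it writes backslash, tab, newline, and carriage return in their backslash-escaped forms (\\, \t, \n, \r) so that they cannot be mistaken for delimiters or line ends.

```python
def copy_text_line(fields):
    """Format one row of string values as a tab-delimited COPY text line."""
    escaped = []
    for value in fields:
        # Escape the backslash first so later escapes are not doubled.
        value = value.replace("\\", "\\\\")
        value = value.replace("\t", "\\t")
        value = value.replace("\n", "\\n")
        value = value.replace("\r", "\\r")
        escaped.append(value)
    return "\t".join(escaped) + "\n"
```

Calling copy_text_line(["AF", "AFGHANISTAN"]) produces the first line of the sample above, with a single tab character between the two columns.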
+ XXX the following example is OBSOLETE and needs to be updated for the
+ 7.4 binary format:
+
+
The following is the same data, output in binary format on a
Linux/i586 machine. The data is shown after filtering through the