Update COPY BINARY file format spec to reflect recent decisions about

author Tom Lane

Wed, 7 May 2003 22:23:27 +0000 (22:23 +0000)

committer Tom Lane

Wed, 7 May 2003 22:23:27 +0000 (22:23 +0000)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml

index 48fa55629347f4586ca85fb9a157591997885d91..f60388bb05beba1f17b24dae896dc5733f69dc21 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -1,5 +1,5 @@
  
  
@@ -119,7 +119,7 @@ COPY table [ ( 
      BINARY
      
       
-      Forces all data to be stored or read in binary format rather
+      Causes all data to be stored or read in binary format rather
        than as text. You cannot specify the 
        or  options in binary mode.
       
@@ -193,17 +193,18 @@ COPY table [ ( 
     
  
     
-    The BINARY key word will force all data to be
+    The BINARY key word causes all data to be
      stored/read as binary format rather than as text.  It is
-    somewhat faster than the normal text mode, but a binary format
-    file is not portable across machine architectures.
+    somewhat faster than the normal text mode, but a binary-format
+    file is less portable across machine architectures and
+    PostgreSQL versions.
     
  
     
-    You must have select privilege on any table
+    You must have select privilege on the table
      whose values are read by COPY TO, and
-    insert privilege on a table into which values
-    are being inserted by COPY FROM.
+    insert privilege on the table into which values
+    are inserted by COPY FROM.
     
  
     
@@ -279,8 +280,8 @@ COPY table [ ( 
      End of data can be represented by a single line containing just
      backslash-period (\.).  An end-of-data marker is
      not necessary when reading from a file, since the end of file
-    serves perfectly well; but an end marker must be provided when copying
-    data to or from a client application.
+    serves perfectly well; it is needed only when copying data to or from
+    client applications using pre-3.0 client protocol.
     
  
     
@@ -358,6 +359,9 @@ COPY table [ ( 
      possible to represent a data carriage return by a backslash and carriage
      return, and to represent a data newline by a backslash and newline.  
      However, these representations might not be accepted in future releases.
+    They are also highly vulnerable to corruption if the COPY file is
+    transferred across different machines (for example, from Unix to Windows
+    or vice versa).
     
  
     
@@ -374,7 +378,7 @@ COPY table [ ( 
  
     
      The file format used for COPY BINARY changed in
-    PostgreSQL 7.1. The new format consists
+    PostgreSQL 7.4. The new format consists
      of a file header, zero or more tuples containing the row data, and
      a file trailer.
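Below is a minimal C sketch of the header check that the revised format implies. It assumes the reader already holds the first 15 bytes in memory: the 11-byte signature `PGCOPY\n\377\r\n\0` (zero byte included) followed by the 32-bit flags word, stored most significant byte first. The header extension area is not handled. A reader should abort on any unexpected bit in the critical range 16-31. `write_fixed_header`, `check_fixed_header`, and the `known_critical` mask are hypothetical names for illustration, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 11-byte signature; the trailing zero byte is part of it. */
static const unsigned char copy_sig[11] = {
    'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xFF, '\r', '\n', '\0'
};

/* Integer fields are stored in network byte order (MSB first). */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Write signature + flags word; returns the number of bytes written (15). */
size_t write_fixed_header(unsigned char *buf, uint32_t flags)
{
    memcpy(buf, copy_sig, sizeof copy_sig);
    put_be32(buf + sizeof copy_sig, flags);
    return sizeof copy_sig + 4;
}

/*
 * Validate signature + flags.  Returns 0 on success, -1 if the input is
 * too short or the signature was mangled in transfer, -2 if a critical
 * bit (16-31) is set that the caller did not list in known_critical.
 */
int check_fixed_header(const unsigned char *buf, size_t len,
                       uint32_t known_critical, uint32_t *flags_out)
{
    uint32_t flags;

    if (len < sizeof copy_sig + 4 ||
        memcmp(buf, copy_sig, sizeof copy_sig) != 0)
        return -1;
    flags = get_be32(buf + sizeof copy_sig);
    if ((flags & 0xFFFF0000u & ~known_critical) != 0)
        return -2;              /* unexpected critical bit: abort */
    *flags_out = flags;         /* unknown bits 0-15 may be ignored */
    return 0;
}
```

Comparing the whole signature, including the zero byte, lets a reader reject a file damaged by newline translation or high-bit stripping before it misinterprets anything that follows.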
     
@@ -383,7 +387,7 @@ COPY table [ ( 
      File Header
  
      
-     The file header consists of 24 bytes of fixed fields, followed
+     The file header consists of 15 bytes of fixed fields, followed
       by a variable-length header extension area.  The fixed fields are:
  
      
@@ -391,7 +395,7 @@ COPY table [ ( 
        Signature
        
         
-12-byte sequence PGBCOPY\n\377\r\n\0 --- note that the zero byte
+11-byte sequence PGCOPY\n\377\r\n\0 --- note that the zero byte
  is a required part of the signature.  (The signature is designed to allow
  easy identification of files that have been munged by a non-8-bit-clean
  transfer.  This signature will be changed by end-of-line-translation
@@ -400,24 +404,14 @@ filters, dropped zero bytes, dropped high bits, or parity changes.)
        
       
  
-     
-      Integer layout field
-      
-       
-32-bit integer constant 0x01020304 in source's byte order. Potentially, a reader
-could engage in byte-flipping of subsequent fields if the wrong byte
-order is detected here.
-       
-      
-     
-
       
        Flags field
        
         
-32-bit integer bit mask to denote important aspects of the file format. Bits are
-numbered from 0 (LSB) to 31 (MSB) --- note that this field is stored
-with source's endianness, as are all subsequent integer fields. Bits
+32-bit integer bit mask to denote important aspects of the file format. Bits
+are numbered from 0 (LSB) to 31 (MSB).  Note that
+this field is stored in network byte order (most significant byte first),
+as are all the integer fields used in the file format.  Bits
  16-31 are reserved to denote critical file format issues; a reader
  should abort if it finds an unexpected bit set in this range. Bits 0-15
  are reserved to signal backwards-compatible format issues; a reader
@@ -471,72 +465,28 @@ is left for a later release.
      Tuples
      
  Each tuple begins with a 16-bit integer count of the number of fields in the
-tuple.  (Presently, all tuples in a table will have the same count, but
-that might not always be true.)  Then, repeated for each field in the
-tuple, there is a 16-bit integer typlen word possibly followed by field data.
-The typlen field is interpreted thus:
-
-    
-     
-      Zero
-      
-       
-   Field is null.  No data follows.
-       
-      
-     
-
-     
-      > 0
-      
-       
-        Field is a fixed-length data type.  Exactly that many
-   bytes of data follow the typlen word.
-       
-      
-     
-
-     
-      -1
-      
-       
-   Field is a varlena data type.  The next four
-   bytes are the varlena header, which contains
-   the total value length including the header itself.
-       
-      
-     
-
-     
-      < -1
-      
-       
-   Reserved for future use.
-       
-      
-     
-    
+tuple.  (Presently, all tuples in a table will have the same count, but that
+might not always be true.)  Then, repeated for each field in the tuple, there
+is a 32-bit length word followed by that many bytes of field data.  (The
+length word does not include itself, and can be zero.)  As a special case,
+-1 indicates a NULL field value.  No value bytes follow in the NULL case.
      
  
      
-For nonnull fields, the reader can check that the typlen matches the
-expected typlen for the destination column.  This provides a simple
-but very useful check that the data is as expected.
+There is no alignment padding or any other extra data between fields.
      
  
      
-There is no alignment padding or any other extra data between fields.
-Note also that the format does not distinguish whether a data type is
-pass-by-reference or pass-by-value.  Both of these provisions are
-deliberate: they might help improve portability of the files (although
-of course endianness and floating-point-format issues can still keep
-you from moving a binary file across machines).
+Presently, all data values in a COPY BINARY file are
+assumed to be in binary format (format code one).  It is anticipated that a
+future extension may add a header field that allows per-column format codes
+to be specified.
      
  
      
  If OIDs are included in the file, the OID field immediately follows the
  field-count word.  It is a normal field except that it's not included
-in the field-count.  In particular it has a typlen --- this will allow
+in the field-count.  In particular it has a length word --- this will allow
  handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
  OIDs to be shown as null if that ever proves desirable.
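The new tuple layout can be sketched in C as follows. The sketch assumes the caller supplies every value already in its binary form. Each tuple starts with a 16-bit field count. Each field is a 32-bit length word that excludes itself, followed by the data bytes. A NULL is written as -1 with no data bytes. When an OID is passed, it is written as an ordinary length-prefixed field right after the count and is not included in it. `copy_field` and `encode_tuple` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor: data == NULL means SQL NULL. */
typedef struct
{
    const void *data;
    uint32_t    len;    /* number of data bytes; ignored for NULL */
} copy_field;

/* Network byte order helpers; each returns the advanced pointer. */
static unsigned char *put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char) (v >> 8);
    p[1] = (unsigned char) v;
    return p + 2;
}

static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
    return p + 4;
}

/* One field: length word (not counting itself), then the data bytes. */
static unsigned char *put_field(unsigned char *p, const copy_field *f)
{
    if (f->data == NULL)
        return put_be32(p, 0xFFFFFFFFu);    /* -1 marks NULL; no bytes */
    p = put_be32(p, f->len);
    if (f->len > 0)
        memcpy(p, f->data, f->len);
    return p + f->len;
}

/* Encode one tuple into buf; returns the number of bytes written. */
size_t encode_tuple(unsigned char *buf, const copy_field *oid,
                    const copy_field *fields, uint16_t nfields)
{
    unsigned char *p = put_be16(buf, nfields);
    uint16_t i;

    if (oid != NULL)
        p = put_field(p, oid);  /* follows the count, not included in it */
    for (i = 0; i < nfields; i++)
        p = put_field(p, &fields[i]);
    return (size_t) (p - buf);
}
```

An empty value gets a length word of zero, which keeps it distinct from a NULL. No padding is emitted between fields.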
      
@@ -546,8 +496,8 @@ OIDs to be shown as null if that ever proves desirable.
      File Trailer
  
      
-     The file trailer consists of an 16-bit integer word containing -1.  This is
-     easily distinguished from a tuple's field-count word.
+     The file trailer consists of a 16-bit integer word containing -1.  This
+     is easily distinguished from a tuple's field-count word.
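The trailer is a 16-bit -1 placed where a tuple's field-count word would otherwise appear, so a reader can detect end of data without knowing the tuple count in advance. This minimal sketch assumes that `buf` starts at the first tuple, after the header and any header extension. It also assumes the caller already knows from the flags whether OIDs are present. `count_tuples` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t get_be16(const unsigned char *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/*
 * Walk the tuples in buf, stopping at the 16-bit -1 trailer.  Returns the
 * number of tuples seen, or -1 if the data is truncated or malformed.
 */
long count_tuples(const unsigned char *buf, size_t len, int has_oids)
{
    size_t pos = 0;
    long ntuples = 0;

    for (;;)
    {
        uint16_t count;
        unsigned long nwords, i;

        if (len - pos < 2)
            return -1;                  /* input ended before the trailer */
        count = get_be16(buf + pos);
        pos += 2;
        if (count == 0xFFFF)
            return ntuples;             /* file trailer (-1) */
        if (count & 0x8000)
            return -1;                  /* other negative counts are invalid */

        /* The OID, if present, is an extra field not included in the count. */
        nwords = (unsigned long) count + (has_oids ? 1 : 0);
        for (i = 0; i < nwords; i++)
        {
            uint32_t flen;

            if (len - pos < 4)
                return -1;
            flen = get_be32(buf + pos);
            pos += 4;
            if (flen == 0xFFFFFFFFu)
                continue;               /* -1: NULL, no data bytes follow */
            if (flen > 0x7FFFFFFFu || len - pos < flen)
                return -1;              /* bad length or truncated data */
            pos += flen;
        }
        ntuples++;
    }
}
```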
      
  
      
@@ -579,19 +529,22 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
  
    
     Here is a sample of data suitable for copying into a table from
-   STDIN (so it must have the termination sequence on the
-   last line):
+   STDIN:
  
  AF      AFGHANISTAN
  AL      ALBANIA
  DZ      ALGERIA
  ZM      ZAMBIA
  ZW      ZIMBABWE
-\.
  
     Note that the white space on each line is actually a tab character.
    
  
+  
+   XXX the following example is OBSOLETE and needs to be updated for the
+   7.4 binary format:
+  
+
    
     The following is the same data, output in binary format on a
     Linux/i586 machine. The data is shown after filtering through the