Boolean that specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (for example, ,,).
If set to FALSE
, Snowflake attempts to cast an empty field to the corresponding column type. An empty string is inserted into columns of type STRING. For other column types, the
COPY INTO command produces an error.
- Default:
TRUE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to FALSE
, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default:
TRUE
ENCODING = 'string'
String (constant) that specifies the character set of the source data.
Character Set | ENCODING Value | Supported Languages | Notes
--------------|----------------|---------------------|------
Big5 | BIG5 | Traditional Chinese |
EUC-JP | EUCJP | Japanese |
EUC-KR | EUCKR | Korean |
GB18030 | GB18030 | Chinese |
IBM420 | IBM420 | Arabic |
IBM424 | IBM424 | Hebrew |
IBM949 | IBM949 | Korean |
ISO-2022-CN | ISO2022CN | Simplified Chinese |
ISO-2022-JP | ISO2022JP | Japanese |
ISO-2022-KR | ISO2022KR | Korean |
ISO-8859-1 | ISO88591 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish |
ISO-8859-2 | ISO88592 | Czech, Hungarian, Polish, Romanian |
ISO-8859-5 | ISO88595 | Russian |
ISO-8859-6 | ISO88596 | Arabic |
ISO-8859-7 | ISO88597 | Greek |
ISO-8859-8 | ISO88598 | Hebrew |
ISO-8859-9 | ISO88599 | Turkish |
ISO-8859-15 | ISO885915 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish | Identical to ISO-8859-1 except for 8 characters, including the Euro currency symbol.
KOI8-R | KOI8R | Russian |
Shift_JIS | SHIFTJIS | Japanese |
UTF-8 | UTF8 | All languages | For loading data from delimited files (CSV, TSV, etc.), UTF-8 is the default. For loading data from all other supported file formats (JSON, Avro, etc.), as well as unloading data, UTF-8 is the only supported character set.
UTF-16 | UTF16 | All languages |
UTF-16BE | UTF16BE | All languages |
UTF-16LE | UTF16LE | All languages |
UTF-32 | UTF32 | All languages |
UTF-32BE | UTF32BE | All languages |
UTF-32LE | UTF32LE | All languages |
windows-874 | WINDOWS874 | Thai |
windows-949 | WINDOWS949 | Korean |
windows-1250 | WINDOWS1250 | Czech, Hungarian, Polish, Romanian |
windows-1251 | WINDOWS1251 | Russian |
windows-1252 | WINDOWS1252 | Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Swedish |
windows-1253 | WINDOWS1253 | Greek |
windows-1254 | WINDOWS1254 | Turkish |
windows-1255 | WINDOWS1255 | Hebrew |
windows-1256 | WINDOWS1256 | Arabic |
- Default:
UTF8
Note
Snowflake stores all data internally in the UTF-8 character set. The data is converted into UTF-8 before it is loaded into Snowflake.
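For example, to load a CSV file that was saved in ISO-8859-1 (Latin-1), you could set ENCODING on the file format. This is a minimal sketch; the stage, file, and table names (mystage, latin1_data.csv, mytable) are assumptions:
COPY INTO mytable
  FROM @mystage/latin1_data.csv
  FILE_FORMAT = (TYPE = CSV ENCODING = 'ISO88591');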
TYPE = JSON
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the
compressed data in the files can be extracted for loading.
Supported Values | Notes
-----------------|------
AUTO | Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP |
BZ2 |
BROTLI |
ZSTD |
DEFLATE | Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE | Raw Deflate-compressed files (without header, RFC1951).
NONE | Indicates the files for loading data have not been compressed.
- Default:
AUTO
DATE_FORMAT = 'string' | AUTO
Defines the format of date string values in the data files. If a value is not specified or is AUTO
, the value for the DATE_INPUT_FORMAT parameter is used.
This file format option is applied to the following actions only:
Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading JSON data into separate columns by specifying a query in the COPY statement (that is, COPY transformation).
- Default:
AUTO
TIME_FORMAT = 'string' | AUTO
Defines the format of time string values in the data files. If a value is not specified or is AUTO
, the value for the TIME_INPUT_FORMAT parameter is used.
This file format option is applied to the following actions only:
Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading JSON data into separate columns by specifying a query in the COPY statement (that is, COPY transformation).
- Default:
AUTO
TIMESTAMP_FORMAT = 'string' | AUTO
Defines the format of timestamp string values in the data files. If a value is not specified or is AUTO
, the value for the TIMESTAMP_INPUT_FORMAT parameter is used.
This file format option is applied to the following actions only:
Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading JSON data into separate columns by specifying a query in the COPY statement (that is, COPY transformation).
- Default:
AUTO
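As a sketch of how these options interact with the MATCH_BY_COLUMN_NAME copy option, the following hypothetical statement loads JSON whose date and timestamp strings use non-default formats. The table, stage, file, and format strings are assumptions:
COPY INTO mytable
  FROM @mystage/events.json
  FILE_FORMAT = (TYPE = JSON DATE_FORMAT = 'MM-DD-YYYY' TIMESTAMP_FORMAT = 'MM-DD-YYYY HH24:MI:SS')
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;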
BINARY_FORMAT = HEX | BASE64 | UTF8
Defines the encoding format for binary string values in the data files. The option can be used when loading data into binary columns in a table.
This file format option is applied to the following actions only:
Loading JSON data into separate columns using the MATCH_BY_COLUMN_NAME copy option.
Loading JSON data into separate columns by specifying a query in the COPY statement (that is, COPY transformation).
- Default:
HEX
TRIM_SPACE = TRUE | FALSE
Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (that is, the quotation marks are interpreted as part of the string of field data). Set this option to TRUE
to remove undesirable spaces during the data load.
This file format option is applied to the following actions only when loading JSON data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
- Default:
FALSE
MULTI_LINE = TRUE | FALSE
Boolean that specifies whether multiple lines are allowed.
If MULTI_LINE is set to FALSE
and a new line is present within a JSON record, the record containing the new line will be interpreted as an error.
- Default:
TRUE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than
one string, enclose the list of strings in parentheses and use commas to separate each value.
This file format option is applied to the following actions only when loading JSON data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2
is specified as a
value, all instances of 2
as either a string or number are converted.
For example:
NULL_IF = ('\N', 'NULL', 'NUL', '')
Note that this option can include empty strings.
- Default:
\\N
(that is, NULL)
ENABLE_OCTAL = TRUE | FALSE
Boolean that enables parsing of octal numbers.
- Default:
FALSE
ALLOW_DUPLICATE = TRUE | FALSE
Boolean that allows duplicate object field names (only the last one will be preserved).
- Default:
FALSE
STRIP_OUTER_ARRAY = TRUE | FALSE
Boolean that instructs the JSON parser to remove outer brackets [ ]
.
- Default:
FALSE
STRIP_NULL_VALUES = TRUE | FALSE
Boolean that instructs the JSON parser to remove object fields or array elements containing null
values. For example, when set to TRUE
:
Before | After
-------|------
[null] | []
[null,null,3] | [,,3]
{"a":null,"b":null,"c":123} | {"c":123}
{"a":[1,null,2],"b":{"x":null,"y":88}} | {"a":[1,,2],"b":{"y":88}}
- Default:
FALSE
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�
). The copy
option performs a one-to-one character replacement.
If set to TRUE
, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
IGNORE_UTF8_ERRORS = TRUE | FALSE
Boolean that specifies whether UTF-8 encoding errors produce error conditions. It is an alternative syntax for REPLACE_INVALID_CHARACTERS
.
If set to TRUE
, any invalid UTF-8 sequences are silently replaced with the Unicode character U+FFFD
(that is, “replacement character”).
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to FALSE
, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default:
TRUE
TYPE = AVRO
COMPRESSION = AUTO | GZIP | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the
compressed data in the files can be extracted for loading.
Supported Values | Notes
-----------------|------
AUTO | Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP |
BROTLI |
ZSTD |
DEFLATE | Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE | Raw Deflate-compressed files (without header, RFC1951).
NONE | Data files to load have not been compressed.
- Default:
AUTO
Note
We recommend that you use the default AUTO option because it will determine both the file and codec compression. Specifying a compression option refers to the compression of files, not the compression of blocks (codecs).
TRIM_SPACE = TRUE | FALSE
Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (that is, the quotation marks are interpreted as part of the string of field data). Set this option to TRUE
to remove undesirable spaces during the data load.
This file format option is applied to the following actions only when loading Avro data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
- Default:
FALSE
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�
). The copy
option performs a one-to-one character replacement.
If set to TRUE
, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than
one string, enclose the list of strings in parentheses and use commas to separate each value.
This file format option is applied to the following actions only when loading Avro data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2
is specified as a
value, all instances of 2
as either a string or number are converted.
For example:
NULL_IF = ('\N', 'NULL', 'NUL', '')
Note that this option can include empty strings.
- Default:
\\N
(that is, NULL)
TYPE = ORC
TRIM_SPACE = TRUE | FALSE
Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space rather than the opening quotation character as the beginning of the field (that is, the quotation marks are interpreted as part of the string of field data). Set this option to TRUE
to remove undesirable spaces during the data load.
This file format option is applied to the following actions only when loading Orc data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
- Default:
FALSE
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�
). The copy
option performs a one-to-one character replacement.
If set to TRUE
, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than
one string, enclose the list of strings in parentheses and use commas to separate each value.
This file format option is applied to the following actions only when loading Orc data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2
is specified as a
value, all instances of 2
as either a string or number are converted.
For example:
NULL_IF = ('\N', 'NULL', 'NUL', '')
Note that this option can include empty strings.
- Default:
\\N
(that is, NULL)
TYPE = PARQUET
COMPRESSION = AUTO | SNAPPY | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the
compressed data in the files can be extracted for loading.
Supported Values | Notes
-----------------|------
AUTO | Compression algorithm detected automatically. Supports the following compression algorithms: Brotli, gzip, Lempel-Ziv-Oberhumer (LZO), LZ4, Snappy, or Zstandard v0.8 (and higher).
SNAPPY |
NONE | Data files to load have not been compressed.
- Default:
AUTO
BINARY_AS_TEXT = TRUE | FALSE
Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE
, Snowflake interprets these columns as binary data.
- Default:
TRUE
Note
Snowflake recommends that you set BINARY_AS_TEXT to FALSE to avoid any potential conversion issues.
TRIM_SPACE = TRUE | FALSE
Boolean that specifies whether to remove leading and trailing white space from strings.
For example, if your external database software encloses fields in quotes, but inserts a leading space, Snowflake reads the leading space
rather than the opening quotation character as the beginning of the field (that is, the quotation marks are interpreted as part of the string
of field data). Set this option to TRUE
to remove undesirable spaces during the data load.
This file format option is applied to the following actions only when loading Parquet data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
- Default:
FALSE
USE_LOGICAL_TYPE = TRUE | FALSE
Boolean that specifies whether to use Parquet logical types. With this file format option, Snowflake can interpret Parquet logical types during data loading. For more information, see Parquet Logical Type Definitions. To enable Parquet logical types, set USE_LOGICAL_TYPE as TRUE when you create a new file format option.
- Default:
FALSE
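For instance, a Parquet file format that interprets logical types could be created as follows. This is a minimal sketch; the format name is a placeholder:
CREATE OR REPLACE FILE FORMAT my_parquet_logical_format
  TYPE = PARQUET
  USE_LOGICAL_TYPE = TRUE;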
USE_VECTORIZED_SCANNER = TRUE | FALSE
Boolean that specifies whether to use a vectorized scanner for loading Parquet files.
The default value is FALSE. In a future behavior change release (BCR), the default value will be TRUE. We recommend that you set USE_VECTORIZED_SCANNER = TRUE for new workloads, and set it for existing workloads after testing.
Using the vectorized scanner can significantly reduce the latency for loading Parquet files, because this scanner is well suited for the columnar format of a Parquet file. The scanner only downloads relevant sections of the Parquet file into memory, such as the subset of selected columns.
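A minimal example of enabling the scanner directly in a COPY statement might look like the following; the table and stage names are assumptions:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (TYPE = PARQUET USE_VECTORIZED_SCANNER = TRUE);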
If USE_VECTORIZED_SCANNER is set to TRUE, the vectorized scanner has the following behaviors:
The BINARY_AS_TEXT option is always treated as FALSE and the USE_LOGICAL_TYPE option is always treated as TRUE, regardless of the values actually set for those options.
The vectorized scanner supports Parquet map types. The output of scanning a map type is as follows:
"my_map":
{
"k1": "v1",
"k2": "v2"
}
The vectorized scanner shows NULL
values in the output, as the following example demonstrates:
"person":
{
"name": "Adam",
"nickname": null,
"age": 34,
"phone_numbers":
[
"1234567890",
"0987654321",
null,
"6781234590"
]
}
The vectorized scanner handles Time and Timestamp as follows:
Parquet | Snowflake vectorized scanner
--------|-----------------------------
TimeType(isAdjustedToUtc=True/False, unit=MILLIS/MICROS/NANOS) | TIME
TimestampType(isAdjustedToUtc=True, unit=MILLIS/MICROS/NANOS) | TIMESTAMP_LTZ
TimestampType(isAdjustedToUtc=False, unit=MILLIS/MICROS/NANOS) | TIMESTAMP_NTZ
INT96 | TIMESTAMP_LTZ
If USE_VECTORIZED_SCANNER
is set to FALSE
, the scanner has the following behaviors:
This option does not support Parquet maps. The output of scanning a map type is as follows:
"my_map":
{
"key_value":
[
{
"key": "k1",
"value": "v1"
},
{
"key": "k2",
"value": "v2"
}
]
}
This option does not explicitly show NULL
values in the scan output, as the following example demonstrates:
"person":
{
"name": "Adam",
"age": 34
"phone_numbers":
[
"1234567890",
"0987654321",
"6781234590"
]
}
This option handles Time and Timestamp as follows:
Parquet | When USE_LOGICAL_TYPE = TRUE | When USE_LOGICAL_TYPE = FALSE
--------|------------------------------|------------------------------
TimeType(isAdjustedToUtc=True/False, unit=MILLIS/MICROS) | TIME |
TimeType(isAdjustedToUtc=True/False, unit=NANOS) | TIME | INTEGER
TimestampType(isAdjustedToUtc=True, unit=MILLIS/MICROS) | TIMESTAMP_LTZ | TIMESTAMP_NTZ
TimestampType(isAdjustedToUtc=True, unit=NANOS) | TIMESTAMP_LTZ | INTEGER
TimestampType(isAdjustedToUtc=False, unit=MILLIS/MICROS) | TIMESTAMP_NTZ |
TimestampType(isAdjustedToUtc=False, unit=NANOS) | TIMESTAMP_NTZ | INTEGER
INT96 | TIMESTAMP_NTZ | TIMESTAMP_NTZ
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�
). The copy
option performs a one-to-one character replacement.
If set to TRUE
, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
NULL_IF = ( 'string1' [ , 'string2' , ... ] )
String used to convert to and from SQL NULL. Snowflake replaces these strings in the data load source with SQL NULL. To specify more than
one string, enclose the list of strings in parentheses and use commas to separate each value.
This file format option is applied to the following actions only when loading Parquet data into separate columns using the
MATCH_BY_COLUMN_NAME copy option.
Note that Snowflake converts all instances of the value to NULL, regardless of the data type. For example, if 2
is specified as a
value, all instances of 2
as either a string or number are converted.
For example:
NULL_IF = ('\N', 'NULL', 'NUL', '')
Note that this option can include empty strings.
- Default:
\N
(that is, NULL)
TYPE = XML
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | NONE
String (constant) that specifies the current compression algorithm for the data files to be loaded. Snowflake uses this option to detect how already-compressed data files were compressed so that the
compressed data in the files can be extracted for loading.
Supported Values | Notes
-----------------|------
AUTO | Compression algorithm detected automatically, except for Brotli-compressed files, which cannot currently be detected automatically. If loading Brotli-compressed files, explicitly use BROTLI instead of AUTO.
GZIP |
BZ2 |
BROTLI |
ZSTD |
DEFLATE | Deflate-compressed files (with zlib header, RFC1950).
RAW_DEFLATE | Raw Deflate-compressed files (without header, RFC1951).
NONE | Data files to load have not been compressed.
- Default:
AUTO
IGNORE_UTF8_ERRORS = TRUE | FALSE
Boolean that specifies whether UTF-8 encoding errors produce error conditions. It is an alternative syntax for REPLACE_INVALID_CHARACTERS
.
If set to TRUE
, any invalid UTF-8 sequences are silently replaced with the Unicode character U+FFFD
(that is, “replacement character”).
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
PRESERVE_SPACE = TRUE | FALSE
Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
- Default:
FALSE
STRIP_OUTER_ELEMENT = TRUE | FALSE
Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
- Default:
FALSE
DISABLE_AUTO_CONVERT = TRUE | FALSE
Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
- Default:
FALSE
REPLACE_INVALID_CHARACTERS = TRUE | FALSE
Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character (�
). The copy
option performs a one-to-one character replacement.
If set to TRUE
, Snowflake replaces invalid UTF-8 characters with the Unicode replacement character.
If set to FALSE
, the load operation produces an error when invalid UTF-8 character encoding is detected.
- Default:
FALSE
SKIP_BYTE_ORDER_MARK = TRUE | FALSE
Boolean that specifies whether to skip any BOM (byte order mark) present in an input file. A BOM is a character code at the beginning of a data file that defines the byte order and encoding form.
If set to FALSE
, Snowflake recognizes any BOM in data files, which could result in the BOM either causing an error or being merged into the first column in the table.
- Default:
TRUE
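To illustrate several of these XML options together, the following sketch defines a file format that strips the outer element and preserves whitespace, then loads the documents into a table with a single VARIANT column. The object and file names (my_xml_format, my_xml_table, mystage, orders.xml) are assumptions:
CREATE OR REPLACE FILE FORMAT my_xml_format
  TYPE = XML
  STRIP_OUTER_ELEMENT = TRUE
  PRESERVE_SPACE = TRUE;

COPY INTO my_xml_table
  FROM @mystage/orders.xml
  FILE_FORMAT = (FORMAT_NAME = 'my_xml_format');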
Copy options (copyOptions)
You can specify one or more of the following copy options (separated by blank spaces, commas, or new lines):
ON_ERROR = CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT
- Use:
Data loading only
- Definition:
String (constant) that specifies the error handling for the load operation.
Important
Carefully consider the ON_ERROR copy option value. The default value is appropriate in common scenarios, but is not always the best
option.
- Values:
CONTINUE
Continue to load the file if errors are found. The COPY statement returns an error message for a maximum of one error found per data file.
Note that the difference between the ROWS_PARSED and ROWS_LOADED column values represents the number of rows that include detected errors. However, each of these rows could include multiple errors. To view all errors in the data files, use the VALIDATION_MODE parameter or query the VALIDATE function.
SKIP_FILE
Skip a file when an error is found.
Note that the SKIP_FILE
action buffers an entire file whether errors are found or not. For this reason, SKIP_FILE
is slower than either CONTINUE
or ABORT_STATEMENT
. Skipping large files due to a small number of errors could result in delays and wasted credits. When loading large numbers of records from files that have no logical delineation (e.g. the files were generated automatically at rough intervals), consider specifying CONTINUE
instead.
Additional patterns:
SKIP_FILE_num (e.g. SKIP_FILE_10): Skip a file when the number of error rows found in the file is equal to or exceeds the specified number.
'SKIP_FILE_num%' (e.g. 'SKIP_FILE_10%'): Skip a file when the percentage of error rows found in the file exceeds the specified percentage.
ABORT_STATEMENT
Abort the load operation if any error is found in a data file.
If a data file cannot be found (for example, because it does not exist or cannot be accessed), the load operation is aborted only when that file was explicitly specified in the FILES parameter; a missing file that was not explicitly specified does not abort the load.
Note that the aborted operations do not show up in COPY_HISTORY as the data files were not ingested. We recommend that you search for the failures in QUERY_HISTORY.
- Default:
- Bulk loading using COPY:
ABORT_STATEMENT
- Snowpipe:
SKIP_FILE
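For example, a bulk load that tolerates a small fraction of bad rows per file might override the default as follows. This is a sketch; the table, stage, and format names and the 10% threshold are assumptions:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'mycsv')
  ON_ERROR = 'SKIP_FILE_10%';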
SIZE_LIMIT = num
- Definition:
Number (> 0) that specifies the maximum size (in bytes) of data to be loaded for a given COPY statement. When the threshold is exceeded, the COPY operation discontinues loading files. This option is commonly used to load a common group of files using multiple COPY statements. For each statement, the data load continues until the specified SIZE_LIMIT
is exceeded, before moving on to the next statement.
For example, suppose a set of files in a stage path were each 10 MB in size. If multiple COPY statements set SIZE_LIMIT to 25000000
(25 MB), each would load 3 files. That is, each COPY operation would discontinue after the SIZE_LIMIT
threshold was exceeded.
Note that at least one file is loaded regardless of the value specified for SIZE_LIMIT
unless there is no file to be loaded.
- Default:
null (no size limit)
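As a sketch, the following statement stops picking up additional files once roughly 25 MB of data has been loaded (per the 25000000-byte example above); the table, stage, and format names are assumptions:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'mycsv')
  SIZE_LIMIT = 25000000;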
PURGE = TRUE | FALSE
- Definition:
Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully.
If this option is set to TRUE
, note that a best effort is made to remove successfully loaded data files. If the purge operation fails for any reason, no error is returned currently. We recommend that you list staged files periodically (using LIST) and manually remove successfully loaded files, if any exist.
- Default:
FALSE
RETURN_FAILED_ONLY = TRUE | FALSE
- Definition:
Boolean that specifies whether to return only files that have failed to load in the statement result.
- Default:
FALSE
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE
- Definition:
String that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
Important
You cannot use the MATCH_BY_COLUMN_NAME copy option together with a SELECT statement for transforming data during a load. The two options can be used separately, but any attempt to use them together results in the following error: SQL compilation error: match_by_column_name is not supported with copy transform.
For example, the following syntax is not allowed:
COPY INTO [<namespace>.]<table_name> [ ( <col_name> [ , <col_name> ... ] ) ]
FROM ( SELECT [<alias>.]$<file_col_num>[.<element>] [ , [<alias>.]$<file_col_num>[.<element>] ... ]
FROM { internalStage | externalStage } )
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = ( { FORMAT_NAME = '[<namespace>.]<file_format_name>' |
TYPE = { CSV | JSON | AVRO | ORC | PARQUET | XML } [ formatTypeOptions ] } ) ]
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE | CASE_INSENSITIVE | NONE
[ other copyOptions ]
For more information, see Transforming Data During a Load.
This copy option is supported for the following data formats:
JSON
Avro
ORC
Parquet
CSV
For a column to match, the following criteria must be true:
The column represented in the data must have the exact same name as the column in the table. The copy option supports case sensitivity for column names. Column order does not matter.
The column in the table must have a data type that is compatible with the values in the column represented in the data. For example, string, number, and Boolean values can all be loaded into a variant column.
- Values:
CASE_SENSITIVE
| CASE_INSENSITIVE
Load semi-structured data into columns in the target table that match corresponding columns represented in the data. Column names are either case-sensitive (CASE_SENSITIVE
) or case-insensitive (CASE_INSENSITIVE
).
The COPY operation verifies that at least one column in the target table matches a column represented in the data files. If a match is found, the values in the data files are loaded into the column or columns. If no match is found, a set of NULL values for each record in the files is loaded into the table.
Note
If additional non-matching columns are present in the data files, the values in these columns are not loaded.
If additional non-matching columns are present in the target table, the COPY operation inserts NULL values into these columns. These columns must support NULL values.
NONE
The COPY operation loads the semi-structured data into a variant column or, if a query is included in the COPY statement, transforms the data.
- Default:
NONE
Note
The following limitations currently apply:
MATCH_BY_COLUMN_NAME cannot be used with the VALIDATION_MODE
parameter in a COPY statement to validate the staged data rather than load it into the target table.
Parquet data only. When MATCH_BY_COLUMN_NAME is set to CASE_SENSITIVE
or CASE_INSENSITIVE
, an empty column value (e.g. "col1": ""
) produces an error.
INCLUDE_METADATA = ( column_name = METADATA$field [ , column_name = METADATA$field ... ] )
- Definition:
A user-defined mapping between a target table's existing columns and METADATA$ columns of the staged files. This copy option can only be used with the MATCH_BY_COLUMN_NAME copy option. The valid input for METADATA$field is any of the metadata columns supported for staged files (for example, METADATA$FILENAME and METADATA$START_SCAN_TIME, as used in the example below).
For more information about metadata columns, see Querying Metadata for Staged Files.
When a mapping is defined with this copy option, the column column_name
is populated with the specified metadata value, as the following example demonstrates:
COPY INTO table1 FROM @stage1
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
INCLUDE_METADATA = (
ingestdate = METADATA$START_SCAN_TIME, filename = METADATA$FILENAME);
+-----+-----------------------+---------------------------------+-----+
| ... | FILENAME | INGESTDATE | ... |
|---------------------------------------------------------------+-----|
| ... | example_file.json.gz | Thu, 22 Feb 2024 19:14:55 +0000 | ... |
+-----+-----------------------+---------------------------------+-----+
- Default:
NULL
Note
The INCLUDE_METADATA
target column name must first exist in the table. The target column name is not automatically added if it does not exist.
Use a unique column name for the INCLUDE_METADATA
columns. If the INCLUDE_METADATA
target column has a name conflict with a column in the data file, the METADATA$
value defined by INCLUDE_METADATA
takes precedence.
When loading CSV with INCLUDE_METADATA
, set the file format option ERROR_ON_COLUMN_COUNT_MISMATCH
to FALSE
.
ENFORCE_LENGTH = TRUE | FALSE
- Definition:
Alternative syntax for TRUNCATECOLUMNS
with reverse logic (for compatibility with other systems)
Boolean that specifies whether to truncate text strings that exceed the target column length:
If TRUE
, the COPY statement produces an error if a loaded string exceeds the target column length.
If FALSE
, strings are automatically truncated to the target column length.
This copy option supports CSV data, as well as string values in semi-structured data when loaded into separate columns in relational tables.
- Default:
TRUE
Note
If the length of the target string column is set to the maximum (e.g. VARCHAR (16777216)
), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.
This parameter is functionally equivalent to TRUNCATECOLUMNS
, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two
parameters in a COPY statement to produce the desired output.
TRUNCATECOLUMNS = TRUE | FALSE
- Definition:
Alternative syntax for ENFORCE_LENGTH
with reverse logic (for compatibility with other systems)
Boolean that specifies whether to truncate text strings that exceed the target column length:
If TRUE
, strings are automatically truncated to the target column length.
If FALSE
, the COPY statement produces an error if a loaded string exceeds the target column length.
This copy option supports CSV data, as well as string values in semi-structured data when loaded into separate columns in relational tables.
- Default:
FALSE
Note
If the length of the target string column is set to the maximum (e.g. VARCHAR (16777216)
), an incoming string cannot exceed this length; otherwise, the COPY command produces an error.
This parameter is functionally equivalent to ENFORCE_LENGTH
, but has the opposite behavior. It is provided for compatibility with other databases. It is only necessary to include one of these two
parameters in a COPY statement to produce the desired output.
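For example, to silently truncate oversized strings rather than fail the load, either of the following equivalent statements could be used. This is a sketch; the table, stage, and format names are assumptions:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'mycsv')
  ENFORCE_LENGTH = FALSE;

COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'mycsv')
  TRUNCATECOLUMNS = TRUE;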
FORCE = TRUE | FALSE
- Definition:
Boolean that specifies to load all files, regardless of whether they’ve been loaded previously and have not changed since they were loaded. Note that this option reloads files, potentially duplicating data in a table.
- Default:
FALSE
LOAD_UNCERTAIN_FILES = TRUE | FALSE
- Definition:
Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.
The load status is unknown if all of the following conditions are true:
The file’s LAST_MODIFIED date (i.e. date when the file was staged) is older than 64 days.
The initial set of data was loaded into the table more than 64 days earlier.
If the file was already loaded successfully into the table, this event occurred more than 64 days earlier.
To force the COPY command to load all files regardless of whether the load status is known, use the FORCE
option instead.
For more information about load status uncertainty, see Loading older files.
- Default:
FALSE
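A minimal sketch of loading files whose load status has aged out of the load metadata follows; the table, stage, and format names are assumptions:
COPY INTO mytable
  FROM @mystage
  FILE_FORMAT = (FORMAT_NAME = 'mycsv')
  LOAD_UNCERTAIN_FILES = TRUE;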
FILE_PROCESSOR = (SCANNER = custom_scanner_type SCANNER_OPTIONS = (scanner_options))
- Definition:
Specifies the scanner and the scanner options used for processing unstructured data.
SCANNER
(Required): specifies the type of custom scanner used to process unstructured data. Currently, only the document_ai
custom scanner type is supported.
SCANNER_OPTIONS
: specifies the properties to the custom scanner type. For example, if you specify document_ai
as the type of SCANNER
, you must specify the properties of document_ai
. The predefined set of properties for document_ai
are:
project_name
: the name of the project where you create the Document AI model.
model_name
(Required for document_ai
): the name of the Document AI model.
model_version
(Required for document_ai
): the version of the Document AI model.
For more information, see Loading unstructured data with Document AI.
Note
This copy option does not work with MATCH_BY_COLUMN_NAME
.
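The following sketch shows the general shape of a Document AI load. The table, stage, and model identifiers are assumptions, and the exact property values depend on your Document AI setup:
COPY INTO doc_output_table
  FROM @docs_stage
  FILE_PROCESSOR = (
    SCANNER = 'document_ai'
    SCANNER_OPTIONS = (
      project_name = 'mydb.myschema'
      model_name = 'my_doc_ai_model'
      model_version = '1'));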
LOAD_MODE = { FULL_INGEST | ADD_FILES_COPY }
- Definition:
Specifies the mode to use when loading data from Parquet files into a Snowflake-managed Iceberg table.
FULL_INGEST
: Snowflake scans the files and rewrites the Parquet data under the base location of the Iceberg table.
Use this option if you need to transform or convert the data before registering the files to your Iceberg table.
ADD_FILES_COPY
: Snowflake performs a server-side copy of the original Parquet files into the base location of the Iceberg table,
then registers the files to the table. This allows for cross-region or cross-cloud ingestion of raw Parquet files into Iceberg tables.
Note
The ADD_FILES_COPY
option is only supported for loading data from Iceberg-compatible raw Parquet files without transformation.
A raw Iceberg-compatible Parquet file is not registered with an Iceberg catalog, but contains Iceberg-compatible data types.
Use this option to avoid file read overhead. To minimize storage costs, use PURGE = TRUE
with this option.
Doing so tells Snowflake to automatically remove the data files from the original location after the data is loaded successfully.
For additional usage notes, see the LOAD_MODE usage notes.
For examples, see Loading Iceberg-compatible Parquet data into an Iceberg table.
- Default:
FULL_INGEST
Usage notes
Some use cases are not fully supported and can lead to inconsistent or unexpected ON_ERROR behavior, including the
following use cases:
When you load CSV data, if a stream is on the target table, the ON_ERROR copy option might not work as expected.
Loading from Google Cloud Storage only: The list of objects returned for an external stage might include one or more “directory blobs”;
essentially, paths that end in a forward slash character (/
), e.g.:
LIST @my_gcs_stage;
+---------------------------------------+------+----------------------------------+-------------------------------+
| name | size | md5 | last_modified |
|---------------------------------------+------+----------------------------------+-------------------------------|
| my_gcs_stage/load/ | 12 | 12348f18bcb35e7b6b628ca12345678c | Mon, 11 Sep 2019 16:57:43 GMT |
| my_gcs_stage/load/data_0_0_0.csv.gz | 147 | 9765daba007a643bdff4eae10d43218y | Mon, 11 Sep 2019 18:13:07 GMT |
+---------------------------------------+------+----------------------------------+-------------------------------+
These blobs are listed when directories are created in the Google Cloud console rather than using any other tool provided by Google.
COPY statements that reference a stage can fail when the object list includes directory blobs. To avoid errors, we recommend using file
pattern matching to identify the files for inclusion (i.e. the PATTERN clause) when the file list for a stage includes directory blobs. For
an example, see Loading Using Pattern Matching (in this topic). Alternatively, set ON_ERROR = SKIP_FILE in the COPY statement.
STORAGE_INTEGRATION
, CREDENTIALS
, and ENCRYPTION
only apply if you are loading directly from a private/protected
storage location:
If you are loading from a public bucket, secure access is not required.
If you are loading from a named external stage, the stage provides all the credential information required for accessing the bucket.
If you encounter errors while running the COPY command, after the command completes, you can validate the files that produced the errors
using the VALIDATE table function.
Note
The VALIDATE function only returns output for COPY commands used to perform standard data loading; it does not support COPY commands that
perform transformations during data loading (e.g. loading a subset of data columns or reordering data columns).
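As a sketch, assuming the target table is named mytable, the rejected rows from the most recent COPY into that table could be reviewed with the VALIDATE table function:
SELECT * FROM TABLE(VALIDATE(mytable, JOB_ID => '_last'));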
Unless you explicitly specify FORCE = TRUE
as one of the copy options, the command ignores staged data files that were already
loaded into the table. To reload the data, you must either specify FORCE = TRUE
or modify the file and stage it again, which
generates a new checksum.
The COPY command does not validate data type conversions for Parquet files.
For information about loading hybrid tables, see Loading data.
Loading from Iceberg-compatible Parquet files using LOAD_MODE
:
You must fulfill the following prerequisites when using the LOAD_MODE = ADD_FILES_COPY
option:
The target table must be a Snowflake-managed Iceberg table with column data types that are compatible with the source Parquet file data types.
For more information, see Data types for Apache Iceberg™ tables.
The source file format type must be Iceberg-compatible Parquet, and you must use a vectorized scanner: FILE_FORMAT = ( TYPE = PARQUET USE_VECTORIZED_SCANNER = TRUE)
.
You must set the MATCH_BY_COLUMN_NAME
option to CASE_SENSITIVE
.
The following options aren’t supported when you use LOAD_MODE = ADD_FILES_COPY
:
Copying unstaged data by specifying a cloud storage location and a storage integration.
Any file format configuration other than FILE_FORMAT = ( TYPE = PARQUET USE_VECTORIZED_SCANNER = TRUE)
.
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE | NONE
.
ON_ERROR = CONTINUE | SKIP_FILE_N | SKIP_FILE_X%
.
VALIDATION_MODE
.
Transforming or filtering the data before loading. To transform the data, use the FULL_INGEST option instead.
Using Snowpipe to load data.
For ADD_FILES_COPY
, using a larger warehouse does not significantly decrease the duration of the COPY query. The
majority of the COPY operation relies on Cloud Services compute resources.
To run this command with an external stage that uses a storage integration,
you must use a role that has or inherits the USAGE privilege on the storage integration.
For more information, see Stage privileges.
For outbound private connectivity, loading directly from an external location (external
storage URI) isn’t supported. Instead, use an external stage with a storage integration configured for outbound private connectivity.
Output
The command returns the following columns:
Column Name | Data Type | Description
------------|-----------|------------
FILE | TEXT | Name of source file and relative path to the file
STATUS | TEXT | Status: loaded, load failed or partially loaded
ROWS_PARSED | NUMBER | Number of rows parsed from the source file
ROWS_LOADED | NUMBER | Number of rows loaded from the source file
ERROR_LIMIT | NUMBER | If the number of errors reaches this limit, then abort
ERRORS_SEEN | NUMBER | Number of error rows in the source file
FIRST_ERROR | TEXT | First error of the source file
FIRST_ERROR_LINE | NUMBER | Line number of the first error
FIRST_ERROR_CHARACTER | NUMBER | Position of the first error character
FIRST_ERROR_COLUMN_NAME | TEXT | Column name of the first error
Examples
For examples of data loading transformations, see Transforming data during a load.
Loading files from an internal stage
Note
These examples assume the files were copied to the stage earlier using the PUT command.
Load files from a named internal stage into a table:
COPY INTO mytable
FROM @my_int_stage;
Load files from a table’s stage into the table:
COPY INTO mytable
FILE_FORMAT = (TYPE = CSV);
Note
When copying data from files in a table location, the FROM clause can be omitted because Snowflake automatically checks for files in the
table’s location.
Load files from the user’s personal stage into a table:
COPY INTO mytable from @~/staged
FILE_FORMAT = (FORMAT_NAME = 'mycsv');
Loading files from a named external stage
Load files from a named external stage that you created previously using the CREATE STAGE command. The named
external stage references an external location (Amazon S3, Google Cloud Storage, or Microsoft Azure) and includes all the credentials and
other details required for accessing the location:
COPY INTO mycsvtable
FROM @my_ext_stage/tutorials/dataloading/contacts1.csv;
Loading files using column matching
Load files from a named external stage into the table with the MATCH_BY_COLUMN_NAME copy option, matching the column names in the files to the column names defined in the table case-insensitively. With this option, the column ordering of the file does not need to match the column ordering of the table.
COPY INTO mytable
FROM @my_ext_stage/tutorials/dataloading/sales.json.gz
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME='CASE_INSENSITIVE';
Loading files directly from an external location
The following example loads all files prefixed with data/files
from a storage location (Amazon S3, Google Cloud Storage, or
Microsoft Azure) using a named my_csv_format
file format:
Amazon S3
Access the referenced S3 bucket using a referenced storage integration named myint
. Note that both examples truncate the
MASTER_KEY
value:
COPY INTO mytable
FROM s3://mybucket/data/files
STORAGE_INTEGRATION = myint
ENCRYPTION=(MASTER_KEY = 'eSx...')
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Access the referenced S3 bucket using supplied credentials:
COPY INTO mytable
FROM s3://mybucket/data/files
CREDENTIALS=(AWS_KEY_ID='$AWS_ACCESS_KEY_ID' AWS_SECRET_KEY='$AWS_SECRET_ACCESS_KEY')
ENCRYPTION=(MASTER_KEY = 'eSx...')
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Google Cloud Storage
Access the referenced GCS bucket using a referenced storage integration named myint
:
COPY INTO mytable
FROM 'gcs://mybucket/data/files'
STORAGE_INTEGRATION = myint
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Microsoft Azure
Access the referenced container using a referenced storage integration named myint
. Note that both examples truncate the
MASTER_KEY
value:
COPY INTO mytable
FROM 'azure://myaccount.blob.core.windows.net/data/files'
STORAGE_INTEGRATION = myint
ENCRYPTION=(TYPE='AZURE_CSE' MASTER_KEY = 'kPx...')
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Access the referenced container using supplied credentials:
COPY INTO mytable
FROM 'azure://myaccount.blob.core.windows.net/mycontainer/data/files'
CREDENTIALS=(AZURE_SAS_TOKEN='?sv=2016-05-31&ss=b&srt=sco&sp=rwdl&se=2018-06-27T10:05:50Z&st=2017-06-27T02:05:50Z&spr=https,http&sig=bgqQwoXwxzuD2GJfagRg7VOS8hzNr3QLT7rhS8OFRLQ%3D')
ENCRYPTION=(TYPE='AZURE_CSE' MASTER_KEY = 'kPx...')
FILE_FORMAT = (FORMAT_NAME = my_csv_format);
Loading using pattern matching
Load files from a table’s stage into the table, using pattern matching to only load data from compressed CSV files in any path:
COPY INTO mytable
FILE_FORMAT = (TYPE = 'CSV')
PATTERN='.*/.*/.*[.]csv[.]gz';
Where .*
is interpreted as “zero or more occurrences of any character.” The square brackets escape the period character (.
)
that precedes a file extension.
Load files from a table stage into the table using pattern matching to only load uncompressed CSV files whose names include the string
sales
:
COPY INTO mytable
FILE_FORMAT = (FORMAT_NAME = myformat)
PATTERN='.*sales.*[.]csv';
Loading JSON data into a VARIANT column
The following example loads JSON data into a table with a single column of type VARIANT.
The staged JSON array comprises three objects separated by new lines:
[{
"location": {
"city": "Lexington",
"zip": "40503",
},
"sq__ft": "1000",
"sale_date": "4-25-16",
"price": "75836"
},
{
"location": {
"city": "Belmont",
"zip": "02478",
},
"sq__ft": "1103",
"sale_date": "6-18-16",
"price": "92567"
}
{
"location": {
"city": "Winchester",
"zip": "01890",
},
"sq__ft": "1122",
"sale_date": "1-31-16",
"price": "89921"
}]
/* Create a JSON file format that strips the outer array. */
CREATE OR REPLACE FILE FORMAT json_format
TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE;
/* Create an internal stage that references the JSON file format. */
CREATE OR REPLACE STAGE mystage
FILE_FORMAT = json_format;
/* Stage the JSON file. */
PUT file:///tmp/sales.json @mystage AUTO_COMPRESS=TRUE;
/* Create a target table for the JSON data. */
CREATE OR REPLACE TABLE house_sales (src VARIANT);
/* Copy the JSON data into the target table. */
COPY INTO house_sales
FROM @mystage/sales.json.gz;
SELECT * FROM house_sales;
+---------------------------+
| SRC |
|---------------------------|
| { |
| "location": { |
| "city": "Lexington", |
| "zip": "40503" |
| }, |
| "price": "75836", |
| "sale_date": "4-25-16", |
| "sq__ft": "1000", |
| "type": "Residential" |
| } |
| { |
| "location": { |
| "city": "Belmont", |
| "zip": "02478" |
| }, |
| "price": "92567", |
| "sale_date": "6-18-16", |
| "sq__ft": "1103", |
| "type": "Residential" |
| } |
| { |
| "location": { |
| "city": "Winchester", |
| "zip": "01890" |
| }, |
| "price": "89921", |
| "sale_date": "1-31-16", |
| "sq__ft": "1122", |
| "type": "Condo" |
| } |
+---------------------------+
Reloading files
Add FORCE = TRUE
to a COPY command to reload (duplicate) data from a set of staged data files that have not changed (i.e. have
the same checksum as when they were first loaded).
In the following example, the first command loads the specified files and the second command forces the same files to be loaded again
(producing duplicate rows), even though the contents of the files have not changed:
COPY INTO load1 FROM @%load1/data1/
FILES=('test1.csv', 'test2.csv');
COPY INTO load1 FROM @%load1/data1/
FILES=('test1.csv', 'test2.csv')
FORCE=TRUE;
Purging files after loading
Load files from a table’s stage into the table and purge files after loading. By default, COPY does not purge loaded files from the
location. To purge the files after loading:
Make sure your account has write access to the bucket or container where the files are stored.
Set PURGE=TRUE
for the table to specify that all files successfully loaded into the table are purged after loading:
ALTER TABLE mytable SET STAGE_COPY_OPTIONS = (PURGE = TRUE);
COPY INTO mytable;
You can also override any of the copy options directly in the COPY command:
COPY INTO mytable PURGE = TRUE;
After the files are loaded into the table, the files are deleted from the bucket or container where they were stored. After the files have begun the deletion process, the query cannot be cancelled.
Validating staged files
Validate files in a stage without loading:
Run the COPY command in validation mode and see all errors:
COPY INTO mytable VALIDATION_MODE = 'RETURN_ERRORS';
+-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
| ERROR | FILE | LINE | CHARACTER | BYTE_OFFSET | CATEGORY | CODE | SQL_STATE | COLUMN_NAME | ROW_NUMBER | ROW_START_LINE |
+-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
| Field delimiter ',' found while expecting record delimiter '\n' | @MYTABLE/data1.csv.gz | 3 | 21 | 76 | parsing | 100016 | 22000 | "MYTABLE"["QUOTA":3] | 3 | 3 |
| NULL result in a non-nullable column. Use quotes if an empty field should be interpreted as an empty string instead of a null | @MYTABLE/data3.csv.gz | 3 | 2 | 62 | parsing | 100088 | 22000 | "MYTABLE"["NAME":1] | 3 | 3 |
| End of record reached while expected to parse column '"MYTABLE"["QUOTA":3]' | @MYTABLE/data3.csv.gz | 4 | 20 | 96 | parsing | 100068 | 22000 | "MYTABLE"["QUOTA":3] | 4 | 4 |
+-------------------------------------------------------------------------------------------------------------------------------+------------------------+------+-----------+-------------+----------+--------+-----------+----------------------+------------+----------------+
Run the COPY command in validation mode for a specified number of rows. In this example, the first run encounters no errors in the
specified number of rows and completes successfully, displaying the information as it will appear when loaded into the table. The
second run encounters an error in the specified number of rows and fails with the error encountered:
COPY INTO mytable VALIDATION_MODE = 'RETURN_2_ROWS';
+--------------------+----------+-------+
| NAME | ID | QUOTA |
+--------------------+----------+-------+
| Joe Smith | 456111 | 0 |
| Tom Jones | 111111 | 3400 |
+--------------------+----------+-------+
COPY INTO mytable VALIDATION_MODE = 'RETURN_3_ROWS';
FAILURE: NULL result in a non-nullable column. Use quotes if an empty field should be interpreted as an empty string instead of a null
File '@MYTABLE/data3.csv.gz', line 3, character 2
Row 3, column "MYTABLE"["NAME":1]
Loading Iceberg-compatible Parquet data into an Iceberg table
This example covers how to create an Iceberg table and then load data into it from
Iceberg-compatible Parquet data files on an external stage.
For demonstration purposes, this example uses the following resources:
An external volume named iceberg_ingest_vol
. To create
an external volume, see Configure an external volume.
An external stage named my_parquet_stage
with Iceberg-compatible Parquet files on it. To create an external stage, see
CREATE STAGE.
Create a file format object that describes the staged Parquet files, using the required configuration for copying
Iceberg-compatible Parquet data (TYPE = PARQUET USE_VECTORIZED_SCANNER = TRUE
):
CREATE OR REPLACE FILE FORMAT my_parquet_format
TYPE = PARQUET
USE_VECTORIZED_SCANNER = TRUE;
Create a Snowflake-managed Iceberg table, defining columns with data types that are compatible with the source Parquet file data types:
CREATE OR REPLACE ICEBERG TABLE customer_iceberg_ingest (
c_custkey INTEGER,
c_name STRING,
c_address STRING,
c_nationkey INTEGER,
c_phone STRING,
c_acctbal INTEGER,
c_mktsegment STRING,
c_comment STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'iceberg_ingest_vol'
BASE_LOCATION = 'customer_iceberg_ingest/';
Use a COPY INTO statement to load the data from the staged Parquet files (located directly under the stage URL path) into the Iceberg table:
COPY INTO customer_iceberg_ingest
FROM @my_parquet_stage
FILE_FORMAT = 'my_parquet_format'
LOAD_MODE = ADD_FILES_COPY
PURGE = TRUE
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;
Note
The example specifies LOAD_MODE = ADD_FILES_COPY
, which tells Snowflake to copy the files into your external volume location,
and then register the files to the table.
This option avoids file read overhead, because Snowflake doesn't scan the source Parquet files and rewrite the data into new Parquet files.
Output:
+---------------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|---------------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| my_parquet_stage/snow_af9mR2HShTY_AABspxOVwhc_0_1_008.parquet | LOADED | 15000 | 15000 | 0 | 0 | NULL | NULL | NULL | NULL |
| my_parquet_stage/snow_af9mR2HShTY_AABspxOVwhc_0_1_006.parquet | LOADED | 15000 | 15000 | 0 | 0 | NULL | NULL | NULL | NULL |
| my_parquet_stage/snow_af9mR2HShTY_AABspxOVwhc_0_1_005.parquet | LOADED | 15000 | 15000 | 0 | 0 | NULL | NULL | NULL | NULL |
| my_parquet_stage/snow_af9mR2HShTY_AABspxOVwhc_0_1_002.parquet | LOADED | 5 | 5 | 0 | 0 | NULL | NULL | NULL | NULL |
| my_parquet_stage/snow_af9mR2HShTY_AABspxOVwhc_0_1_010.parquet | LOADED | 15000 | 15000 | 0 | 0 | NULL | NULL | NULL | NULL |
+---------------------------------------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
Query the table:
SELECT
c_custkey,
c_name,
c_mktsegment
FROM customer_iceberg_ingest
LIMIT 10;
Output:
+-----------+--------------------+--------------+
| C_CUSTKEY | C_NAME | C_MKTSEGMENT |
|-----------+--------------------+--------------|
| 75001 | Customer#000075001 | FURNITURE |
| 75002 | Customer#000075002 | FURNITURE |
| 75003 | Customer#000075003 | MACHINERY |
| 75004 | Customer#000075004 | AUTOMOBILE |
| 75005 | Customer#000075005 | FURNITURE |
| 1 | Customer#000000001 | BUILDING |
| 2 | Customer#000000002 | AUTOMOBILE |
| 3 | Customer#000000003 | AUTOMOBILE |
| 4 | Customer#000000004 | MACHINERY |
| 5 | Customer#000000005 | HOUSEHOLD |
+-----------+--------------------+--------------+