Snowflake in 20 minutes¶
Introduction¶
This tutorial uses the Snowflake command line client, SnowSQL, to introduce key concepts and tasks, including:
Creating Snowflake objects—You create a database and a table for storing data.
Loading data—We provide small sample CSV data files for you to load into the table.
Querying—You explore sample queries.
Note
Snowflake bills a minimal amount for the on-disk storage used for the sample data in this tutorial. The tutorial provides steps to drop the database and minimize storage cost.
Snowflake requires a virtual warehouse to load the data and execute queries. A running virtual warehouse consumes Snowflake credits. In this tutorial, you will be using a 30-day trial account, which provides free credits, so you won’t incur any costs.
What you’ll learn¶
In this tutorial you’ll learn how to:
Create Snowflake objects—You create a database and a table for storing data.
Install SnowSQL—You install and use SnowSQL, the Snowflake command line query tool.
Users of Visual Studio Code might consider using the Snowflake Extension for Visual Studio Code instead of SnowSQL.
Load CSV data files—You use various mechanisms to load data into tables from CSV files.
Write and execute sample queries—You write and execute a variety of queries against newly loaded data.
Prerequisites¶
This tutorial requires a database, table, and virtual warehouse to load and query data. Creating these Snowflake objects requires a Snowflake user with a role with the necessary access control privileges. In addition, SnowSQL is required to execute the SQL statements in the tutorial. Lastly, the tutorial requires CSV files that contain sample data to load.
You can complete this tutorial using an existing Snowflake warehouse, database, and table, and your own local data files, but we recommend using the Snowflake objects and the set of provided data.
To set up Snowflake for this tutorial, complete the following before continuing:
Create a user
To create the database, table, and virtual warehouse, you must be logged in as a Snowflake user with a role that grants you the privileges to create these objects.
If you’re using a 30-day trial account, you can log in as the user that was created for the account. This user has the role with the privileges needed to create the objects.
If you don’t have a Snowflake user, you can’t perform this tutorial. If you don’t have a role that lets you create a user, ask someone who does to perform this step for you. Users with the ACCOUNTADMIN or SECURITYADMIN role can create users.
Install SnowSQL
To install SnowSQL, see Installing SnowSQL.
Download sample data files
For this tutorial, you download Snowflake-provided sample employee data files in CSV format.
To download and unzip the sample data files:
Download the set of sample data files. Right-click the name of the archive file, getting-started.zip, and save the link/file to your local file system.
Unzip the sample files. The tutorial assumes you unpacked the files into one of the following directories (an example unzip command follows the list):
Linux/macOS:
/tmp
Windows:
C:\temp
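For example, on Linux or macOS, you might unzip the archive with a command like the following (the download path is hypothetical; adjust it to wherever you saved the file):
unzip ~/Downloads/getting-started.zip -d /tmp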
Each file has five data records. The data uses the comma (,) character as the field delimiter. The following is an example record:
Althea,Featherstone,afeatherstona@sf_tuts.com,"8172 Browning Street, Apt B",Calatrava,7/12/2017
There are no blank spaces before or after the commas separating the fields in each record. This is the default that Snowflake expects when loading CSV data.
Log in to SnowSQL¶
After installing SnowSQL, start it to connect to Snowflake:
Open a command line window.
Start SnowSQL:
$ snowsql -a <account_identifier> -u <user_name>
Where:
<account_identifier> is the unique identifier for your Snowflake account. The preferred format of the account identifier is as follows:
organization_name-account_name
organization_name and account_name are the names of your Snowflake organization and account. For more information, see Format 1 (preferred): Account name in your organization.
<user_name> is the login name for your Snowflake user.
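For example, the following command uses a hypothetical organization name (myorg), account name (myaccount), and user (jsmith); substitute your own values:
$ snowsql -a myorg-myaccount -u jsmith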
Note
If an identity provider (IdP) has been defined for your account, you can use a web browser to authenticate instead of a password, as the following example demonstrates:
$ snowsql -a <account_identifier> -u <user_name> --authenticator externalbrowser
For more information, see Using a web browser for federated authentication/SSO.
When SnowSQL prompts you, enter the password for your Snowflake user.
If you log in successfully, SnowSQL displays a command prompt that includes your current warehouse, database, and schema.
Note
If you get locked out of the account and can’t obtain the account identifier, you can find it in the Welcome email that Snowflake sent to you when you signed up for the trial account, or you can work with your ORGADMIN to get the account details. You can also find the values for locator, cloud, and region in the Welcome email.
If your Snowflake user doesn’t have a default warehouse, database, and schema, or if you didn’t configure SnowSQL to specify a default warehouse, database, and schema, the prompt displays no warehouse, no database, and no schema. For example:
user-name#(no warehouse)@(no database).(no schema)>
This prompt indicates that there is no warehouse, database, and schema selected for the current session. You create these objects in the next step. As you follow the next steps in this tutorial to create these objects, the prompt automatically updates to include the names of these objects.
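For example, after you create the warehouse, database, and schema later in this tutorial, the prompt might look similar to the following (user-name is a placeholder for your login name):
user-name#SF_TUTS_WH@SF_TUTS.PUBLIC>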
For more information, see Connecting through SnowSQL.
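As an optional convenience, you can store connection defaults in the SnowSQL configuration file (typically ~/.snowsql/config) so that you don’t need to pass them on the command line each time. The values shown are hypothetical placeholders:
[connections]
accountname = myorg-myaccount
username = jsmith
warehousename = sf_tuts_wh
dbname = sf_tuts
schemaname = public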
Create Snowflake objects¶
During this step you create the following Snowflake objects:
A database (sf_tuts) and a table (emp_basic). You load sample data into this table.
A virtual warehouse (sf_tuts_wh). This warehouse provides the compute resources needed to load data into the table and query the table. For this tutorial, you create an X-Small warehouse.
At the completion of this tutorial, you will remove these objects.
Create a database¶
Create the sf_tuts database using the CREATE DATABASE command:
CREATE OR REPLACE DATABASE sf_tuts;
In this tutorial, you use the default schema (public) available for each database, rather than creating a new schema.
Note that the database you just created and its public schema are now in use for your current session, as reflected in the SnowSQL command prompt. You can also use context functions to get this information:
SELECT CURRENT_DATABASE(), CURRENT_SCHEMA();
The following is an example result:
+--------------------+------------------+
| CURRENT_DATABASE() | CURRENT_SCHEMA() |
|--------------------+------------------|
| SF_TUTS | PUBLIC |
+--------------------+------------------+
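If you start a new session later, you can return to this context explicitly by using the standard USE commands:
USE DATABASE sf_tuts;
USE SCHEMA sf_tuts.public;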
Create a table¶
Create a table named emp_basic in sf_tuts.public using the CREATE TABLE command:
CREATE OR REPLACE TABLE emp_basic (
  first_name STRING,
  last_name STRING,
  email STRING,
  streetaddress STRING,
  city STRING,
  start_date DATE
);
Note that the number of columns in the table, their positions, and their data types correspond to the fields in the sample CSV data files that you stage in the next step in this tutorial.
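To verify that the column definitions match the sample data, you can optionally describe the table:
DESC TABLE emp_basic;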
Create a virtual warehouse¶
Create an X-Small warehouse named sf_tuts_wh using the CREATE WAREHOUSE command:
CREATE OR REPLACE WAREHOUSE sf_tuts_wh WITH
  WAREHOUSE_SIZE = 'X-SMALL'
  AUTO_SUSPEND = 180
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
The sf_tuts_wh warehouse is initially suspended, but the CREATE WAREHOUSE statement also sets AUTO_RESUME = TRUE. The AUTO_RESUME setting causes a warehouse to start automatically when SQL statements that require compute resources are executed.
After you create the warehouse, it is in use for your current session, as displayed in your SnowSQL command prompt. You can also retrieve the name of the warehouse by using the following context function:
SELECT CURRENT_WAREHOUSE();
The following is an example result:
+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| SF_TUTS_WH |
+---------------------+
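Because the warehouse was created with INITIALLY_SUSPENDED=TRUE, it remains suspended until a statement needs compute resources. You can optionally check its state with the SHOW WAREHOUSES command:
SHOW WAREHOUSES LIKE 'sf_tuts_wh';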
Stage data files¶
A Snowflake stage is a location in cloud storage that you use to load data into and unload data from a table. Snowflake supports the following types of stages:
Internal stages—Used to store data files internally within Snowflake. Each user and table in Snowflake gets an internal stage by default for staging data files.
External stages—Used to store data files externally in Amazon S3, Google Cloud Storage, or Microsoft Azure. If your data is already stored in these cloud storage services, you can use an external stage to load data in Snowflake tables.
In this tutorial, we upload the sample data files (downloaded in Prerequisites) to the internal stage for the emp_basic table that you created earlier. You use the PUT command to upload the sample data files to that stage.
Staging sample data files¶
Execute the PUT command in SnowSQL to upload local data files to the table stage provided for the emp_basic table you created:
PUT file://<file-path>[/\]employees0*.csv @sf_tuts.public.%emp_basic;
For example:
Linux or macOS
PUT file:///tmp/employees0*.csv @sf_tuts.public.%emp_basic;
Windows
PUT file://C:\temp\employees0*.csv @sf_tuts.public.%emp_basic;
Let’s take a closer look at the command:
file://<file-path>[/\]employees0*.csv specifies the full directory path and names of the files on your local machine to stage. Note that file system wildcards are allowed, and if multiple files fit the pattern, they are all uploaded.
@sf_tuts.public.%emp_basic indicates to use the stage for the specified table, in this case the emp_basic table.
The command returns the following result, showing the staged files:
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| employees01.csv | employees01.csv.gz | 360 | 287 | NONE | GZIP | UPLOADED | |
| employees02.csv | employees02.csv.gz | 355 | 274 | NONE | GZIP | UPLOADED | |
| employees03.csv | employees03.csv.gz | 397 | 295 | NONE | GZIP | UPLOADED | |
| employees04.csv | employees04.csv.gz | 366 | 288 | NONE | GZIP | UPLOADED | |
| employees05.csv | employees05.csv.gz | 394 | 299 | NONE | GZIP | UPLOADED | |
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
The PUT command compresses files by default using gzip, as indicated in the TARGET_COMPRESSION column.
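If you change a local file and need to stage it again, you can re-run PUT with the OVERWRITE option, which replaces a previously staged file of the same name:
PUT file:///tmp/employees0*.csv @sf_tuts.public.%emp_basic OVERWRITE = TRUE;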
Listing the staged files (Optional)¶
You can list the staged files using the LIST command.
LIST @sf_tuts.public.%emp_basic;
The following is an example result:
+--------------------+------+----------------------------------+------------------------------+
| name | size | md5 | last_modified |
|--------------------+------+----------------------------------+------------------------------|
| employees01.csv.gz | 288 | a851f2cc56138b0cd16cb603a97e74b1 | Tue, 9 Jan 2018 15:31:44 GMT |
| employees02.csv.gz | 288 | 125f5645ea500b0fde0cdd5f54029db9 | Tue, 9 Jan 2018 15:31:44 GMT |
| employees03.csv.gz | 304 | eafee33d3e62f079a054260503ddb921 | Tue, 9 Jan 2018 15:31:45 GMT |
| employees04.csv.gz | 304 | 9984ab077684fbcec93ae37479fa2f4d | Tue, 9 Jan 2018 15:31:44 GMT |
| employees05.csv.gz | 304 | 8ad4dc63a095332e158786cb6e8532d0 | Tue, 9 Jan 2018 15:31:44 GMT |
+--------------------+------+----------------------------------+------------------------------+
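If you staged the wrong files and want to start over, you can delete them with the REMOVE command (don’t run this now if you plan to load the data in the next step):
REMOVE @sf_tuts.public.%emp_basic;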
Copy data into target tables¶
To load your staged data into the target table, execute the COPY INTO <table> command.
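As a preview, the following is a minimal sketch of such a command for this tutorial. The file format options are assumptions based on the sample data, in which some field values (such as the street address) are enclosed in double quotes because they contain commas:
COPY INTO emp_basic
  FROM @sf_tuts.public.%emp_basic
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
  ON_ERROR = 'skip_file';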