HOW TO: Create External Stages in Snowflake



In addition to loading data directly from files in external locations such as AWS S3, Azure containers and Google Cloud Storage, Snowflake supports creating named external stages. These stages specify where data files are stored (i.e. “staged”) so that the data in the files can be loaded into a table.

External stages are recommended when you plan to load data regularly from the same location. External stages also allow you to create external tables on top of the files present in their locations, thereby allowing you to query the data in those files using SQL. The external tables created on top of the data files can also be joined with other Snowflake tables and used to build views and materialized views.
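As a quick illustration of that last point, the sketch below creates an external table over staged CSV files and queries it with SQL. The stage name my_s3_stage, the table name ext_employees and the column mapping are assumptions for illustration only; creating the stage itself is covered in the rest of this article.

-- Minimal sketch (assumed names): an external table over CSV files staged under my_s3_stage
create external table ext_employees
  with location = @my_s3_stage/employees/
  auto_refresh = false
  file_format = (type = csv);

-- Each row of the files is exposed as a VARIANT column named VALUE;
-- c1, c2, ... refer to the positional CSV columns
select value:c1::string as emp_name, value:c2::number as emp_salary
from ext_employees;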

In this article, let us discuss how to create external stages in Snowflake.

Creating External Stages in Snowflake

External stages can be created in the following ways.

  1. Web Interface / SQL, using the cloud provider's secret access keys.
  2. Configuring a Cloud Storage Integration.

The first method is the easiest way of creating an external stage in Snowflake, but it involves supplying cloud storage credentials to connect to the external location.

The second method is the recommended option, as it avoids the need to supply cloud storage credentials when creating stages or loading data. However, it involves several steps to create an external stage.

Creating External Stages using Web Interface

Follow the below steps to create an external stage using the Web Interface.

  1. Navigate to Databases
  2. Select the Database in which you want to create an external stage.
  3. Go to the Stages tab and click Create.
  4. Select the external cloud storage provider and click Next.
  5. Provide the stage name and the schema in which the stage needs to be created.
  6. Provide the URL of the location and the secret access keys to connect.
  7. Click Finish to create an external stage.

The below example shows creating an external stage in Snowflake on Azure using the Web Interface.

[Image: Creating an external stage in Snowflake on Azure using the Web Interface]

External stages can also be created using SQL statements by passing the cloud storage credentials, as shown below.

use schema mydb.public;

-- Creating an external stage in Snowflake on Azure location

CREATE STAGE my_azure_stage 
URL = 'azure://myazurespace.blob.core.windows.net/snowflake' 
CREDENTIALS = (AZURE_SAS_TOKEN = '***************************');

-- Creating an external stage in Snowflake on AWS S3 location

CREATE STAGE my_s3_stage 
URL = 's3://mybucket/snowflake/' 
CREDENTIALS = (AWS_KEY_ID = '*********************'  AWS_SECRET_KEY = '**************');
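Once a stage exists, it can be referenced with the @ prefix in other commands. The sketch below lists the staged files and loads them into a table; the target table employees and the CSV options are assumptions for illustration and need to match your own files.

-- List the files available in the external stage
list @my_s3_stage;

-- Load the staged files into a table (table name and format options are assumed)
copy into employees
from @my_s3_stage
file_format = (type = csv field_delimiter = ',' skip_header = 1);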

Creating External Stages by configuring a Cloud Storage Integration

A storage integration is a Snowflake object that stores a generated identity and access management (IAM) entity for your external cloud storage, along with an optional set of allowed or blocked storage locations in the cloud storage providers (Amazon S3, Google Cloud Storage, or Microsoft Azure).

Cloud provider administrators in your organization grant permissions on the storage locations to the generated entity. This option allows users to avoid supplying credentials when creating stages or when loading or unloading data.

A single Storage Integration can support multiple external stages.

Follow the below steps to create an external stage using a Cloud Storage Integration.

Step-1: Creating Cloud Storage Integration

The following examples show how to create a Cloud Storage Integration.

Amazon S3

create storage integration s3_int
  type = external_stage
  storage_provider = s3
  enabled = true
  storage_aws_role_arn = '<iam_role>'
  storage_allowed_locations = ('s3://bucket/path/', 's3://bucket/path2/');

Snowflake recommends creating an IAM policy that allows Snowflake to access the S3 bucket. You can then create an IAM role, attach the policy to the role, and use the security credentials generated by AWS for the role to access the files in the bucket.

Follow the below links to create an IAM Policy and an IAM Role in AWS for Snowflake.

<iam_role> specifies the Amazon Resource Name (ARN) of the AWS IAM role you created using the above steps.

Microsoft Azure

create storage integration azure_int
  type = external_stage
  storage_provider = azure
  enabled = true
  azure_tenant_id = '<tenant_id>'
  storage_allowed_locations = ('azure://account.blob.core.windows.net/container/path/',
                               'azure://account.blob.core.windows.net/container/path2/');

<tenant_id> is the ID of your Office 365 tenant that the allowed and blocked storage accounts belong to. A storage integration can authenticate to only one tenant, so the allowed and blocked storage locations must refer to storage accounts that all belong to this tenant.

To find your tenant ID, log into the Azure portal and click Azure Active Directory » Properties. The tenant ID is displayed in the Directory ID field.

Google Cloud Storage

create storage integration gcs_int
  type = external_stage
  storage_provider = gcs
  enabled = true
  storage_allowed_locations = ('gcs://bucket/path/', 'gcs://bucket/path2/');

You can also specify locations that you want to block using storage_blocked_locations in the examples shown above.
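For example, a variant of the above S3 integration that allows a bucket but blocks a sensitive prefix within it might look like the sketch below; the bucket name and paths are placeholders.

-- Hypothetical example: allow the bucket but block one prefix inside it
create or replace storage integration s3_int
  type = external_stage
  storage_provider = s3
  enabled = true
  storage_aws_role_arn = '<iam_role>'
  storage_allowed_locations = ('s3://bucket/')
  storage_blocked_locations = ('s3://bucket/private/');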

Step-2: Retrieve the Snowflake Service Account/Client Name/IAM User for your Snowflake Account  

Execute the DESCRIBE INTEGRATION command to retrieve the details of the service account that was created automatically for your Snowflake account.

DESC STORAGE INTEGRATION <integration_name>;

<integration_name> is the name of the Cloud Storage Integration you created in Step 1.

Amazon S3

desc integration s3_int;

Record the values of the following columns from the output.

  • STORAGE_AWS_IAM_USER_ARN: The AWS IAM user created for your Snowflake account. Snowflake provisions a single IAM user for your entire Snowflake account, and all S3 storage integrations use that IAM user.
  • STORAGE_AWS_EXTERNAL_ID: The external ID that is needed to establish a trust relationship.

Microsoft Azure

desc integration azure_int;

Record the values of the following columns from the output.

  • AZURE_CONSENT_URL: URL to the Microsoft permissions request page.
  • AZURE_MULTI_TENANT_APP_NAME: Name of the Snowflake client application created for your account. In a later step, you will need to grant this application the permissions necessary to obtain an access token on your allowed storage locations.

Google Cloud Storage

desc integration gcs_int;

Record the value of the following column from the output.

  • STORAGE_GCP_SERVICE_ACCOUNT: The Cloud Storage service account created for your Snowflake account. Snowflake provisions a single Cloud Storage service account for your entire Snowflake account, and all Cloud Storage integrations use that service account.

Step-3: Grant Snowflake Permissions to Access the Storage Locations

Depending on the cloud storage provider on which you want to create the external stage, the process of granting permissions on the storage locations differs.

Follow the below links to grant Snowflake permissions to access the storage locations.

Step-4: Create an External Stage

Create an external stage that references the storage integration you created in Step 1 using the CREATE STAGE command.
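Note that the role that creates the stage needs the CREATE STAGE privilege on the target schema and the USAGE privilege on the storage integration. A minimal sketch of granting these, assuming a custom role named myrole, is shown below.

-- Assumed role name for illustration; run with a role that can grant these privileges (e.g. ACCOUNTADMIN)
grant create stage on schema mydb.public to role myrole;
grant usage on integration s3_int to role myrole;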

Amazon S3

Creating an External Stage on AWS S3

create stage my_s3_stage
  storage_integration = s3_int
  url = 's3://bucket1/path1'
  file_format = my_csv_format;

Microsoft Azure

Creating an External Stage on Azure

create stage my_azure_stage
  storage_integration = azure_int
  url = 'azure://myaccount.blob.core.windows.net/container1/path1'
  file_format = my_csv_format;

Google Cloud Storage

Creating an External Stage on Google Cloud

create stage my_gcs_stage
  url = 'gcs://mybucket1/path1'
  storage_integration = gcs_int
  file_format = my_csv_format;
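The file_format = my_csv_format clause in the above examples refers to a named file format that must already exist in the schema. A minimal sketch of creating one, assuming comma-delimited files with a single header row, is shown below.

-- Assumed CSV options for illustration; adjust them to match your data files
create file format my_csv_format
  type = csv
  field_delimiter = ','
  skip_header = 1
  null_if = ('NULL', 'null')
  empty_field_as_null = true;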

Related Articles:

  • Overview of Snowflake Time Travel
  • Snowflake SnowSQL: Command Line Tool to access Snowflake
  • Types of Snowflake Stages: Data Loading and Unloading Features
