Snowflake Directory Tables: Query and Manage Staged Files

Spread the love

1. Introduction

The most common method to access files in a Snowflake internal or external stage is by using the LIST @<stage_name> command. In our previous article, we discussed how RESULT_SCAN can be used to query and filter the results of commands like LIST or SHOW. However, Snowflake provides an alternative and more structured way to access and query file metadata in stage locations using Directory Tables.

In this article, we will explore what Directory Tables are, and how to create and manage them effectively.

2. What are Directory Tables in Snowflake?

A Directory Table is a virtual object associated with a stage, rather than a standalone database object. It functions similarly to an external table by storing file-level metadata for the data files in the stage. They are supported in both the internal and external snowflake stages.

3. What are the Privileges required to work with Directory Tables?

A Directory Table does not have its own grantable privileges. Instead, the following privileges on the associated stage are required to operate on Directory Tables.

Internal Stage:

  • A role with READ privileges on the stage for querying (SELECT) the directory table.
  • A role with WRITE privileges on the stage for uploading (PUT) or removing (REMOVE) files, and refreshing metadata (ALTER STAGE).

External Stage:

  • A role with READ or USAGE privileges on the stage for querying (SELECT) the directory table.
  • A role with WRITE or USAGE privileges on the stage for removing (REMOVE) files and refreshing metadata (ALTER STAGE).

4. How to Create Directory Tables in Snowflake?

The process of creating a stage with a Directory Table follows the same syntax as creating a standard Snowflake internal or external stage. The key difference is that the DIRECTORY parameter must be explicitly set to TRUE to enable directory table functionality.

The following syntax demonstrates how to create an Internal stage with a directory table enabled.

CREATE OR REPLACE STAGE MY_INT_STAGE
  DIRECTORY = (ENABLE = TRUE)
  FILE_FORMAT = my_file_format;

The following syntax demonstrates how to create an external stage with a directory table enabled.

CREATE OR REPLACE STAGE MY_EXT_STAGE
URL = 's3://te-aws-s3-bucket001/'
STORAGE_INTEGRATION = AWS_STORAGE_INT
DIRECTORY = (ENABLE = TRUE);

Snowflake Directory tables can also be enabled on existing internal or external stages using the ALTER STAGE command.

ALTER STAGE my_stage SET DIRECTORY = ( ENABLE = TRUE );

5. How to Query Directory Tables in Snowflake?

The following is the syntax to query a Directory Table to retrieve a list of all files on a stage with metadata information.

SELECT * FROM DIRECTORY( @<stage_name> );

Output:

The output from a directory table query includes the following columns.

Output from querying a Directory table
Output from querying a Directory table
ColumnData TypeDescription
RELATIVE_PATHTEXTRelative path of a staged file from its location in the stage.
SIZENUMBERSize of the file (in bytes).
LAST_MODIFIEDTIMESTAMP_LTZTimestamp when the file was last updated in the stage.
MD5HEXMD5 checksum for the file computed for the file’s contents.
ETAGHEXRepresents a unique identifier for each file. It helps track changes to files in the stage by updating whenever a file is modified.
FILE_URLTEXTSnowflake File URL to the file.

Applying Filters on Directory Tables:

Filters can be applied when querying the Directory tables to extract the details of specific files from a stage location. The following are some of the examples showing the application of filters on Directory tables.

-- Retrieve files located in paths containing 'Inbox'
SELECT * FROM DIRECTORY(@MY_STAGE) 
WHERE RELATIVE_PATH LIKE '%Inbox%'

-- Retrieve files larger than 150,000 bytes 
SELECT * FROM DIRECTORY(@MY_STAGE) 
WHERE SIZE > 150000

-- Retrieve files modified in the last day 
SELECT * FROM DIRECTORY(@MY_STAGE) 
WHERE LAST_MODIFIED > CURRENT_DATE() - 1;

6. How to Manually Refresh Metadata of Directory Tables in Snowflake?

The metadata of Directory tables in Snowflake can be refreshed manually using the ALTER STAGE command. The metadata refresh updates the following to the directory table metadata.

  • New files in the path are added to the table metadata.
  • Changes to files in the path are updated in the table metadata.
  • Files no longer in the path are removed from the table metadata.

Syntax:

ALTER STAGE MY_STAGE REFRESH;

The metadata refresh of Directory tables can also be performed only on a relative file path in the stage location. This reduces the number of files that need to be verified and listed.

Syntax:

ALTER STAGE MY_STAGE REFRESH SUBPATH = 'Inbox';

7. How to Automate Metadata Refresh of Directory Tables in Snowflake?

The metadata of a directory table can be automatically refreshed by setting up the event notification services on the cloud storage provider. The configured event notifications notify Snowflake when new or updated data is available to read into the directory table metadata. 

Additionally, the directory table parameter AUTO_REFRESH must be set to TRUE while creating the stage for triggering automatic refreshes of the directory table metadata.

CREATE STAGE my_ext_stage
  URL='s3://bucket_name/inbox/'
  STORAGE_INTEGRATION = my_storage_int
  DIRECTORY = (
    ENABLE = true
    AUTO_REFRESH = true
  );

Note that Snowflake does not support automatic metadata refresh for directory tables on an Internal stage. The directory table metadata for an internal stage must be refreshed manually.

8. How are Directory Tables Billed in Snowflake?

Automatic Metadata Refresh:

  • Charges apply for event notifications used to refresh directory table metadata.
  • These costs appear as Snowpipe charges in the billing statement.
  • Charges increase based on the number of files added to cloud storage.
  • Estimated costs can be checked using the PIPE_USAGE_HISTORY function.

Manual Metadata Refresh:

  • Incurs a small maintenance charge under Snowflake’s standard cloud services billing model.
  • These charges do not appear in PIPE_USAGE_HISTORY.

9. What are Table functions available for Directory tables?

9.1. STAGE_DIRECTORY_FILE_REGISTRATION_HISTORY

This table function can be used to query information about the metadata history for a directory table, including:

  • Files added or removed as part of a metadata refresh.
  • Any errors found when refreshing the metadata.

The following query retrieves the metadata stored for all data files referenced by the stage.

SELECT * 
FROM TABLE(information_schema.stage_directory_file_registration_history(
STAGE_NAME=>'MY_EXT_STAGE'));

Output:

9.2. AUTO_REFRESH_REGISTRATION_HISTORY

This table function can be used to query the history of data files registered in the metadata and return billing activity within the last 14 days.

SELECT * 
FROM TABLE(information_schema.auto_refresh_registration_history(
    date_range_start=>dateadd('day',-14,current_date()),
    date_range_end=>current_date(),
    object_type=>'DIRECTORY_TABLE'));

10. Common Use Case of Directory Tables in Snowflake

Directory tables in Snowflake provide a structured way to manage and query staged files. Below are some key use cases:

  • File Discovery: Query the directory table to retrieve a list of files stored in a stage, along with metadata such as size, last modified date, and relative path.
  • Change Tracking: Create streams on the stage to monitor file additions, modifications, or deletions.
    (Note: Streams are created on the stage itself, not on the directory table.)
  • File Processing Pipelines: Leverage directory tables with Snowpark API or external functions to build automated file processing workflows.

11. Summary

Directory Tables in Snowflake provide a structured way to access file metadata in both internal and external stages, offering an alternative to the LIST @<stage_name> command. They store file-level metadata such as file size, last modified timestamp, and checksum details.

  • Creation & Privileges: Directory tables are enabled using the DIRECTORY = (ENABLE = TRUE) parameter while creating a stage. They inherit privileges from the associated stage.
  • Querying & Filtering: Users can query directory tables using SELECT * FROM DIRECTORY(@stage_name) and apply filters based on file attributes.
  • Metadata Management: Metadata can be refreshed manually using ALTER STAGE ... REFRESH or automated via event notifications (supported only for external stages).
  • Billing: Charges apply for automatic metadata refresh (billed under Snowpipe) and manual refresh incurs a small maintenance cost.
  • Use Cases: Directory tables are useful for file discovery, change tracking using streams, and file processing pipelines with Snowpark or external functions.

By leveraging Directory Tables, Snowflake users can efficiently manage and query staged file metadata for various data processing workflows.

Subscribe to our Newsletter !!

Related Articles:

Leave a Comment

Related Posts