Change Data Capture using Snowflake Dynamic Tables

Spread the love

1. Introduction

In our previous articles, we have discussed Streams that provide Change Data Capture (CDC) capabilities to track the changes made to tables and Tasks that allow scheduled execution of SQL statements. Using both Streams and Tasks as a combination, we were able to track changes in a table and push the incremental changes to a target table periodically.

Snowflake introduced a new table type called Dynamic Tables which simplifies the whole process of identifying the changes in a table and periodically refresh.

In this article, let us discuss how dynamic tables work and how they are different from Stream and Tasks, their advantages and limitations.

2. Snowflake Dynamic Tables

A Dynamic table materializes the result of a query that you specify. It can track the changes in the query data you specify and refresh the materialized results incrementally through an automated process.

To incrementally load data from a base table into a target table, define the target table as a dynamic table and specify the SQL statement that performs the transformation on the base table. The dynamic table eliminates the additional step of identifying and merging changes from the base table, as the entire process is automatically performed within the dynamic table.

Dynamic tables support Time Travel, Masking, Tagging, Replication etc. just like a standard Snowflake table.

3. How to Create Snowflake Dynamic Tables?

Below is the syntax of creating Dynamic Tables in Snowflake.

CREATE OR REPLACE DYNAMIC TABLE <name>
  TARGET_LAG = { '<num> { seconds | minutes | hours | days }' | DOWNSTREAM }
  WAREHOUSE = <warehouse_name>
AS
  <query>

1. <name>: Name of the dynamic table.

2. TARGET_LAG: Specifies the lag between the dynamic table and the base table on which the dynamic table is built.

The value of TARGET_LAG can be specified in two different ways.

  • ‘<num> { seconds | minutes | hours | days }’ : Specifies the maximum amount of time that the dynamic table’s content should lag behind updates to the base tables.
    Example: 1 minute, 7 hours, 2 days etc.
  • DOWNSTREAM: Specifies that a different dynamic table is built based on the current dynamic table, and the current dynamic table refreshes on demand, when the downstream dynamic table need to refresh.

For example consider Dynamic Table 2 (DT2) is defined based on Dynamic Table 1 (DT1) which is based on table (T1).

Dynamic Tables Dependency

Let’s say if the target lag is set to 5 minutes for DT2, defining target lag as DOWNSTREAM for DT1 infers that DT1 gets its lag from DT2 and is updated every time DT2 refreshes.

3. <warehouse_name>: Specifies the name of the warehouse that provides the compute resources for refreshing the dynamic table.

4. <query>: Specifies the query on which the dynamic table is built.

4. How do Snowflake Dynamic Tables work?

Let us understand how Dynamic Tables work with a simple example.

Consider we have a base table named EMPLOYEES_RAW as shown below. The requirement is to identify the changes in the base table data and incrementally refresh the data in EMPLOYEES target table.

-- create employees_raw table
CREATE OR REPLACE TABLE EMPLOYEES_RAW(
ID NUMBER,
NAME VARCHAR(50),
SALARY NUMBER
);

--Insert three records into table
INSERT INTO EMPLOYEES_RAW VALUES (101,'Tony',25000);
INSERT INTO EMPLOYEES_RAW VALUES (102,'Chris',55000);
INSERT INTO EMPLOYEES_RAW VALUES (103,'Bruce',40000);

Create a Dynamic table named EMPLOYEES which refreshes every 1 minute and reads data from EMPLOYEES_RAW table as shown below.

CREATE OR REPLACE DYNAMIC TABLE EMPLOYEES
  TARGET_LAG = '1 minute'
  WAREHOUSE = COMPUTE_WH
AS
  SELECT ID, NAME, SALARY FROM EMPLOYEES_RAW;

The below image shows the data present in the dynamic table EMPLOYEES after creation.

Dynamic table data after creation
Dynamic table data after creation

The following changes are performed on the base table.

INSERT INTO EMPLOYEES_RAW VALUES (104,'Clark',35000);
UPDATE EMPLOYEES_RAW SET SALARY = '45000' WHERE ID = '103';
DELETE FROM EMPLOYEES_RAW WHERE ID = '102';

When the dynamic table is queried after the target lag time defined (1minute in this case), the table is refreshed with all the latest changes performed on the raw table as shown below.

Dynamic table data after refresh
Dynamic table data after refresh

The automated refresh process identifies the changes in the results of the query defined and does an incremental refresh of data in the Dynamic table. Note that this is NOT a full data refresh.

5. Differences Between Snowflake Dynamic Tables and Snowflake Streams and Tasks

The following example demonstrates how dynamic tables simplify the process of change data capture compared to implementation through Streams and Tasks.

Consider there is a raw table EMPLOYEES_RAW, from which the data needs to be refreshed incrementally at a periodic frequency.

SQL statements for Streams and Tasks

--Create a stream to capture the changes in the raw table
CREATE OR REPLACE STREAM MY_STREAM ON TABLE EMPLOYEES_RAW;

--Create a table that stores data from raw table
CREATE OR REPLACE TABLE EMPLOYEES(
    ID NUMBER,
    NAME VARCHAR(50),
    SALARY NUMBER
);

--Create a task that executes every 1 minutes and merges the changes from raw table into the target table
CREATE OR REPLACE TASK my_streamtask
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = '1 minute'
WHEN
  SYSTEM$STREAM_HAS_DATA('my_stream')
AS
MERGE INTO EMPLOYEES a USING MY_STREAM b ON a.ID = b.ID
  WHEN MATCHED AND metadata$action = 'DELETE' AND metadata$isupdate = 'FALSE' 
    THEN DELETE
  WHEN MATCHED AND metadata$action = 'INSERT' AND metadata$isupdate = 'TRUE' 
    THEN UPDATE SET a.NAME = b. NAME, a.SALARY = b.SALARY
  WHEN NOT MATCHED AND metadata$action = 'INSERT' AND metadata$isupdate = 'FALSE'
    THEN INSERT (ID, NAME, SALARY) VALUES (b.ID, b.NAME, b.SALARY);

SQL statements for Dynamic Tables

--Create a Dynamic table that refreshes data from raw table every 1 minute 
CREATE OR REPLACE DYNAMIC TABLE EMPLOYEES
 TARGET_LAG = '1 minute'
 WAREHOUSE = COMPUTE_WH
AS
 SELECT ID, NAME, SALARY FROM EMPLOYEES_RAW;

The below image illustrates how the data is refreshed using Streams and Tasks vs Dynamic tables.

Change Data Capture using Streams and Tasks Vs Dynamic Tables

Change data capture using Streams and Tasks supports procedural code to transform data from base tables using Stored Procedures, UDFs and External Functions. On the other hand, the Dynamic tables cannot contain calls to stored procedures and tasks. They only support SQL with joins, aggregations, window functions, and other SQL functions and constructions.

6. Differences Between Snowflake Dynamic Tables and Materialized Views

Though Dynamic Tables and Materialized Views are similar in a way as they both materialize the results of a SQL query, there are some important differences.

Materialized ViewsDynamic Tables
A Materialized View cannot use a complex SQL query with joins or nested views.A Dynamic table can be based on a complex query, including one with joins and unions.
A Materialized View can be built using only a single base table.A Dynamic table can be built using multiple base tables including other dynamic tables.
A Materialized View always returns the current data when executed.A Materialized View returns the data latest up to the target lag time.

7. Get Information of Existing Dynamic Tables in Snowflake

7.1. SHOW DYNAMIC TABLES

The command lists all the dynamic tables, including the information of dynamic tables such as database, schema, rows, target lag, refresh mode, warehouse, DDL etc. for which the user has access privileges.

Below are the examples of usage of the SHOW DYNAMIC TABLES command.

SHOW DYNAMIC TABLES;
SHOW DYNAMIC TABLES LIKE 'EMP_%';
SHOW DYNAMIC TABLES LIKE 'EMP_%' IN SCHEMA mydb.myschema;
SHOW DYNAMIC TABLES STARTS WITH 'EMP';

7.2. DESCRIBE DYNAMIC TABLE

The command describes the columns in a dynamic table.

Below are the examples of usage of the DESCRIBE DYNAMIC TABLE command.

DESCRIBE DYNAMIC TABLE <table_name>;
DESC DYNAMIC TABLE <table_name>;

8. Managing Dynamic Tables Refresh

Dynamic Tables refreshed can be managed using the following Operations.

8.1. Suspend

Suspend operation stops all the refreshes on a dynamic table.

If a dynamic table DT2 is based on dynamic table DT1, suspending DT1 would also suspend DT2.

ALTER DYNAMIC TABLE <table_name> SUSPEND;

8.2. Resume

Resume operation restarts refreshes on a suspended dynamic table.

If a dynamic table DT2 is based on dynamic table DT1, and if DT1 is manually suspended, resuming DT1 would not resume DT2.

ALTER DYNAMIC TABLE <table_name> RESUME;

8.3. Refresh Manually

Refresh operation triggers a manual refresh of a dynamic table.

If a dynamic table DT2 is based on dynamic table DT1, manually refreshing DT2 would also refresh DT1.

ALTER DYNAMIC TABLE <table_name> REFRESH;

9. Monitoring Dynamic Tables Refresh Errors

A Dynamic table is suspended if the system observes five continuous refresh errors and are referred as Auto Suspended.

To monitor refresh errors, following INFORMATION_SCHEMA table functions can be used.

9.1. DYNAMIC_TABLE_REFRESH_HISTORY

The DYNAMIC_TABLE_REFRESH_HISTORY table function provides the history of refreshes of dynamic tables in the account.

The following query provides details of refresh errors of the dynamic tables using DYNAMIC_TABLE_REFRESH_HISTORY table function.

SELECT * FROM
  TABLE (
    INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY (
  NAME_PREFIX => 'DEMO_DB.PUBLIC.', ERROR_ONLY => TRUE)
  )
ORDER BY name, data_timestamp;

9.2. Snowsight

To monitor refresh errors from Snowsight, navigate to Data > Databases > db > schema > Dynamic Tables > table > Refresh History

The below image shows the refresh history of the EMPLOYEES dynamic table in the dynamic table details page in Snowsight.

Dynamic table Refresh History Page
Dynamic table Refresh History Page

10. Monitor Dynamic Tables Graph

A Dynamic table could be built on multiple base tables including other dynamic tables.

For example, we have a dynamic table DT2 which is built based on dynamic table DT1 which is in turn built on a base table. To determine all the dependencies of a dynamic table, following options are available.

10.1. DYNAMIC_TABLE_GRAPH_HISTORY

The DYNAMIC_TABLE_GRAPH_HISTORY table function provides the history of each dynamic table, its properties, and its dependencies on other tables and dynamic tables.

SELECT * FROM
  TABLE (
    INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY()
  );

10.2. Snowsight

Snowsight provides a directed acyclic graph (DAG) view of dynamic tables which provides details of all the upstream and downstream dependencies of a dynamic table.

The below image shows the DAG view of DT1 dynamic table in Snowsight providing details of both upstream and downstream table dependencies.

Dynamic table DAG view
Dynamic table DAG view

Subscribe to our Newsletter !!

Related Articles:

  • Overview of Snowflake Time Travel

    Snowflake Time Travel enables accessing historical data that has been changed or deleted at any point within a defined period.

    READ MORE

  • Snowflake SnowSQL: Command Line Tool to access Snowflake

    A definitive guide on how to download, install, configure and use Snowflake SnowSQL Command Line tool

    READ MORE

  • Snowflake Zero Copy Cloning

    Snowflake’s Zero Copy Cloning feature is a quick and easy way to create copies of database objects without incurring any additional costs.

    READ MORE

Leave a Comment

Related Posts