1. ASOF Join in Snowflake
ASOF JOIN in Snowflake matches each row in one table to the closest row in another table based on a specified condition, typically involving time series data. It is often used to match records where one table has event data, and the other has time-series data like prices, exchange rates, or metrics making it useful for scenarios where you need to use the closest available data up to a certain point.
Syntax:
The following is a sample query demonstrating the syntax of the ASOF Join in Snowflake.
SELECT
t1.c1, t2.c2
FROM
table1 t1
ASOF JOIN
table2 t2
MATCH_CONDITION
(t1.date_column {>=, <=, >, <} t2.date_column)
-- optional
ON t1.c1 = t2.c1 AND t1.c2 = t2.c2 | USING (c1,c2)
;
Key Characteristics of ASOF Join:
- The equals operator (=) is not supported in ASOF Join.
- The supported comparison operators in ASOF Join are >=, <=, >, < .
- All the date and timestamp related data types are supported for the ASOF join.
- The optional ON or USING clause defines one or more equality conditions on columns in the two tables.
- The comparison operator in the ON clause must be the equal sign (=).
2. Demonstration of ASOF Join in Snowflake
The following tables, one with event data of sales on a particular day and the other with time series data of product prices over a period of time are used to demonstrate the ASOF join.
-- PRODUCT_PRICES table
--------------------------
|"START_DATE" |"PRICE" |
--------------------------
|2024-09-01 |100.00 |
|2024-09-03 |105.00 |
|2024-09-07 |110.00 |
|2024-09-09 |115.00 |
--------------------------
-- SALES table
----------------------------------------
|"SALE_ID" |"SALE_DATE" |"QUANTITY" |
----------------------------------------
|1 |2024-09-01 |10 |
|2 |2024-09-02 |20 |
|3 |2024-09-03 |30 |
|4 |2024-09-04 |40 |
|5 |2024-09-05 |10 |
|6 |2024-09-06 |50 |
|7 |2024-09-07 |20 |
|8 |2024-09-08 |60 |
|9 |2024-09-09 |30 |
|10 |2024-09-10 |70 |
----------------------------------------
The requirement is to calculate the total sales amount on each day. The following query matches each sale with the most recent price available before or on the sale date.
SELECT
S.sale_id,
S.sale_date,
S.quantity,
P.price AS price_at_sale_date,
S.quantity * P.price AS total_amount
FROM
SALES S
ASOF JOIN
PRODUCT_PRICES P
MATCH_CONDITION
(S.sale_date >= P.start_date);
The output of the query is as follows:
--------------------------------------------------------------------------------
|"SALE_ID" |"SALE_DATE" |"QUANTITY" |"PRICE_AT_SALE_DATE" |"TOTAL_AMOUNT" |
--------------------------------------------------------------------------------
|1 |2024-09-01 |10 |100.00 |1000.00 |
|2 |2024-09-02 |20 |100.00 |2000.00 |
|3 |2024-09-03 |30 |105.00 |3150.00 |
|4 |2024-09-04 |40 |105.00 |4200.00 |
|5 |2024-09-05 |10 |105.00 |1050.00 |
|6 |2024-09-06 |50 |105.00 |5250.00 |
|7 |2024-09-07 |20 |110.00 |2200.00 |
|8 |2024-09-08 |60 |110.00 |6600.00 |
|9 |2024-09-09 |30 |115.00 |3450.00 |
|10 |2024-09-10 |70 |115.00 |8050.00 |
--------------------------------------------------------------------------------
Explanation:
- The total_amount is calculated based on the price effective on or before the sale date and the quantity sold on that date.
- Although there is no price available on 2023–04–02, the query used the price from the day before.
- Similarly, the sales on dates with no direct price update (i.e. 2024-09-04, 2024-09-05, 2024-09-06, 2024-09-08 and 2024-09-10) use the last available price before that date.
3. Usage of ON clause in ASOF Join Queries
The ON clause in ASOF JOIN queries is used to logically group and match results based on specified conditions. In the case of time series data, it helps ensure that each event is paired with the most relevant time-based entry.
Let’s understand the usage of the ON clause in ASOF JOIN queries by adding the PRODUCT_ID column to both the SALES and PRODUCT_PRICES tables.
-- PRODUCT_PRICES table
-----------------------------------------
|"PRODUCT_ID" |"START_DATE" |"PRICE" |
-----------------------------------------
|1 |2024-09-01 |100.00 |
|1 |2024-09-03 |105.00 |
|1 |2024-09-07 |110.00 |
|1 |2024-09-09 |115.00 |
|2 |2024-09-01 |200.00 |
|2 |2024-09-07 |210.00 |
|3 |2024-09-01 |300.00 |
|3 |2024-09-10 |320.00 |
-----------------------------------------
-- SALES table
-------------------------------------------------------
|"SALE_ID" |"PRODUCT_ID" |"SALE_DATE" |"QUANTITY" |
-------------------------------------------------------
|1 |1 |2024-09-01 |10 |
|2 |1 |2024-09-02 |20 |
|3 |1 |2024-09-03 |30 |
|4 |1 |2024-09-04 |40 |
|5 |1 |2024-09-05 |10 |
|6 |2 |2024-09-06 |50 |
|7 |2 |2024-09-07 |20 |
|8 |2 |2024-09-08 |60 |
|9 |3 |2024-09-09 |30 |
|10 |3 |2024-09-10 |70 |
|11 |4 |2024-09-01 |30 |
-------------------------------------------------------
In this case, we have multiple products with prices that vary over time. When calculating the total sales amount, we need to match each sale with the corresponding product’s price at the time of the sale.
The following query matches each sale with the most recent price of the corresponding product available on or before the sale date.
SELECT
S.sale_id,
S.product_id,
S.sale_date,
S.quantity,
P.price AS price_at_sale_date,
S.quantity * P.price AS total_amount
FROM
SALES S
ASOF JOIN
PRODUCT_PRICES P
MATCH_CONDITION
(S.sale_date >= P.start_date)
ON
S.product_id = P.product_id;
The output of the query is as follows:
------------------------------------------------------------------------------------
|"SALE_ID"|"PRODUCT_ID"|"SALE_DATE" |"QUANTITY"|"PRICE_AT_SALE_DATE"|"TOTAL_AMOUNT"|
------------------------------------------------------------------------------------
|1 |1 |2024-09-01 |10 |100.00 |1000.00 |
|2 |1 |2024-09-02 |20 |100.00 |2000.00 |
|3 |1 |2024-09-03 |30 |105.00 |3150.00 |
|4 |1 |2024-09-04 |40 |105.00 |4200.00 |
|5 |1 |2024-09-05 |10 |105.00 |1050.00 |
|6 |2 |2024-09-06 |50 |200.00 |10000.00 |
|7 |2 |2024-09-07 |20 |210.00 |4200.00 |
|8 |2 |2024-09-08 |60 |210.00 |12600.00 |
|9 |3 |2024-09-09 |30 |300.00 |9000.00 |
|10 |3 |2024-09-10 |70 |320.00 |22400.00 |
|11 |4 |2024-09-01 |30 |NULL |NULL |
------------------------------------------------------------------------------------
Explanation:
- The ON clause in the query ensures that the ASOF JOIN correctly matches each sale event with the respective product price based on both the PRODUCT_ID and SALE_DATE.
- If no match is found based on the condition in the ON clause, the right table columns are null-padded. (ex: SALE_ID = 11)
- ASOF joins are similar to left outer joins in this respect.
4. Specifying USAGE instead of ON clause in ASOF Join Queries
Instead of an ON condition, a USING condition can also be used in ASOF JOIN queries. The following query is equivalent to the previous query, but it replaces ON with USING. The syntax USING(product_id) is shorthand for the condition S.product_id = P.product_id.
SELECT
S.sale_id,
S.product_id,
S.sale_date,
S.quantity,
P.price AS price_at_sale_date,
S.quantity * P.price AS total_amount
FROM
SALES S
ASOF JOIN
PRODUCT_PRICES P
MATCH_CONDITION
(S.sale_date >= P.start_date)
USING
(product_id);
Subscribe to our Newsletter !!
Related Articles: