MIN_BY and MAX_BY Functions in Snowflake

Spread the love

MIN_BY and MAX_BY Functions in Snowflake

MIN_BY and MAX_BY are aggregate functions in Snowflake that calculate the minimum or maximum value of a column and return the value of another column in that row.

Syntax

MIN_BY( <column_1>, <column_2> [ , <maximum_number_of_values_to_return> ] )
MAX_BY( <column_1>, <column_2> [ , <maximum_number_of_values_to_return> ] )

Arguments

Requiredcolumn_1Column whose value is to be returned
column_2Column on which the min or max aggregation is performed
Optionalmaximum_number_of_values_to_returnConstant number specifying the maximum values to be returned.

Implementation and Use Cases with examples

Let us understand the implementation and use cases of these functions using the following sample SALES data set.

----------------------------------------------
|"PRODUCT_ID"  |"SALE_DATE"  |"SALE_AMOUNT"  |
----------------------------------------------
|101           |2024-01-01   |200.00         |
|101           |2024-01-02   |150.00         |
|101           |2024-01-03   |220.00         |
|102           |2024-01-02   |190.00         |
|102           |2024-01-03   |155.00         |
|102           |2024-01-04   |160.00         |
|103           |2024-01-03   |300.00         |
|103           |2024-01-04   |330.00         |
|103           |2024-01-05   |310.00         |
----------------------------------------------

For example, if we wanted to calculate the value of the least sale amount, we could use the MIN(SALE_AMOUNT) expression to calculate the value. But if, instead, we wanted to return the SALE_DATE on which the least sale was registered, it would be a multi-step process, first to calculate the minimum sale amount, and then to find the date based on that minimum value.

MIN_BY and MAX_BY functions are designed to handle exactly such scenarios, simplifying the process by avoiding the multi-steps involved.  

Example-1:

The following SQL statement returns the SALE_DATE on which the least sale was registered.

SELECT MIN_BY(SALE_DATE,SALE_AMOUNT) FROM SALES;

-----------------------------------
|"MIN_BY(SALE_DATE,SALE_AMOUNT)"  |
-----------------------------------
|2024-01-02                       |
-----------------------------------
Least sale across all dates and products
Least sale across all dates and products

A typical query involves a self join and multiple steps to calculate the same value as shown below.

WITH CTE AS(
    SELECT MIN(SALE_AMOUNT) AS MIN_SALE_AMOUNT FROM SALES
)
SELECT SALE_DATE FROM SALES a
JOIN CTE b ON a.SALE_AMOUNT = b.MIN_SALE_AMOUNT
;

Note that if there are multiple dates on which the least sale occurred, the MIN_BY query would still return a single value. The returned value might be non-deterministic and could vary between executions.

Example-2:

Additionally, we can also include the number of values to be returned using the MIN_BY and MAX_BY functions.

The following SQL statement returns the two SALE_DATE values on which the least sales were registered.

SELECT MIN_BY(SALE_DATE,SALE_AMOUNT,2) FROM SALES;

-------------------------------------
|"MIN_BY(SALE_DATE,SALE_AMOUNT,2)"  |
-------------------------------------
|[                                  |
"2024-01-02",                    |
"2024-01-03"                     |
|]                                  |
-------------------------------------
Least two sales
Least two sales

Note that the query returned an array containing multiple dates, ordered by their sale values. If there are multiple dates on which the least sale occurred, the ordering of values will be non-deterministic in the return value.

Example-3:

The following SQL statement returns the SALE_DATE corresponding to the least and the highest SALE_AMOUNT values.

SELECT MIN_BY(SALE_DATE,SALE_AMOUNT), MAX_BY(SALE_DATE,SALE_AMOUNT) FROM SALES;

-----------------------------------------------------------------------
|"MIN_BY(SALE_DATE, SALE_AMOUNT)"  |"MAX_BY(SALE_DATE, SALE_AMOUNT)"  |
-----------------------------------------------------------------------
|2024-01-02                        |2024-01-04                        |
-----------------------------------------------------------------------

MIN_BY and MAX_BY functions in GROUP BY Clause

Similar to the MIN and MAX functions, the MIN_BY and MAX_BY functions can be used in conjunction with the GROUP BY clause to determine the minimum and maximum values of one column based on another column within each group of data.

The following SQL statement returns the latest sale amount of each product.

-- Calculates the latest sales amount of each product
SELECT
    PRODUCT_ID,
    MAX_BY(SALE_AMOUNT,SALE_DATE) AS LATEST_SALE_AMOUNT
FROM SALES
GROUP BY PRODUCT_ID
;

---------------------------------------
|"PRODUCT_ID"  |"LATEST_SALE_AMOUNT"  |
---------------------------------------
|101           |220.00                |
|102           |160.00                |
|103           |310.00                |
---------------------------------------
Latest sale amount of each product
Latest sale amount of each product

MIN and MAX functions can be used in conjunction with MIN_BY and MAX_BY functions in Snowflake. While MIN and MAX functions calculate the minimum and maximum values directly, MIN_BY and MAX_BY allow you to retrieve associated values from other columns based on these extremes, providing a more comprehensive view of your data.

The following SQL statement return the latest sale amount of each product along with the date on which sale is registered.

-- Calculates the latest sales amount of each product along with the sales date
SELECT
    PRODUCT_ID,
    MAX(SALE_DATE) AS LATEST_SALE_DATE,
    MAX_BY(SALE_AMOUNT,SALE_DATE) AS LATEST_SALE_AMOUNT
FROM SALES
GROUP BY PRODUCT_ID;

------------------------------------------------------------
|"PRODUCT_ID"  |"LATEST_SALE_DATE"  |"LATEST_SALE_AMOUNT"  |
------------------------------------------------------------
|101           |2024-01-03          |220.00                |
|102           |2024-01-04          |160.00                |
|103           |2024-01-05          |310.00                |
------------------------------------------------------------

MIN_BY and MAX_BY Analytical functions

MIN_BY and MAX_BY functions can be used in conjunction with analytical functions (window functions) in Snowflake. By combining them with a window function, you can perform row-wise computations across partitions of data, rather than just aggregating results across groups.

The following SQL statement calculates the latest sale amount for each product and represents this value against each record of the product.

SELECT
    PRODUCT_ID, SALE_DATE, SALE_AMOUNT,
    MAX_BY(SALE_AMOUNT,SALE_DATE) OVER(PARTITION BY PRODUCT_ID) AS LATEST_SALE_AMOUNT
FROM SALES
ORDER BY PRODUCT_ID, SALE_DATE;

---------------------------------------------------------------------
|"PRODUCT_ID"  |"SALE_DATE"  |"SALE_AMOUNT"  |"LATEST_SALE_AMOUNT"  |
---------------------------------------------------------------------
|101           |2024-01-01   |200.00         |220.00                |
|101           |2024-01-02   |150.00         |220.00                |
|101           |2024-01-03   |220.00         |220.00                |
|102           |2024-01-02   |190.00         |160.00                |
|102           |2024-01-03   |155.00         |160.00                |
|102           |2024-01-04   |160.00         |160.00                |
|103           |2024-01-03   |300.00         |310.00                |
|103           |2024-01-04   |330.00         |310.00                |
|103           |2024-01-05   |310.00         |310.00                |
---------------------------------------------------------------------

Subscribe to our Newsletter !!

Related Articles:

Leave a Comment

Related Posts