Search Optimization Service in Snowflake

Spread the love

1. What is Search Optimization Service in Snowflake?

The Search Optimization Service in Snowflake is a query optimization service that can enhance the performance of certain types of lookup and analytical queries which retrieves a small subset of results from large data volumes.

In this article let us discuss how search optimization service improves query performance, identify tables and columns on which search optimization can be enabled and implement it.

Note that Search Optimization Service is an Enterprise Edition (or higher) feature.

2. How Search Optimization Service works in Snowflake?

When Search Optimization Service is enabled on a table, it creates an additional data set called Search Access Path that keeps tracks of all the micro partitions where the values of the table are stored. The Search Access Path improves query performance by reducing the amount of partitions scanned during the table scan operation instead of searching all the partitions of the table.

Illustration of Search Optimization Service in Snowflake
Illustration of Search Optimization Service in Snowflake

The following are the Key points related to search access paths

  • When search optimization is enabled on a table, the process of populating data into search access path for the first time might take significant time depending on the size of the data.
  • When data in the table is modified, the search access path is automatically updated by Snowflake maintenance service.
  • There is additional cost involved for the storage and compute resources for maintaining the search access path.

3. Which queries take benefit of Search Optimization Service?

Search Optimization Service improves the performance of selective point lookup queries on tables that use

  • Equality or IN Predicates
    • <column_name> = <constant>
    • <column_name> IN (<constant>,<constant>,…)
  • Substrings and Regular Expressions
    • LIKE
    • RLIKE
    • REGEXP
    • REGEXP_LIKE …
  • Fields in VARIANT Columns
  • Geospatial Functions with GEOGRAPHY objects
  • Conjunctions of above Supported Predicates (AND)
    • where <condition_x> and <condition_y>
  • Disjunctions of above Supported Predicates (OR)
    • where <condition_x> or <condition_y>

4. How to identify tables and columns that benefit from Search Optimization Service?

The following are the recommended checks to consider a table and its columns for Search Optimization Service.

  • The data volume of the table on which the query is executed is typically at least in few hundreds of GBs.
  • The columns used in the query filter operation has at least 100K-200K distinct values.
  • The query returns only a few rows with highly selective filters.
  • The query typically runs for at least few seconds or longer.
  • The table is either not clustered or the table is frequently queried on columns other than the cluster key columns.

5. How to enable Search Optimization Service for a table in Snowflake?

The Search Optimization Service can be enabled on a table using ALTER TABLE … ADD SEARCH OPTIMIZATION command with or without the ON clause.

The columns on which search optimization to be enabled is specified in the ON clause along with a search method (ex: EQUALITY for equality and IN searches, SUBSTRING for substring searches and GEO for GEOGRAPHY searches.)

The following SQL statement enables search optimization on table t1 with equality search filters on columns c1, c2.

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1, c2);

The following SQL statement enables search optimization on table t1 with substring searches on columns c3.

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON SUBSTRING(c3);

Multiple search methods can be configured for a table in the ON clause separated by commas.

The following SQL statement enables search optimization on table t1 with equality search filters on columns c1, c2 and substring searches on columns c3.

ALTER TABLE t1 ADD SEARCH OPTIMIZATION ON EQUALITY(c1, c2), SUBSTRING(c3);

Enabling the search optimization on a table without ON clause sets up search access paths using EQUALITY search method for all supported columns.

The following SQL statement enables search optimization on table t1 with equality filters on all supported columns.

ALTER TABLE t1 ADD SEARCH OPTIMIZATION;

6. How to identify if Search Optimization is configured on a table?

The following command can be used to identify if search optimization is configured on a table.

SHOW TABLES;

For the tables where search optimization is configured,

  • The value of the field search_optimization will be ON.
  • The search_optimization_progress field indicates the status of creation of search access path.
  • The search_optimization_bytes field indicates the amount of storage consumed by the search access path built for the table.

7. Demonstration of Search Optimization Service

For the demonstration, let us create two copies of same table. On one table, the Search Optimization Service will be enabled on some columns but not on the other table. We will then compare the query performance on both the tables.

7.1. Setting up tables for demonstration

CREATE DATABASE DEMO_DB;

--creating table from snowflake sample data
CREATE TABLE DEMO_DB.PUBLIC.LINEITEM_SOS
AS
SELECT * FROM
SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.LINEITEM;

--creating copy of the table created from snowflake sample data
CREATE TABLE DEMO_DB.PUBLIC.LINEITEM_NON_SOS
CLONE DEMO_DB.PUBLIC.LINEITEM_SOS;

7.2. Enabling Search Optimization on a table

Configuring search optimization on table LINEITEM_SOS for equality search filters on column l_order_key.

ALTER TABLE DEMO_DB.PUBLIC.LINEITEM_SOS ADD SEARCH OPTIMIZATION ON EQUALITY(l_orderkey);

7.3. Verify the Search Optimization Progress

When search optimization is configured on a table, the maintenance service starts creating the search access path for the table based on the column configured. This might take some time based on the volume of data.

Verify the search optimization progress using SHOW TABLES command as shown below.

SHOW TABLES LIKE 'LINEITEM%';
Verifying the Search Optimization

7.4. Running same query on tables without and with Search Optimization Service enabled

Running query on LINEITEM_NON_SOS table with no search optimization configured with a filter on L_ORDER_KEY field.

SELECT * FROM DEMO_DB.PUBLIC.LINEITEM_NON_SOS WHERE L_ORDERKEY = '411832961';
Query without Search Optimization
Query without Search Optimization

Running a query on LINEITEM_SOS table which is configured for search optimization with same filter on L_ORDER_KEY field.

SELECT * FROM DEMO_DB.PUBLIC.LINEITEM_SOS WHERE L_ORDERKEY = '411832961';
Query with Search Optimization
Query with Search Optimization

7.5. Comparing the stats of the queries

The below images shows the query profile of both queries that ran without and with Search Optimization Service enabled.

  • There is a huge improvement in the query runtime from 63 seconds to 2.3 seconds with search optimization enabled.
  • In the query with search optimization not enabled, almost 99% of the partitions scanned by the query. On the other hand, the query with search optimization enabled scanned only 7 partitions out of 9317 partitions which led to massive improvement in the performance and execution time.
  • The number of bytes scanned by the query dropped from 24.47 GB to 91.07 MB with search optimization enabled.

Without Search Optimization

With Search Optimization Enabled

8. How to identify which columns are configured for Search Optimization in a table?

To identify the columns and their search optimization configuration in a table, use the DESCRIBE SEARCH OPTIMIZATION command.

The following SQL statement provides the search optimization configuration of table LINEITEM_SOS.

DESCRIBE SEARCH OPTIMIZATION ON DEMO_DB.PUBLIC.LINEITEM_SOS;
Identifying columns configured for Search Optimization
Identifying columns configured for Search Optimization

The command returns a table with list of fields configured for search optimization, their search method and data type. The active fields indicates if the initial build of the search access path is finished.

9. How to Disable Search Optimization Service in Snowflake?

The Search Optimization Service can be disabled on a table using ALTER TABLE … DROP SEARCH OPTIMIZATION command.

The following SQL statement removes search optimization on table t1 with substring searches on columns c1.

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON SUBSTRING(c1);

The following SQL statement removes all search optimization methods configured on columns c1.

ALTER TABLE t1 DROP SEARCH OPTIMIZATION ON c1;

The following SQL statement removes all search optimization methods configured on all columns of table t1.

ALTER TABLE t1 DROP SEARCH OPTIMIZATION;

10. How to estimate billing costs of Search Optimization Service in Snowflake?

The Search Optimization Service in Snowflake incurs costs for both storage and compute resources.

  • The Search Access Paths created by search optimization service on a table consumes storage and the storage cost depends on multiple factors like number of columns on which search optimization is configured and the number of distinct value of each column.
  • Compute resources are utilized for the initial build of the search access paths when search optimization is enabled. Maintaining the search optimization service when data is modified in the table also requires resources.

To estimate the cost of adding search optimization to a table and configuring specific columns for search optimization, use the SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS function.

The following statement shows the estimated costs of adding search optimization to a table.

SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('DEMO_DB.PUBLIC.LINEITEM_NON_SOS') AS SOS_ESTIMATE;
Cost estimate for enabling Search Optimization on table
Cost estimate for enabling Search Optimization on a table

The following statement shows the estimated costs of adding search optimization to a specific column of a table using EQUALITY search method.

SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('DEMO_DB.PUBLIC.LINEITEM_NON_SOS','EQUALITY(L_ORDERKEY)') AS SOS_ESTIMATE;
Cost estimate for enabling Search Optimization with Equality search method on a column
Cost estimate for enabling Search Optimization with Equality search method on a column

Subscribe to our Newsletter !!

Related Articles:

  • Snowflake User-Defined Functions (UDFs)

    UDF is a reusable component defined by user to perform a specific task which can be called from a SQL statement.

    READ MORE

  • Caching in Snowflake

    Learn how Snowflake caching helps in improving the query performance by making data retrieval faster and more efficient.

    READ MORE

  • Query Acceleration Service in Snowflake

    Query Acceleration Service is a feature to improve the performance of huge queries by offloading portions of query processing to the additional compute resources.

    READ MORE

Leave a Comment

Related Posts