HOW TO: Execute SQL Statements in Snowflake Snowpark?

Spread the love

1. Introduction

Snowpark is a developer framework from Snowflake that allows developers to interact with Snowflake directly and build complex data pipelines. In our previous article, we discussed what Snowflake Snowpark is and how to set up a Python development environment for Snowpark.

In Snowpark, while the primary approach for querying and processing data is through DataFrames, there are scenarios where direct SQL execution is advantageous.

When Snowflake objects, such as tables, views, or stages, are already available within the database, SQL statements can be executed directly. This approach provides flexibility for complex queries to express in SQL rather than through DataFrame-style transformations

In this article, let us explore how to execute direct SQL Statements in Snowflake Snowpark and how it complements Snowpark’s DataFrame capabilities.

2. Steps to Execute SQL Statements in Snowflake Snowpark

To execute SQL statements in Snowflake using Snowpark, follow these steps:

2.1. Create a DataFrame with session.sql() Method

The session.sql() method in Snowpark returns a new DataFrame that represents the result set of a given SQL query. This method employs lazy evaluation, which means the SQL query is not executed until an action is triggered on the DataFrame using the DataFrame.collect() method to retrieve the results.

df = session.sql("select current_date")

2.2. Execute the Query Using DataFrame.collect() Method

The DataFrame.collect() method executes the query representing a DataFrame and returns the query result as a list of Row objects.

df.collect()

To execute the query immediately, chain the session.sql() call with the collect() method as follows.

session.sql("select current_date").collect()

2.3. Understanding the Output

The output of the SQL statement executed by the collect() method is stored in a Row object where each Row represents a row in the DataFrame.

[Row(CURRENT_DATE=datetime.date(2024, 11, 3))]

In the above example, the output displays the current date from the query result, encapsulated in a Row object.

3. Executing DDL Statements in Snowflake Snowpark

In Snowflake Snowpark, the Data Definition Language (DDL) statements such as CREATE, ALTER, or DROP database objects can be executed within your Snowpark session.

The following are some examples of executing DDL statements in Snowflake Snowpark.

session.sql("USE SCHEMA DEMO_DB.PUBLIC").collect()
# Output: [Row(status='Statement executed successfully.')]

session.sql("CREATE OR REPLACE TABLE my_table(ID NUMBER, NAME VARCHAR)").collect()
# Output: [Row(status='Table MY_TABLE successfully created.')]

session.sql("DROP TABLE IF EXISTS my_table").collect()
# Output: [Row(status='MY_TABLE successfully dropped.')]

4. Executing DML Statements in Snowflake Snowpark

In Snowflake Snowpark, the Data Manipulation Language (DML) statements such as INSERT, UPDATE, or DELETE which allows you to manage and modify data within Snowflake tables that can be executed within your Snowpark session.

The following are some examples of executing DML statements in Snowflake Snowpark.

session.sql("INSERT INTO my_table (id, name) VALUES (1, 'TONY')").collect()
# Output: [Row(number of rows inserted=1)]

session.sql("UPDATE my_table SET name = 'TONY S' WHERE id = 1").collect()
# Output: [Row(number of rows updated=1, number of multi-joined rows updated=0)]

session.sql("DELETE FROM my_table WHERE id = 1").collect()
# Output: [Row(number of rows deleted=1)]

5. Executing Parameterized SQL statements in Snowflake Snowpark

In Snowflake Snowpark, the session.sql() method supports parameterized queries through the params argument, allowing you to bind variables directly into your SQL statements.

The following are some examples of executing parameterized DML statements in Snowflake Snowpark.

session.sql("DELETE FROM my_table WHERE id = ?", params=[1]).collect()
# Output: [Row(number of rows deleted=1)]

v_id = 2
v_name = "CHRIS"
session.sql("INSERT INTO my_table (id, name) VALUES (?,?)", params=[v_id, v_name]).collect()
# Output: [Row(number of rows inserted=1)]

The parameter binding is not supported for DDL statements in Snowflake Snowpark.

For DDL statements, you would need to dynamically construct the SQL string outside of session.sql() if you want to incorporate variable elements as shown in the below example.

table_name = "my_table"
drop_query = f"DROP TABLE IF EXISTS {table_name}"

session.sql(drop_query).collect()
# Output: [Row(status='MY_TABLE successfully dropped.')]

6. Accessing Output Results of Executed SQL Statements in Snowflake Snowpark

It is essential to understand how to access the output results after executing the SQL statements in Snowflake Snowpark. The SQL statements are executed using the session.sql() method in conjunction with the DataFrame.collect() method which provides the result as a list of Row objects.

Each Row object allows you to access individual column values by column name, making it easy to handle and use the output data within your Python code.

The following is the step-by-step guide to accessing the output results of SQL statements executed using Snowflake Snowpark.

6.1. Executing SQL and Storing Results in a Variable

The following example retrieves the count of packages for each language from Snowflake’s information_schema.packages table and stores the output in the result variable.

# Example SQL query execution
result = session.sql("SELECT LANGUAGE, COUNT(*) AS RECORD_COUNT FROM information_schema.packages GROUP BY LANGUAGE").collect()

print(result)
# Output: [Row(LANGUAGE='python', RECORD_COUNT=26168), Row(LANGUAGE='java', RECORD_COUNT=22), Row(LANGUAGE='scala', RECORD_COUNT=22)]

6.2. Accessing Individual Rows

Each item in the result is a Row object. Each row in the list can be accessed by indexing, such as result[0], which refers to the first row in the output.

# Accessing first and second rows
print(result[0])
# Output: Row(LANGUAGE='python', RECORD_COUNT=26168)

print(result[1])
# Output: Row(LANGUAGE='java', RECORD_COUNT=22)

6.3. Accessing Specific Columns in a Row

To retrieve individual column values within each Row object, you can use dictionary-style indexing. Each column can be accessed by specifying either its name or its position index within the row.

# Access column by Name
print(result[0]['LANGUAGE'])
# Output: python

# Access column by Index
print(result[0][0])
# Output: python

If the query returns multiple rows, each Row and its column values can be accessed by iterating through the result.

# Iterating through list of Row objects
for row in result:
    language = row['LANGUAGE']
    count = row['RECORD_COUNT']
    print(f"Language: {language}, Count: {count}")

# Output:
# Language: python, Count: 26168
# Language: java, Count: 22
# Language: scala, Count: 22

7. Summary

Snowpark allows direct execution of SQL statements using the session.sql() method. To retrieve the actual data, call the DataFrame.collect() method, which provides the result as a list of Row objects.

  • Chain the session.sql() call with the collect() method for immediate query execution.
  • DDL statements, like creating and dropping tables, and DML statements, like inserting, updating, and deleting records, are all supported.
  • Parameterized queries are available for DML statements, enabling secure handling of dynamic values in SQL statements.
  • The output from SQL execution is retrieved as a list of Row objects, with data accessible by column name or index, allowing flexible data handling.

Subscribe to our Newsletter !!

Related Articles:

Leave a Comment

Related Posts