1. Introduction
Snowpark is a developer framework from Snowflake that allows developers to interact with Snowflake directly and build complex data pipelines. In our previous article, we discussed what Snowflake Snowpark is and how to set up a Python development environment for Snowpark.
In Snowpark, while the primary approach for querying and processing data is through DataFrames, there are scenarios where direct SQL execution is advantageous.
When Snowflake objects, such as tables, views, or stages, are already available within the database, SQL statements can be executed directly. This approach provides flexibility for complex queries to express in SQL rather than through DataFrame-style transformations
In this article, let us explore how to execute direct SQL Statements in Snowflake Snowpark and how it complements Snowpark’s DataFrame capabilities.
2. Steps to Execute SQL Statements in Snowflake Snowpark
To execute SQL statements in Snowflake using Snowpark, follow these steps:
2.1. Create a DataFrame with session.sql() Method
The session.sql() method in Snowpark returns a new DataFrame that represents the result set of a given SQL query. This method employs lazy evaluation, which means the SQL query is not executed until an action is triggered on the DataFrame using the DataFrame.collect() method to retrieve the results.
df = session.sql("select current_date")
2.2. Execute the Query Using DataFrame.collect() Method
The DataFrame.collect() method executes the query representing a DataFrame and returns the query result as a list of Row objects.
df.collect()
To execute the query immediately, chain the session.sql() call with the collect() method as follows.
session.sql("select current_date").collect()
2.3. Understanding the Output
The output of the SQL statement executed by the collect() method is stored in a Row object where each Row represents a row in the DataFrame.
[Row(CURRENT_DATE=datetime.date(2024, 11, 3))]
In the above example, the output displays the current date from the query result, encapsulated in a Row
object.
3. Executing DDL Statements in Snowflake Snowpark
In Snowflake Snowpark, the Data Definition Language (DDL) statements such as CREATE
, ALTER
, or DROP
database objects can be executed within your Snowpark session.
The following are some examples of executing DDL statements in Snowflake Snowpark.
session.sql("USE SCHEMA DEMO_DB.PUBLIC").collect()
# Output: [Row(status='Statement executed successfully.')]
session.sql("CREATE OR REPLACE TABLE my_table(ID NUMBER, NAME VARCHAR)").collect()
# Output: [Row(status='Table MY_TABLE successfully created.')]
session.sql("DROP TABLE IF EXISTS my_table").collect()
# Output: [Row(status='MY_TABLE successfully dropped.')]
4. Executing DML Statements in Snowflake Snowpark
In Snowflake Snowpark, the Data Manipulation Language (DML) statements such as INSERT
, UPDATE
, or DELETE
which allows you to manage and modify data within Snowflake tables that can be executed within your Snowpark session.
The following are some examples of executing DML statements in Snowflake Snowpark.
session.sql("INSERT INTO my_table (id, name) VALUES (1, 'TONY')").collect()
# Output: [Row(number of rows inserted=1)]
session.sql("UPDATE my_table SET name = 'TONY S' WHERE id = 1").collect()
# Output: [Row(number of rows updated=1, number of multi-joined rows updated=0)]
session.sql("DELETE FROM my_table WHERE id = 1").collect()
# Output: [Row(number of rows deleted=1)]
5. Executing Parameterized SQL statements in Snowflake Snowpark
In Snowflake Snowpark, the session.sql() method supports parameterized queries through the params argument, allowing you to bind variables directly into your SQL statements.
The following are some examples of executing parameterized DML statements in Snowflake Snowpark.
session.sql("DELETE FROM my_table WHERE id = ?", params=[1]).collect()
# Output: [Row(number of rows deleted=1)]
v_id = 2
v_name = "CHRIS"
session.sql("INSERT INTO my_table (id, name) VALUES (?,?)", params=[v_id, v_name]).collect()
# Output: [Row(number of rows inserted=1)]
The parameter binding is not supported for DDL statements in Snowflake Snowpark.
For DDL statements, you would need to dynamically construct the SQL string outside of session.sql() if you want to incorporate variable elements as shown in the below example.
table_name = "my_table"
drop_query = f"DROP TABLE IF EXISTS {table_name}"
session.sql(drop_query).collect()
# Output: [Row(status='MY_TABLE successfully dropped.')]
6. Accessing Output Results of Executed SQL Statements in Snowflake Snowpark
It is essential to understand how to access the output results after executing the SQL statements in Snowflake Snowpark. The SQL statements are executed using the session.sql() method in conjunction with the DataFrame.collect() method which provides the result as a list of Row objects.
Each Row
object allows you to access individual column values by column name, making it easy to handle and use the output data within your Python code.
The following is the step-by-step guide to accessing the output results of SQL statements executed using Snowflake Snowpark.
6.1. Executing SQL and Storing Results in a Variable
The following example retrieves the count of packages for each language from Snowflake’s information_schema.packages
table and stores the output in the result
variable.
# Example SQL query execution
result = session.sql("SELECT LANGUAGE, COUNT(*) AS RECORD_COUNT FROM information_schema.packages GROUP BY LANGUAGE").collect()
print(result)
# Output: [Row(LANGUAGE='python', RECORD_COUNT=26168), Row(LANGUAGE='java', RECORD_COUNT=22), Row(LANGUAGE='scala', RECORD_COUNT=22)]
6.2. Accessing Individual Rows
Each item in the result
is a Row
object. Each row in the list can be accessed by indexing, such as result[0]
, which refers to the first row in the output.
# Accessing first and second rows
print(result[0])
# Output: Row(LANGUAGE='python', RECORD_COUNT=26168)
print(result[1])
# Output: Row(LANGUAGE='java', RECORD_COUNT=22)
6.3. Accessing Specific Columns in a Row
To retrieve individual column values within each Row
object, you can use dictionary-style indexing. Each column can be accessed by specifying either its name or its position index within the row.
# Access column by Name
print(result[0]['LANGUAGE'])
# Output: python
# Access column by Index
print(result[0][0])
# Output: python
If the query returns multiple rows, each Row and its column values can be accessed by iterating through the result.
# Iterating through list of Row objects
for row in result:
language = row['LANGUAGE']
count = row['RECORD_COUNT']
print(f"Language: {language}, Count: {count}")
# Output:
# Language: python, Count: 26168
# Language: java, Count: 22
# Language: scala, Count: 22
7. Summary
Snowpark allows direct execution of SQL statements using the session.sql() method. To retrieve the actual data, call the DataFrame.collect() method, which provides the result as a list of Row objects.
- Chain the
session.sql()
call with thecollect()
method for immediate query execution. - DDL statements, like creating and dropping tables, and DML statements, like inserting, updating, and deleting records, are all supported.
- Parameterized queries are available for DML statements, enabling secure handling of dynamic values in SQL statements.
- The output from SQL execution is retrieved as a list of
Row
objects, with data accessible by column name or index, allowing flexible data handling.
Subscribe to our Newsletter !!
Related Articles:
- Introduction to Snowflake Snowpark for Python
- HOW TO: Create and Read Data from Snowflake Snowpark DataFrames?
- HOW TO: Write data into Snowflake from a Snowpark DataFrame?
- HOW TO: COPY Data from CSV Files INTO Snowflake Table using Snowpark?
- HOW TO: Add a New Column to a Snowpark DataFrame?
- HOW TO: Drop a Column from a Snowpark DataFrame?
- HOW TO: Remove Duplicates in a Snowflake Snowpark DataFrame?
- HOW TO: Update a DataFrame in Snowflake Snowpark?
- HOW TO: Merge two DataFrames in Snowflake Snowpark?
- HOW TO: Execute SQL Statements in Snowflake Snowpark?
- Aggregate Functions in Snowflake Snowpark
- GROUP BY in Snowflake Snowpark
- Joins in Snowflake Snowpark
- IN Operator in Snowflake Snowpark
- Window Functions in Snowflake Snowpark
- CASE Statement in Snowflake Snowpark
- UDFs in Snowflake Snowpark