1. Introduction
The Table.delete() method in Snowpark helps in deleting the rows of a table. It returns a tuple DeleteResult, representing the number of rows deleted. This method can also be used to delete the rows from a DataFrame.
Syntax
Table.delete([condition], [source])
Parameters
- condition
Represents the specific condition based on which a row should be deleted. If no condition is specified, all the rows of the DataFrame will be deleted.
- source
Represent another DataFrame based on which the data of the current DataFrame should be deleted. The join condition between both the DataFrames should be specified in the condition.
2. Steps to Delete rows from a DataFrame in Snowpark
Follow the below steps to delete rows from a Snowpark DataFrame using Table.delete() method.
- Create a DataFrame with the desired data using Session.createDataFrame().
The DataFrame could be built based on an existing table or data read from a CSV file or content created within the code. - Create a temporary table in Snowflake with the contents of the DataFrame using the save_as_table() method.
- Create a Snowpark Table Object to read the contents of the temporary table using Session.table() method.
- Using the Table.delete() method, delete the contents of the Table object created using a temporary table.
- Finally, display the contents of the Table object to verify that the appropriate records have been deleted using the show() method.
Temporary tables only exist within the session in which they were created and are not visible to other users or sessions. Once the session ends, the table is completely purged from the system. Therefore, temporary tables are well-suited in the scenario of updating DataFrames.
3. Demonstration of Deleting all rows of a DataFrame
STEP-1: Create DataFrame
The following code creates a DataFrame df_emp which holds the EMPLOYEES data as shown below.
# create a DataFrame with employee data
employee_data = [
[1,'TONY',24000,10],
[2,'STEVE',17000,10],
[3,'BRUCE',9000,20],
[4,'WANDA',20000,20]
]
employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]
df_emp = session.createDataFrame(employee_data, schema=employee_schema)
df_emp.show()
Output:
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
STEP-2: Create Temporary Table
The following code creates a temporary table named tmp_emp in the Snowflake database using the contents of df_emp DataFrame.
# create a temp table
df_emp.write.mode("overwrite").save_as_table("tmp_emp", table_type="temp")
STEP-3: Read Temporary Table
The following code creates a new DataFrame df_tmp_emp which reads the contents of temporary table tmp_emp.
# create a DataFrame to read contents of temp table
df_tmp_emp = session.table("tmp_emp")
df_tmp_emp.show()
Output:
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
STEP-4: Delete all rows from a DataFrame
The following code deletes all rows from a Snowpark Table object df_tmp_emp using Table.delete() method.
# delete all rows in a dataframe
df_tmp_emp.delete()
# output: DeleteResult(rows_deleted=4)
STEP-5: Display Updated DataFrame
The following code displays the contents of the updated DataFrame.
# display updated DataFrame
df_tmp_emp.show()
Output:
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
| | | | |
------------------------------------------------
4. Deleting rows from a DataFrame based on a Condition
The following code deletes all rows from a DataFrame where department id is 10.
# delete all rows in a dataframe where dept_id = 10
df_tmp_emp.delete(df_tmp_emp["DEPT_ID"] == '10')
# output: DeleteResult(rows_deleted=2)
The output shows all rows belonging to department id 10 are deleted.
df_tmp_emp.show()
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
5. Deleting rows from a DataFrame based on data in another DataFrame
The rows in a DataFrame can also be deleted based on the data in another DataFrame using Table.delete() method.
The following code creates a new DataFrame df_inactive_emp based on which data in df_tmp_emp DataFrame should be deleted.
# create a new DataFrame based on which data in another DataFrame to be deleted
df_inactive_emp = session.createDataFrame([[1, 10], [3, 20]], ["EMP_ID", "DEPT_ID"])
df_inactive_emp.show()
Output:
------------------------
|"EMP_ID" |"DEPT_ID" |
------------------------
|1 |10 |
|3 |20 |
------------------------
# delete rows in a dataframe based on data in another dataframe
df_tmp_emp.delete(df_tmp_emp["EMP_ID"] == df_inactive_emp.EMP_ID, df_inactive_emp)
# Output: DeleteResult(rows_deleted=2)
df_tmp_emp.show()
Output:
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|3 |BRUCE |9000 |20 |
|4 |WANDA |20000 |20 |
------------------------------------------------
6. Deleting rows from a DataFrame using Session.sql() Method
The Session.sql() method in Snowpark can be used to execute a SQL statement. It returns a new DataFrame representing the results of a SQL query.
Follow the below steps to delete rows from a DataFrame in Snowpark using the Session.sql() method.
- Create a DataFrame with the desired data using Session.createDataFrame(). The DataFrame could be built based on an existing table or data read from a CSV file or content created within the code.
- Create a temporary table with the contents of the DataFrame using the save_as_table method.
- Use the Session.sql() method to delete rows from the temporary table.
- Create a DataFrame to read the contents of the updated temporary table using the session.table() method.
- Display the contents of the DataFrame to verify that the appropriate records have been deleted using the DataFrame.show() method.
# create DataFrame
employee_data = [
[1,'TONY',24000,10],
[2,'STEVE',17000,10],
[3,'BRUCE',9000,20],
[4,'WANDA',20000,20]
]
employee_schema = ["EMP_ID", "EMP_NAME", "SALARY", "DEPT_ID"]
df_emp =session.createDataFrame(employee_data, schema=employee_schema)
# create temporary table
df_emp.write.mode("overwrite").save_as_table("tmp_emp", table_type="temp")
# Delete rows from DataFrame using session.sql()
session.sql("DELETE FROM tmp_emp WHERE EMP_ID = 3").collect()
# Output: [Row(number of rows deleted=1)]
# create DataFrame to read contents of updated temp table
df_tmp_emp = session.table("tmp_emp")
# display updated DataFrame
df_tmp_emp.show()
Output:
------------------------------------------------
|"EMP_ID" |"EMP_NAME" |"SALARY" |"DEPT_ID" |
------------------------------------------------
|1 |TONY |24000 |10 |
|2 |STEVE |17000 |10 |
|4 |WANDA |20000 |20 |
------------------------------------------------
7. Summary
The Table.delete() method can be used to delete rows from a Snowpark DataFrame. Since rows from a DataFrame cannot be deleted directly, a temporary table needs to be created first, and the data from the temporary table should be deleted. The method helps to:
- Delete all rows from a DataFrame.
- Delete rows from a DataFrame based on a condition.
- Delete rows from a DataFrame based on data in another DataFrame.
The Session.sql() method also allows for deleting of rows from a Snowpark DataFrame in similar way, but using SQL statements.
Subscribe to our Newsletter !!
Related Articles:
- Introduction to Snowflake Snowpark for Python
- HOW TO: Create and Read Data from Snowflake Snowpark DataFrames?
- HOW TO: Write data into Snowflake from a Snowpark DataFrame?
- HOW TO: COPY Data from CSV Files INTO Snowflake Table using Snowpark?
- HOW TO: Add a New Column to a Snowpark DataFrame?
- HOW TO: Drop a Column from a Snowpark DataFrame?
- HOW TO: Remove Duplicates in a Snowflake Snowpark DataFrame?
- HOW TO: Update a DataFrame in Snowflake Snowpark?
- HOW TO: Delete Rows From a DataFrame in Snowflake Snowpark?
- HOW TO: Merge two DataFrames in Snowflake Snowpark?
- HOW TO: Execute SQL Statements in Snowflake Snowpark?
- Aggregate Functions in Snowflake Snowpark
- GROUP BY in Snowflake Snowpark
- Joins in Snowflake Snowpark
- IN Operator in Snowflake Snowpark
- Window Functions in Snowflake Snowpark
- CASE Statement in Snowflake Snowpark
- UDFs in Snowflake Snowpark