Introduction
There are multiple ways to add a new column to a Snowpark DataFrame. In this article, we will explore how to achieve this using the following DataFrame methods:
- withColumn()
- select()
- sql()
Consider the following DataFrame as the source for demonstrating different methods to add a new column.
#// Setting up data for Demonstration //
data = [[1,100],
[2,400],
[3,300],
[4,200]]
df = session.createDataFrame(data, schema=["ID","SALARY"])
df.show()
-------------------
|"ID" |"SALARY" |
-------------------
|1 |100 |
|2 |400 |
|3 |300 |
|4 |200 |
-------------------
1. Adding a New Column to a Snowpark DataFrame using withColumn() Method
The DataFrame.withColumn() method returns a DataFrame with a newly added column with the specified name. If a column with the same name already exists in the DataFrame, it is replaced by the new column.
The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the withColumn() method.
- The values of column BONUS are based on the existing column named “SALARY”.
- The expression col(“SALARY”) * 0.1 specifies the bonus amount is 10% of the salary value.
#// Add Column to a DataFrame using withColumn() //
from snowflake.snowpark.functions import col
df.withColumn("BONUS", col("SALARY")*0.1).show()
-----------------------------
|"ID" |"SALARY" |"BONUS" |
-----------------------------
|1 |100 |10.0 |
|2 |400 |40.0 |
|3 |300 |30.0 |
|4 |200 |20.0 |
-----------------------------
2. Adding a New Column with a Default Value to a Snowpark DataFrame
The “lit()” function in Snowpark is used to create a column with a constant literal value. It is useful for adding a column to a DataFrame where each row contains the same fixed value.
The following code demonstrates adding a new column named “BONUS” with a default value to an existing DataFrame “df” using the lit() function.
#// Add Column with a Default Value //
from snowflake.snowpark.functions import lit
df.withColumn("BONUS", lit(10)).show()
-----------------------------
|"ID" |"SALARY" |"BONUS" |
-----------------------------
|1 |100 |10.0 |
|2 |400 |10.0 |
|3 |300 |10.0 |
|4 |200 |10.0 |
-----------------------------
3. Adding a New Column with Value based on a Condition to a Snowpark DataFrame
A CASE statement in Snowflake Snowpark can be constructed using the “when()” function to specify conditions and their corresponding results. The “otherwise()” method is used to define the result when none of the specified conditions are met.
The following code demonstrates adding a new column named “BONUS” with values based on a condition to an existing DataFrame “df”.
#// Add Column Value Based on Condition //
from snowflake.snowpark.functions import col, lit, when
df.withColumn("BONUS",
when(col("SALARY") <= 100, lit('15'))
.when((col("SALARY") > 100) & (col("SALARY") <= 200), lit('12'))
.otherwise(lit('10'))
).show()
-----------------------------
|"ID" |"SALARY" |"BONUS" |
-----------------------------
|1 |100 |15 |
|2 |400 |10 |
|3 |300 |10 |
|4 |200 |12 |
-----------------------------
4. Adding Multiples Columns to a Snowpark DataFrame
Multiple columns can be added to a Snowpark DataFrame using the withColumn() method where you typically need to call it separately for each column you want to add.
The following code demonstrates adding multiple columns to an existing DataFrame “df” using the withColumn() method.
#// Add Multiple Columns to a DataFrame //
from snowflake.snowpark.functions import col, lit
df.withColumn("BONUS", col("SALARY")*0.1).withColumn("DEPT", lit(100)).show()
--------------------------------------
|"ID" |"SALARY" |"BONUS" |"DEPT" |
--------------------------------------
|1 |100 |10.0 |100 |
|2 |400 |40.0 |100 |
|3 |300 |30.0 |100 |
|4 |200 |20.0 |100 |
--------------------------------------
5. Adding a New Column to a Snowpark DataFrame using select() Method
The “DataFrame.select()” method in Snowpark is similar to the SELECT statement in SQL and returns a new DataFrame with only the specified column expressions as output. Unlike the “withcolumn()” method, only the columns specified as arguments will be returned in the resulting DataFrame.
The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the select() method.
#// Add Column to a DataFrame using select() //
from snowflake.snowpark.functions import col
df.select(col("ID"), col("SALARY"), (col("SALARY")*0.1).alias("BONUS")).show()
-----------------------------
|"ID" |"SALARY" |"BONUS" |
-----------------------------
|1 |100 |10.0 |
|2 |400 |40.0 |
|3 |300 |30.0 |
|4 |200 |20.0 |
-----------------------------
6. Adding a New Column to a Snowpark DataFrame using sql() Method
The “Session.sql()” method returns a new DataFrame representing the results of a SQL query. This approach allows you to leverage SQL syntax for querying and manipulating data in Snowpark DataFrames by creating a temporary view.
The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the sql() method.
- The first step is to create a temporary view that makes the DataFrame “df” accessible for SQL queries under the name “EMP”. A temporary view persists only for the duration of the session
- The next step is to run a SQL query to retrieve the ID, SALARY, and a computed BONUS (10% of SALARY) from the temporary view, and then display the results.
#// Create Temporary View with the contents of DataFrame //
df.createOrReplaceTempView("EMP")
//[Row(status='View EMP successfully created.')]
#// Add Column to DataFrame using SQL //
session.sql("SELECT ID, SALARY, SALARY*0.1 AS BONUS FROM EMP").show()
-----------------------------
|"ID" |"SALARY" |"BONUS" |
-----------------------------
|1 |100 |10.0 |
|2 |400 |40.0 |
|3 |300 |30.0 |
|4 |200 |20.0 |
-----------------------------
Conclusion
In this article, you have learned multiple ways to add a new column to a Snowpark DataFrame that includes adding a column with default value, based on the existing column, based on a condition, and adding multiple columns with examples.
Subscribe to our Newsletter !!
Related Articles:
- Introduction to Snowflake Snowpark for Python
- HOW TO: Create and Read Data from Snowflake Snowpark DataFrames?
- HOW TO: Write data into Snowflake from a Snowpark DataFrame?
- HOW TO: COPY Data from CSV Files INTO Snowflake Table using Snowpark?
- HOW TO: Add a New Column to a Snowpark DataFrame?
- HOW TO: Drop a Column from a Snowpark DataFrame?
- HOW TO: Remove Duplicates in a Snowflake Snowpark DataFrame?
- HOW TO: Update a DataFrame in Snowflake Snowpark?
- HOW TO: Merge two DataFrames in Snowflake Snowpark?
- HOW TO: Execute SQL Statements in Snowflake Snowpark?
- Aggregate Functions in Snowflake Snowpark
- GROUP BY in Snowflake Snowpark
- Joins in Snowflake Snowpark
- IN Operator in Snowflake Snowpark
- Window Functions in Snowflake Snowpark
- CASE Statement in Snowflake Snowpark
- UDFs in Snowflake Snowpark