HOW TO: Add a New Column to a Snowpark DataFrame?

Spread the love

Introduction

There are multiple ways to add a new column to a Snowpark DataFrame. In this article, we will explore how to achieve this using the following DataFrame methods:

  • withColumn()
  • select()
  • sql()

Consider the following DataFrame as the source for demonstrating different methods to add a new column.

#// Setting up data for Demonstration //
data = [[1,100],
        [2,400],
        [3,300],
        [4,200]]

df = session.createDataFrame(data, schema=["ID","SALARY"])
df.show()
-------------------
|"ID"  |"SALARY"  |
-------------------
|1     |100       |
|2     |400       |
|3     |300       |
|4     |200       |
-------------------

1. Adding a New Column to a Snowpark DataFrame using withColumn() Method

The DataFrame.withColumn() method returns a DataFrame with a newly added column with the specified name. If a column with the same name already exists in the DataFrame, it is replaced by the new column.

The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the withColumn() method.

  • The values of column BONUS are based on the existing column named “SALARY”.
  • The expression col(“SALARY”) * 0.1 specifies the bonus amount is 10% of the salary value.
#// Add Column to a DataFrame using withColumn() //

from snowflake.snowpark.functions import col

df.withColumn("BONUS", col("SALARY")*0.1).show()
-----------------------------
|"ID"  |"SALARY"  |"BONUS"  |
-----------------------------
|1     |100       |10.0     |
|2     |400       |40.0     |
|3     |300       |30.0     |
|4     |200       |20.0     |
-----------------------------

2. Adding a New Column with a Default Value to a Snowpark DataFrame

The “lit()” function in Snowpark is used to create a column with a constant literal value. It is useful for adding a column to a DataFrame where each row contains the same fixed value.

The following code demonstrates adding a new column named “BONUS” with a default value to an existing DataFrame “df” using the lit() function.

#// Add Column with a Default Value //

from snowflake.snowpark.functions import lit

df.withColumn("BONUS", lit(10)).show()
-----------------------------
|"ID"  |"SALARY"  |"BONUS"  |
-----------------------------
|1     |100       |10.0     |
|2     |400       |10.0     |
|3     |300       |10.0     |
|4     |200       |10.0     |
-----------------------------

3. Adding a New Column with Value based on a Condition to a Snowpark DataFrame

A CASE statement in Snowflake Snowpark can be constructed using the “when()” function to specify conditions and their corresponding results. The “otherwise()” method is used to define the result when none of the specified conditions are met.

The following code demonstrates adding a new column named “BONUS” with values based on a condition to an existing DataFrame “df”.

#// Add Column Value Based on Condition //

from snowflake.snowpark.functions import col, lit, when

df.withColumn("BONUS",
              when(col("SALARY") <= 100, lit('15'))
             .when((col("SALARY") > 100) & (col("SALARY") <= 200), lit('12'))
             .otherwise(lit('10'))
             ).show()

-----------------------------
|"ID"  |"SALARY"  |"BONUS"  |
-----------------------------
|1     |100       |15       |
|2     |400       |10       |
|3     |300       |10       |
|4     |200       |12       |
-----------------------------

4. Adding Multiples Columns to a Snowpark DataFrame

Multiple columns can be added to a Snowpark DataFrame using the withColumn() method where you typically need to call it separately for each column you want to add.

The following code demonstrates adding multiple columns to an existing DataFrame “df” using the withColumn() method.

#// Add Multiple Columns to a DataFrame //

from snowflake.snowpark.functions import col, lit

df.withColumn("BONUS", col("SALARY")*0.1).withColumn("DEPT", lit(100)).show()
--------------------------------------
|"ID"  |"SALARY"  |"BONUS"  |"DEPT"  |
--------------------------------------
|1     |100       |10.0     |100     |
|2     |400       |40.0     |100     |
|3     |300       |30.0     |100     |
|4     |200       |20.0     |100     |
--------------------------------------

5. Adding a New Column to a Snowpark DataFrame using select() Method

The “DataFrame.select()” method in Snowpark is similar to the SELECT statement in SQL and returns a new DataFrame with only the specified column expressions as output. Unlike the “withcolumn()” method, only the columns specified as arguments will be returned in the resulting DataFrame.

The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the select() method.

#// Add Column to a DataFrame using select() //

from snowflake.snowpark.functions import col

df.select(col("ID"), col("SALARY"), (col("SALARY")*0.1).alias("BONUS")).show()
-----------------------------
|"ID"  |"SALARY"  |"BONUS"  |
-----------------------------
|1     |100       |10.0     |
|2     |400       |40.0     |
|3     |300       |30.0     |
|4     |200       |20.0     |
-----------------------------

6. Adding a New Column to a Snowpark DataFrame using sql() Method

The “Session.sql()” method returns a new DataFrame representing the results of a SQL query. This approach allows you to leverage SQL syntax for querying and manipulating data in Snowpark DataFrames by creating a temporary view.

The following code demonstrates adding a new column named “BONUS” to an existing DataFrame “df” using the sql() method.

  • The first step is to create a temporary view that makes the DataFrame “df” accessible for SQL queries under the name “EMP”. A temporary view persists only for the duration of the session
  • The next step is to run a SQL query to retrieve the ID, SALARY, and a computed BONUS (10% of SALARY) from the temporary view, and then display the results.
#// Create Temporary View with the contents of DataFrame //
df.createOrReplaceTempView("EMP")
//[Row(status='View EMP successfully created.')]

#// Add Column to DataFrame using SQL //
session.sql("SELECT ID, SALARY, SALARY*0.1 AS BONUS FROM EMP").show()
-----------------------------
|"ID"  |"SALARY"  |"BONUS"  |
-----------------------------
|1     |100       |10.0     |
|2     |400       |40.0     |
|3     |300       |30.0     |
|4     |200       |20.0     |
-----------------------------

Conclusion

In this article, you have learned multiple ways to add a new column to a Snowpark DataFrame that includes adding a column with default value, based on the existing column, based on a condition, and adding multiple columns with examples.

Subscribe to our Newsletter !!

Related Articles:

Leave a Comment

Related Posts