Handling NULL ValuesΒΆ
Let us understand how to handle nulls using specific functions in Spark SQL.
Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.
val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession
val username = System.getProperty("user.name")
val spark = SparkSession.
builder.
config("spark.ui.port", "0").
config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
enableHiveSupport.
appName(s"${username} | Spark SQL - Predefined Functions").
master("yarn").
getOrCreate
If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Pyspark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
By default if we try to add or concatenate null to another column or expression or literal, it will return null.
If we want to replace null with some default value, we can use
nvl
. For not null values, nvl returns the original expression value.Replace commission_pct with 0 if it is null.
We can also use
coalesce
in the place ofnvl
.
coalesce
returns first not null value if we pass multiple arguments to it.nvl2
can be used to perform one action when the value is not null and some other action when the value is null.We want to increase commission_pct by 1 if it is not null and set commission_pct to 2 if it is null.
We can also use
CASE WHEN ELSE END
for any conditional logic.
%%sql
SELECT 1 + NULL AS result
%%sql
SELECT concat('Hello', NULL) AS result
%%sql
SELECT nvl(1, 0) nvl, coalesce(1, 0) AS coalesce
%%sql
SELECT nvl(NULL, 0) nvl , coalesce(1, 0) AS coalesce
%%sql
SELECT coalesce(NULL, NULL, 2, NULL, 3) AS result
%%sql
SELECT nvl(NULL, NULL, 2, NULL, 3) AS result
%%sql
USE itversity_retail
%%sql
DROP TABLE IF EXISTS sales
%%sql
CREATE TABLE IF NOT EXISTS sales(
sales_person_id INT,
sales_amount FLOAT,
commission_pct INT
)
%%sql
INSERT INTO sales VALUES
(1, 1000, 10),
(2, 1500, 8),
(3, 500, NULL),
(4, 800, 5),
(5, 250, NULL)
%%sql
SELECT * FROM sales
%%sql
SELECT s.*,
nvl(commission_pct, 0) AS commission_pct
FROM sales AS s
%%sql
SELECT s.*,
coalesce(commission_pct, 0) AS commission_pct
FROM sales AS s
%%sql
SELECT s.*,
round(sales_amount * commission_pct / 100, 2) AS incorrect_commission_amount
FROM sales AS s
%%sql
SELECT s.*,
round(sales_amount * nvl(commission_pct, 0) / 100, 2) AS commission_amount
FROM sales AS s
%%sql
SELECT s.*,
round(sales_amount * coalesce(commission_pct, 0) / 100, 2) AS commission_amount
FROM sales AS s
%%sql
SELECT s.*,
nvl2(commission_pct, commission_pct + 1, 2) AS commission_pct
FROM sales AS s
%%sql
SELECT s.*,
round(sales_amount * nvl2(commission_pct, commission_pct + 1, 2) / 100, 2) AS commission_amount
FROM sales AS s
%%sql
SELECT s.*,
CASE WHEN commission_pct IS NULL
THEN 2
ELSE commission_pct + 1
END AS commission_pct
FROM sales AS s
%%sql
SELECT s.*,
CASE WHEN commission_pct IS NOT NULL
THEN commission_pct + 1
ELSE 2
END AS commission_pct
FROM sales AS s
%%sql
SELECT s.*,
CASE WHEN commission_pct IS NULL
THEN round((sales_amount * 2 / 100), 2)
ELSE round((sales_amount * (commission_pct + 1)/ 100), 2)
END AS commission_amount
FROM sales AS s