Exercises - Basic SQL Queries¶
Here are some of the exercises for which you can write SQL queries to self evaluate.
Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.
val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession
val username = System.getProperty("user.name")
val spark = SparkSession.
builder.
config("spark.ui.port", "0").
config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
enableHiveSupport.
appName(s"${username} | Spark SQL - Basic Transformations").
master("yarn").
getOrCreate
If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Pyspark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Exercise 1 - Customer order count¶
Get order count per customer for the month of 2014 January.
Tables - orders and customers
Data should be sorted in descending order by count and ascending order by customer id.
Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.
Exercise 2 - Dormant Customers¶
Get the customer details who have not placed any order for the month of 2014 January.
Tables - orders and customers
Data should be sorted in ascending order by customer_id
Output should contain all the fields from customers
Exercise 3 - Revenue Per Customer¶
Get the revenue generated by each customer for the month of 2014 January
Tables - orders, order_items and customers
Data should be sorted in descending order by revenue and then ascending order by customer_id
Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.
If there are no orders placed by customer, then the corresponding revenue for a give customer should be 0.
Consider only COMPLETE and CLOSED orders
Exercise 4 - Revenue Per Category¶
Get the revenue generated for each category for the month of 2014 January
Tables - orders, order_items, products and categories
Data should be sorted in ascending order by category_id.
Output should contain all the fields from category along with the revenue as category_revenue.
Consider only COMPLETE and CLOSED orders
Exercise 5 - Product Count Per Department¶
Get the products for each department.
Tables - departments, categories, products
Data should be sorted in ascending order by department_id
Output should contain all the fields from department and the product count as product_count