Exercises - Basic SQL Queries

Here are some of the exercises for which you can write SQL queries to self evaluate.

Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.

val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Basic Transformations").
    master("yarn").
    getOrCreate

If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.

Using Spark SQL

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Pyspark

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Exercise 1 - Customer order count

Get order count per customer for the month of 2014 January.

  • Tables - orders and customers

  • Data should be sorted in descending order by count and ascending order by customer id.

  • Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.

Exercise 2 - Dormant Customers

Get the customer details who have not placed any order for the month of 2014 January.

  • Tables - orders and customers

  • Data should be sorted in ascending order by customer_id

  • Output should contain all the fields from customers

Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for the month of 2014 January

  • Tables - orders, order_items and customers

  • Data should be sorted in descending order by revenue and then ascending order by customer_id

  • Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.

  • If there are no orders placed by customer, then the corresponding revenue for a give customer should be 0.

  • Consider only COMPLETE and CLOSED orders

Exercise 4 - Revenue Per Category

Get the revenue generated for each category for the month of 2014 January

  • Tables - orders, order_items, products and categories

  • Data should be sorted in ascending order by category_id.

  • Output should contain all the fields from category along with the revenue as category_revenue.

  • Consider only COMPLETE and CLOSED orders

Exercise 5 - Product Count Per Department

Get the products for each department.

  • Tables - departments, categories, products

  • Data should be sorted in ascending order by department_id

  • Output should contain all the fields from department and the product count as product_count