Exercise - Getting Started with Spark SQL

Let’s do a simple exercise to conclude this section.

  • Launch Spark SQL with the spark-sql command (spark2-sql in our labs) without specifying a database. Here is the command for our labs; in other environments you can skip the last line (and drop the trailing backslash on the line before it). I have also included commands to launch Spark using Scala or Python (for CCA 175 certification purposes).

Using Spark SQL

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using PySpark

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

  • Create a database and exit (prefix the database name with your OS username, e.g. training_retail for OS user training).
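A minimal sketch of this step, assuming the OS user is training (substitute your own username in the database name). The SHOW DATABASES and DESCRIBE DATABASE statements are optional checks; the location reported should typically be a training_retail.db directory under the warehouse directory set at launch.

```sql
-- Assumes OS user "training"; replace the prefix with your own username
CREATE DATABASE IF NOT EXISTS training_retail;

-- Optional: confirm the database exists and check where it was created
SHOW DATABASES LIKE 'training*';
DESCRIBE DATABASE training_retail;

-- Leave the spark-sql shell
exit;
```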

  • Relaunch Spark SQL, this time connecting to your database.
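One way to do this, assuming the database name training_retail from the previous step: rerun the same spark2-sql command and append --database training_retail (a spark-sql CLI option) so the session starts in that database. Once inside, the statements below confirm which database is current; USE is an alternative that switches databases without restarting the shell.

```sql
-- Should return training_retail if the shell was launched with --database training_retail
SELECT current_database();

-- Alternative: switch to the database from within an already running session
USE training_retail;
```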

  • Create the table orders. You can use the script below to create it.

CREATE TABLE orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

  • List the tables.
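Assuming you are connected to training_retail, either form below lists the tables; the result should include orders. The LIKE pattern is only an illustration.

```sql
-- Tables in the current database
SHOW TABLES;

-- Same listing with the database named explicitly, filtered by a name pattern
SHOW TABLES IN training_retail LIKE 'ord*';
```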

  • Describe the table and review its complete metadata.
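DESCRIBE on its own shows only column names and data types; DESCRIBE FORMATTED (or DESCRIBE EXTENDED) adds the detailed table information. A sketch, assuming the orders table created above. The exact layout of the output varies across Spark versions, but look for details such as the table type, the location under the warehouse directory, the SerDe and input/output formats, and the storage properties that record the comma field delimiter.

```sql
-- Column names and data types only
DESCRIBE orders;

-- Full metadata: database, owner, table type, location, SerDe, input/output formats
DESCRIBE FORMATTED orders;
```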