Overview of Spark SQL CLI

Let us understand how to launch Spark SQL CLI.

  • Log on to the gateway node of the cluster.

  • We have two versions of Spark in our labs. Use spark-sql to launch Spark SQL with 1.6.x and spark2-sql to launch Spark SQL with 2.3.x.

  • Launch the Spark SQL CLI using spark-sql (or spark2-sql for 2.3.x). When running on the cluster, we might have to pass additional arguments. For example, with spark2-sql:

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
  • One can get help using spark-sql --help

  • For example, we can use spark-sql --database training_retail to connect to a specific database. Here is the equivalent command when running on the cluster:

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse \
    --database ${USER}_retail
  • The Spark SQL CLI will be launched and connected to the ${USER}_retail database.

  • We can check which database we are connected to by running SELECT current_database(); at the prompt.
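
The launch and validation steps above can also be combined into one non-interactive check. The following is a minimal sketch, not part of the original lab instructions. It assumes the CLI's -e option (run a quoted query and exit) is available, that the query result is the last line written to stdout, and that the database follows the ${USER}_retail naming used above. The variable names SPARK_USER, EXPECTED_DB, ACTUAL_DB and RESULT are made up for illustration. If spark2-sql is not on the PATH, the script only prints the command it would have run.

```shell
# Current OS user; falls back to `id -un` if USER is unset (illustrative variable name).
SPARK_USER="${USER:-$(id -un)}"
# Assumption: the database follows the ${USER}_retail convention from this section.
EXPECTED_DB="${SPARK_USER}_retail"

# Assemble the same arguments as the launch command above,
# plus -e to run a single query and exit instead of opening the prompt.
set -- spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf "spark.sql.warehouse.dir=/user/${SPARK_USER}/warehouse" \
    --database "${EXPECTED_DB}" \
    -e "SELECT current_database()"

if command -v "$1" >/dev/null 2>&1; then
    # On the gateway node: run the query, discard log output on stderr,
    # and keep the last stdout line (assumed to hold the database name).
    ACTUAL_DB="$("$@" 2>/dev/null | tail -n 1)"
    RESULT="connected to: ${ACTUAL_DB}"
else
    # Anywhere else: show the command instead of failing.
    RESULT="dry run: $*"
fi
echo "${RESULT}"
```

On the gateway node, the printed database name should match ${USER}_retail if the --database argument took effect. Swapping spark2-sql for spark-sql would target the 1.6.x installation instead.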