Overview of Spark SQL Properties

Let us understand the details of Spark SQL properties, which control the Spark SQL runtime environment.

  • Spark SQL inherits the properties defined for Spark. There are some Spark SQL specific properties as well, and these apply to Data Frames too.

  • We can review these properties using management tools such as the Ambari or Cloudera Manager Web UI.

  • In clusters where Spark is integrated with Hadoop and Hive, Spark runtime behavior is also controlled by the HDFS properties files, YARN properties files, Hive properties files, etc.

  • We can get all the properties using SET; in the Spark SQL CLI (see the sketch after this list for a programmatic alternative).

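As a programmatic alternative to the CLI, the same information is available from a SparkSession. Below is a minimal sketch, assuming an already created session named spark; spark.conf.getAll and spark.sql("SET") are standard Spark APIs.

// Inspect Spark SQL properties programmatically,
// assuming an existing SparkSession named spark
spark.conf.getAll.foreach { case (key, value) =>
  println(s"$key = $value")
}

// Equivalent of the SQL SET command - returns a Data Frame of key/value pairs
spark.sql("SET").show(false)
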
Let us review some important properties in Spark SQL.

  • spark.sql.warehouse.dir
  • spark.sql.catalogImplementation
  • We can review the current value using SET spark.sql.warehouse.dir;

import org.apache.spark.sql.SparkSession

// Build a session with a user specific warehouse directory and Hive support
val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate
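
Once the session is created, we can verify the configured values using spark.conf.get, the standard accessor on a SparkSession. A minimal sketch against the session built above:

// Verify the properties configured while building the session
println(spark.conf.get("spark.sql.warehouse.dir"))
// Returns "hive" because enableHiveSupport was used while building the session
println(spark.conf.get("spark.sql.catalogImplementation"))
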
%%sql

SET
%%sql

SET spark.sql.warehouse.dir
+--------------------+--------------------+
|                 key|               value|
+--------------------+--------------------+
|spark.sql.warehou...|/user/itversity/w...|
+--------------------+--------------------+
  • Properties with default values do not show up as part of the SET command. But we can still check and override their values - for example:

%%sql

SET spark.sql.shuffle.partitions
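
Even though a bare SET does not list it, the property is defined with a default value. A minimal sketch checking it from Scala using spark.conf.get:

// spark.sql.shuffle.partitions controls the number of partitions used for
// shuffles triggered by joins and aggregations; the default is 200
println(spark.conf.get("spark.sql.shuffle.partitions"))
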
  • We can override a property by setting its value using the same SET command, e.g.:

%%sql

SET spark.sql.shuffle.partitions=2
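
The same override can also be applied programmatically. A minimal sketch using spark.conf.set, the standard setter on a SparkSession:

// Equivalent programmatic override from Scala
spark.conf.set("spark.sql.shuffle.partitions", "2")
// Confirm the new value
println(spark.conf.get("spark.sql.shuffle.partitions"))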