Overview of Spark SQL Properties
Let us understand the details of the Spark SQL properties which control the Spark SQL runtime environment.
Spark SQL inherits the properties defined for Spark. There are also some Spark SQL specific properties, and these are applicable to Data Frames as well.
We can review these properties using management tools such as the Ambari or Cloudera Manager Web UI.
In clusters where Spark is integrated with Hadoop and Hive, Spark runtime behavior is also controlled by the HDFS, YARN, and Hive properties files.
We can get all the explicitly set properties by running
SET;
in the Spark SQL CLI.
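The same listing is available from within a Spark application as well. A minimal sketch, assuming an already created SparkSession named spark:

```scala
// List all explicitly set properties from within a Spark application,
// equivalent to running SET; in the Spark SQL CLI
// (assumes an active SparkSession named spark)
spark.sql("SET").show(truncate = false)
```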
Let us review some important properties in Spark SQL.
spark.sql.warehouse.dir
spark.sql.catalogImplementation
We can review the current value using
SET spark.sql.warehouse.dir;
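We can also read individual property values programmatically through the runtime configuration. A minimal sketch, assuming an already created SparkSession named spark:

```scala
// Read individual Spark SQL properties through the runtime config
// (assumes an active SparkSession named spark)
val warehouseDir = spark.conf.get("spark.sql.warehouse.dir")
val catalogImpl  = spark.conf.get("spark.sql.catalogImplementation")

println(s"Warehouse dir: ${warehouseDir}")
println(s"Catalog implementation: ${catalogImpl}")
```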
import org.apache.spark.sql.SparkSession

// Build (or reuse) a Spark session with Hive support,
// pointing the warehouse directory at the user's HDFS home area
val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate
%%sql
SET
%%sql
SET spark.sql.warehouse.dir
+--------------------+--------------------+
| key| value|
+--------------------+--------------------+
|spark.sql.warehou...|/user/itversity/w...|
+--------------------+--------------------+
Properties that are still at their default values do not show up in the output of the
SET
command. However, we can still check and override them - for example
%%sql
SET spark.sql.shuffle.partitions
We can override a property by setting its value using the same SET command, e.g.:
%%sql
SET spark.sql.shuffle.partitions=2
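Setting the property via SET is equivalent to changing it through the runtime configuration. A minimal sketch, assuming an already created SparkSession named spark:

```scala
import org.apache.spark.sql.functions.col

// Programmatic equivalent of SET spark.sql.shuffle.partitions=2
// (assumes an active SparkSession named spark)
spark.conf.set("spark.sql.shuffle.partitions", "2")

// Subsequent wide operations (joins, aggregations) will now
// shuffle the data into 2 partitions
val counts = spark.
    range(100).
    groupBy((col("id") % 10).as("bucket")).
    count()
counts.show()
```

Note that spark.sql.shuffle.partitions only affects shuffles triggered after the value is changed; already materialized Data Frames keep their existing partitioning.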