Understanding Warehouse Directory
Let us go through the details of the Spark Metastore warehouse directory.
A database in Spark SQL is simply a directory in the underlying file system, such as HDFS.
A Spark Metastore table is simply a directory in the underlying file system, such as HDFS.
A partition of a Spark Metastore table is a directory in the underlying file system, located under the table's directory.
The warehouse directory is the base directory under which the directories for databases and tables are created by default.
It is controlled by spark.sql.warehouse.dir. You can get the value by running SET spark.sql.warehouse.dir;
Do not try to override this property from within the Spark SQL CLI. It will not have any effect.
The underlying directory for a database has a .db extension.
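The layout described above can be sketched in a few lines of plain Scala. This is only an illustration of the default naming convention, not Spark's own code: the helper names (databaseDir, tableDir, partitionDir), the warehouse path, and the database, table and partition names are all hypothetical. It assumes the usual Hive-style convention in which partition directories are named column=value and the default database maps to the warehouse directory itself.

```scala
// Hypothetical helpers that mirror the default directory layout.
// They only build path strings; they do not touch HDFS or the metastore.

// <warehouse>/<database>.db, except that the `default` database
// uses the warehouse directory itself.
def databaseDir(warehouseDir: String, database: String): String =
  if (database == "default") warehouseDir
  else s"${warehouseDir}/${database}.db"

// A table is a directory under its database directory.
def tableDir(warehouseDir: String, database: String, table: String): String =
  s"${databaseDir(warehouseDir, database)}/${table}"

// A partition is a directory (one level per partition column) under the table.
def partitionDir(
    warehouseDir: String,
    database: String,
    table: String,
    partition: Seq[(String, String)]): String = {
  val suffix = partition.map { case (col, value) => s"${col}=${value}" }.mkString("/")
  s"${tableDir(warehouseDir, database, table)}/${suffix}"
}

// Hypothetical warehouse location and object names, for illustration only.
val warehouse = "/user/itversity/warehouse"

println(databaseDir(warehouse, "retail"))
// /user/itversity/warehouse/retail.db
println(tableDir(warehouse, "retail", "orders"))
// /user/itversity/warehouse/retail.db/orders
println(partitionDir(warehouse, "retail", "orders", Seq("order_month" -> "2014-01")))
// /user/itversity/warehouse/retail.db/orders/order_month=2014-01
```

These paths are only the defaults. A database or table created with an explicit LOCATION lives wherever that location points, and Spark escapes some special characters in partition values, which this sketch does not attempt to model.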
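import org.apache.spark.sql.SparkSession

// OS user running the session; used to build a per-user warehouse path
val username = System.getProperty("user.name")

val spark = SparkSession.
    builder.
    // Port 0 lets Spark pick any free port for the UI
    config("spark.ui.port", "0").
    // The warehouse directory has to be set before the session is created
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate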
%%sql
SET spark.sql.warehouse.dir
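The same value can also be read from Scala. The following is a minimal sketch, assuming it runs where a Hive-enabled SparkSession is already active (for example in spark-shell or a notebook, after a session like the one above has been created); the variable names are illustrative. It uses spark.conf.get to read the property and spark.catalog.getDatabase to show where the default database actually lives.

```scala
import org.apache.spark.sql.SparkSession

// Reuse the session that is already active. Without an existing session
// and a configured master, getOrCreate would fail here.
val activeSession = SparkSession.builder.getOrCreate()

// Read the warehouse directory configured for this session
val warehouseDir = activeSession.conf.get("spark.sql.warehouse.dir")
println(s"Warehouse directory: ${warehouseDir}")

// The catalog reports the location of each database;
// for the default database this is normally the warehouse directory itself
val defaultDb = activeSession.catalog.getDatabase("default")
println(s"Location of default database: ${defaultDb.locationUri}")
```

If the two locations differ, the metastore was most likely initialized with a different warehouse directory than the one this session was configured with.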