Managing Spark Metastore DatabasesΒΆ
Let us undestand how to manage Spark Metastore Databases.
Make a habit of reviewing Language Manual.
We can create database using CREATE DATABASE Command.
For e. g.:
CREATE DATABASE itversity_demo;
If the database exists it will fail. If you want to ignore with out throwing error you can use IF NOT EXISTS
e. g.:
CREATE DATABASE IF NOT EXISTS itversity_demo;
We can list the databases using
SHOW databases;
Spark Metastore is multi tenant database. To switch to a database, you can use USE Command. e. g.:
USE itversity_demo;
We can drop empty database by using
DROP DATABASE itversity_demo;
.Add cascade to drop all the tables and then the database
DROP DATABASE itversity_demo CASCADE;
.We can also specify location while creating the database -
CREATE DATABASE itversity_demo LOCATION '/user/itversity/custom/itversity_demo.db'
import org.apache.spark.sql.SparkSession
val username = System.getProperty("user.name")
val spark = SparkSession.
builder.
config("spark.ui.port", "0").
config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
enableHiveSupport.
master("yarn").
appName(s"${username} | Spark SQL - Getting Started").
getOrCreate
%%sql
DROP DATABASE IF EXISTS itversity_demo
%%sql
CREATE DATABASE itversity_demo
%%sql
CREATE DATABASE itversity_demo
%%sql
CREATE DATABASE IF NOT EXISTS itversity_demo
%%sql
SHOW databases
%%sql
SELECT current_database()
%%sql
USE itversity_demo
%%sql
SELECT current_database()
val username = System.getProperty("user.name")
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql
CREATE TABLE table_demo (i INT)
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql
DROP DATABASE itversity_demo CASCADE
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql
CREATE DATABASE itversity_demo LOCATION '/user/itversity/custom/itversity_demo.db'
import sys.process._
s"hdfs dfs -ls /user/${username}/custom"! // Directory will be created if it does not exists
import sys.process._
s"hdfs dfs -ls /user/${username}/custom/${username}_demo.db"!