Managing Spark Metastore DatabasesΒΆ

Let us undestand how to manage Spark Metastore Databases.

  • Make a habit of reviewing Language Manual.

  • We can create database using CREATE DATABASE Command.

  • For e. g.: CREATE DATABASE itversity_demo;

  • If the database exists it will fail. If you want to ignore with out throwing error you can use IF NOT EXISTS

  • e. g.: CREATE DATABASE IF NOT EXISTS itversity_demo;

  • We can list the databases using SHOW databases;

  • Spark Metastore is multi tenant database. To switch to a database, you can use USE Command. e. g.: USE itversity_demo;

  • We can drop empty database by using DROP DATABASE itversity_demo;.

  • Add cascade to drop all the tables and then the database DROP DATABASE itversity_demo CASCADE;.

  • We can also specify location while creating the database - CREATE DATABASE itversity_demo LOCATION '/user/itversity/custom/itversity_demo.db'

import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    master("yarn").
    appName(s"${username} | Spark SQL - Getting Started").
    getOrCreate
%%sql

DROP DATABASE IF EXISTS itversity_demo
%%sql

CREATE DATABASE itversity_demo
%%sql

CREATE DATABASE itversity_demo
%%sql

CREATE DATABASE IF NOT EXISTS itversity_demo
%%sql

SHOW databases
%%sql

SELECT current_database()
%%sql

USE itversity_demo
%%sql

SELECT current_database()
val username = System.getProperty("user.name")
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql

CREATE TABLE table_demo (i INT)
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql

DROP DATABASE itversity_demo CASCADE
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_demo.db"!
%%sql

CREATE DATABASE itversity_demo LOCATION '/user/itversity/custom/itversity_demo.db'
import sys.process._
s"hdfs dfs -ls /user/${username}/custom"! // Directory will be created if it does not exists
import sys.process._
s"hdfs dfs -ls /user/${username}/custom/${username}_demo.db"!