Adding Partitions to TablesΒΆ

Let us understand how we can add static partitions to Partitioned tables in Spark Metastore.

Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.

val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
    master("yarn").
    getOrCreate

If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.

Using Spark SQL

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Pyspark

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
  • We can add partitions using ALTER TABLE command with ADD PARTITION.

  • For each and every partition created, a subdirectory will be created using partition column name and corresponding value under the table directory.

  • Let us understand how to add partitions to orders_part table under itversity_retail database.

Here is the script to add static partitions to a Partitioned table where partition column type is string.

%%sql

USE itversity_retail
%%sql

DROP TABLE IF EXISTS orders_part
%%sql

CREATE TABLE orders_part (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) PARTITIONED BY (order_month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
import sys.process._

s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !
%%sql

ALTER TABLE orders_part ADD PARTITION (order_month='2013-07')
import sys.process._

s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !

Here is the script to add static partitions to a Partitioned table where partition column type is integer. We can add one or more partitions at a time. For further demos we will be using this table

%%sql

USE itversity_retail
%%sql

DROP TABLE IF EXISTS orders_part
%%sql

CREATE TABLE orders_part (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) PARTITIONED BY (order_month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
%%sql

DESCRIBE orders_part
import sys.process._

s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !
%%sql

ALTER TABLE orders_part ADD PARTITION (order_month=201307)
%%sql

ALTER TABLE orders_part ADD
    PARTITION (order_month=201308)
    PARTITION (order_month=201309)
    PARTITION (order_month=201310)
import sys.process._

s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !