Adding Partitions to TablesΒΆ
Let us understand how we can add static partitions to Partitioned tables in Spark Metastore.
Let us start spark context for this Notebook so that we can execute the code provided. You can sign up for our 10 node state of the art cluster/labs to learn Spark SQL using our unique integrated LMS.
val username = System.getProperty("user.name")
import org.apache.spark.sql.SparkSession
val username = System.getProperty("user.name")
val spark = SparkSession.
builder.
config("spark.ui.port", "0").
config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
enableHiveSupport.
appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
master("yarn").
getOrCreate
If you are going to use CLIs, you can use Spark SQL using one of the 3 approaches.
Using Spark SQL
spark2-sql \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Scala
spark2-shell \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
Using Pyspark
pyspark2 \
--master yarn \
--conf spark.ui.port=0 \
--conf spark.sql.warehouse.dir=/user/${USER}/warehouse
We can add partitions using
ALTER TABLE
command withADD PARTITION
.For each and every partition created, a subdirectory will be created using partition column name and corresponding value under the table directory.
Let us understand how to add partitions to orders_part table under itversity_retail database.
Here is the script to add static partitions to a Partitioned table where partition column type is string.
%%sql
USE itversity_retail
%%sql
DROP TABLE IF EXISTS orders_part
%%sql
CREATE TABLE orders_part (
order_id INT,
order_date STRING,
order_customer_id INT,
order_status STRING
) PARTITIONED BY (order_month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !
%%sql
ALTER TABLE orders_part ADD PARTITION (order_month='2013-07')
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !
Here is the script to add static partitions to a Partitioned table where partition column type is integer. We can add one or more partitions at a time. For further demos we will be using this table
%%sql
USE itversity_retail
%%sql
DROP TABLE IF EXISTS orders_part
%%sql
CREATE TABLE orders_part (
order_id INT,
order_date STRING,
order_customer_id INT,
order_status STRING
) PARTITIONED BY (order_month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
%%sql
DESCRIBE orders_part
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !
%%sql
ALTER TABLE orders_part ADD PARTITION (order_month=201307)
%%sql
ALTER TABLE orders_part ADD
PARTITION (order_month=201308)
PARTITION (order_month=201309)
PARTITION (order_month=201310)
import sys.process._
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part" !