Define Problem Statement
Let us define the problem statement to get an overview of basic transformations using Spark SQL.
Get daily product revenue using the orders and order_items data sets.
We have the following fields in orders.
order_id
order_date
order_customer_id
order_status
We have the following fields in order_items.
order_item_id
order_item_order_id
order_item_product_id
order_item_quantity
order_item_subtotal
order_item_product_price
There is a one-to-many relationship between orders and order_items.
orders.order_id is the primary key, and order_items.order_item_order_id is a foreign key referencing orders.order_id.
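To make the relationship concrete, here is a minimal sketch that builds both tables and joins them on the key columns. It uses Python's built-in sqlite3 module as a lightweight stand-in for a Spark SQL session, so the column types and the sample rows are assumptions made only for illustration; the join condition itself comes directly from the keys described above.

```python
import sqlite3

# In-memory database used as a stand-in for a Spark SQL session.
conn = sqlite3.connect(":memory:")

# Column names follow the field lists above; the types are assumed.
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT,
    order_customer_id INTEGER,
    order_status TEXT
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER REFERENCES orders(order_id),
    order_item_product_id INTEGER,
    order_item_quantity INTEGER,
    order_item_subtotal REAL,
    order_item_product_price REAL
);
""")

# Hypothetical sample rows: order 1 has one item, order 2 has two items.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "2013-07-25", 100, "CLOSED"),
     (2, "2013-07-25", 101, "COMPLETE")],
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 10, 1, 50.0, 50.0),
     (2, 2, 10, 2, 100.0, 50.0),
     (3, 2, 20, 1, 25.0, 25.0)],
)

# One order can match many order items through order_item_order_id.
items_per_order = conn.execute("""
    SELECT o.order_id, count(oi.order_item_id) AS item_count
    FROM orders o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
    GROUP BY o.order_id
    ORDER BY o.order_id
""").fetchall()

print(items_per_order)
```

Each order_id appears once in orders but can appear several times in order_items, which is why the join returns more than one item for order 2.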
By the end of this module we will explore all the standard transformations and get daily product revenue using the following fields.
orders.order_date
order_items.order_item_product_id
order_items.order_item_subtotal (aggregated by order_date and order_item_product_id).
We will consider only orders whose order_status is COMPLETE or CLOSED.
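The problem statement can be sketched as a single query that filters, joins, and aggregates. The example below again uses Python's built-in sqlite3 module as a stand-in for Spark SQL, with hypothetical sample rows and assumed column types. The revenue alias and the use of sum with round are illustrative choices, not the module's final solution.

```python
import sqlite3

# In-memory database used as a stand-in for a Spark SQL session.
conn = sqlite3.connect(":memory:")

# Column names follow the field lists above; the types are assumed.
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT,
    order_customer_id INTEGER,
    order_status TEXT
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER,
    order_item_product_id INTEGER,
    order_item_quantity INTEGER,
    order_item_subtotal REAL,
    order_item_product_price REAL
);
""")

# Hypothetical sample data; order 3 is PENDING and should be excluded.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "2013-07-25", 100, "COMPLETE"),
     (2, "2013-07-25", 101, "CLOSED"),
     (3, "2013-07-25", 102, "PENDING"),
     (4, "2013-07-26", 103, "COMPLETE")],
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 10, 2, 100.0, 50.0),
     (2, 1, 20, 1, 25.0, 25.0),
     (3, 2, 10, 1, 50.0, 50.0),
     (4, 3, 10, 4, 200.0, 50.0),
     (5, 4, 20, 2, 50.0, 25.0)],
)

# Filter on status, join on the key columns, then aggregate the
# subtotal for each combination of date and product.
daily_product_revenue = conn.execute("""
    SELECT o.order_date,
        oi.order_item_product_id,
        round(sum(oi.order_item_subtotal), 2) AS revenue
    FROM orders o
        JOIN order_items oi
            ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date, oi.order_item_product_id
    ORDER BY o.order_date, oi.order_item_product_id
""").fetchall()

for row in daily_product_revenue:
    print(row)
```

The query text uses only standard SQL constructs (JOIN, WHERE ... IN, GROUP BY, sum, round), so a similar statement should carry over to Spark SQL once orders and order_items are available as tables or views there.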