Define Problem Statement
Let us define the problem statement to get an overview of basic transformations using Spark SQL.
- Get daily product revenue using the orders and order_items data sets.
- We have the following fields in orders.
  - order_id
  - order_date
  - order_customer_id
  - order_status
- We have the following fields in order_items.
  - order_item_id
  - order_item_order_id
  - order_item_product_id
  - order_item_quantity
  - order_item_subtotal
  - order_item_product_price
- There is a one-to-many relationship between orders and order_items: one order can have many order items.
- orders.order_id is the primary key, and order_items.order_item_order_id is a foreign key referencing orders.order_id.
- By the end of this module, we will have explored all the standard transformations and will get daily product revenue using the following fields.
  - orders.order_date
  - order_items.order_item_product_id
  - order_items.order_item_subtotal (aggregated by date and product id)
- We will consider only COMPLETE or CLOSED orders.
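
As a rough sketch of where this module is heading, the requirements above could be expressed as a single join, filter, and aggregation. The query uses only standard SQL constructs that Spark SQL should also accept. Here it is run through Python's built-in sqlite3 module purely as a stand-in, so the example works without a Spark session. The sample rows, the column types, and the `revenue` alias are illustrative assumptions and are not taken from the actual data set.

```python
import sqlite3

# In-memory database used only as a stand-in for Spark SQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT,
    order_customer_id INTEGER,
    order_status TEXT
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER REFERENCES orders(order_id),
    order_item_product_id INTEGER,
    order_item_quantity INTEGER,
    order_item_subtotal REAL,
    order_item_product_price REAL
);
""")

# Made-up sample rows (assumption): four orders, one of them PENDING.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "2013-07-25", 100, "COMPLETE"),
    (2, "2013-07-25", 101, "CLOSED"),
    (3, "2013-07-25", 102, "PENDING"),
    (4, "2013-07-26", 100, "COMPLETE"),
])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 10, 2, 100.0, 50.0),
    (2, 1, 20, 1, 30.0, 30.0),
    (3, 2, 10, 1, 50.0, 50.0),
    (4, 3, 10, 4, 200.0, 50.0),  # belongs to the PENDING order, so it is filtered out
    (5, 4, 20, 3, 90.0, 30.0),
])

# Join orders to order_items, keep COMPLETE or CLOSED orders,
# then sum the subtotal per date and product.
DAILY_PRODUCT_REVENUE_SQL = """
SELECT o.order_date,
       oi.order_item_product_id,
       ROUND(SUM(oi.order_item_subtotal), 2) AS revenue
FROM orders o
JOIN order_items oi
  ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
ORDER BY o.order_date, oi.order_item_product_id
"""

daily_product_revenue = conn.execute(DAILY_PRODUCT_REVENUE_SQL).fetchall()
for row in daily_product_revenue:
    print(row)
```

With these sample rows, product 10 on 2013-07-25 combines items from a COMPLETE and a CLOSED order, while the item from the PENDING order does not contribute to any total.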