Define Problem Statement

Let us define a problem statement to get an overview of basic transformations using Spark SQL.

  • Get daily product revenue using the orders and order_items data sets.

  • The orders data set has the following fields.

    • order_id

    • order_date

    • order_customer_id

    • order_status

  • The order_items data set has the following fields.

    • order_item_id

    • order_item_order_id

    • order_item_product_id

    • order_item_quantity

    • order_item_subtotal

    • order_item_product_price

  • There is a one-to-many relationship between orders and order_items: each order can have several order items.

  • orders.order_id is the primary key, and order_items.order_item_order_id is a foreign key referencing orders.order_id.

  • By the end of this module, we will have explored the standard transformations and computed daily product revenue using the following fields.

    • orders.order_date

    • order_items.order_item_product_id

    • order_items.order_item_subtotal (summed for each combination of order_date and order_item_product_id).

  • We will consider only orders whose order_status is COMPLETE or CLOSED.
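The target computation can be expressed as a single SQL query. The query joins order_items to orders on the foreign key, keeps only COMPLETE or CLOSED orders, and sums order_item_subtotal per order date and product.

This is a minimal sketch, not the module's own solution. To keep it runnable without a Spark cluster, it uses Python's built-in sqlite3 module as a stand-in for Spark SQL. Several details are hypothetical, chosen only for illustration:

  • the column types
  • the sample rows, including the PENDING status
  • the alias order_revenue

```python
import sqlite3

# Stand-in for Spark SQL: an in-memory SQLite database holding the two tables
# described above. Column names follow the problem statement; the column types
# and all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id          INTEGER PRIMARY KEY,
    order_date        TEXT,
    order_customer_id INTEGER,
    order_status      TEXT
);
CREATE TABLE order_items (
    order_item_id            INTEGER PRIMARY KEY,
    order_item_order_id      INTEGER REFERENCES orders (order_id),
    order_item_product_id    INTEGER,
    order_item_quantity      INTEGER,
    order_item_subtotal      REAL,
    order_item_product_price REAL
);
""")

# Hypothetical sample data: one order per status of interest, plus one
# PENDING order that the status filter should exclude.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2013-07-25", 101, "COMPLETE"),
        (2, "2013-07-25", 102, "CLOSED"),
        (3, "2013-07-25", 103, "PENDING"),
        (4, "2013-07-26", 104, "COMPLETE"),
    ],
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, 2, 40.0, 20.0),
        (2, 1, 20, 1, 15.0, 15.0),
        (3, 2, 10, 1, 20.0, 20.0),
        (4, 3, 10, 5, 100.0, 20.0),  # belongs to the PENDING order
        (5, 4, 20, 3, 45.0, 15.0),
    ],
)

# Daily product revenue: join on the foreign key, keep COMPLETE/CLOSED orders,
# then sum the subtotal for each (order_date, order_item_product_id) pair.
DAILY_PRODUCT_REVENUE = """
SELECT o.order_date,
       oi.order_item_product_id,
       ROUND(SUM(oi.order_item_subtotal), 2) AS order_revenue
FROM orders AS o
JOIN order_items AS oi
  ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id
ORDER BY o.order_date, order_revenue DESC
"""

rows = conn.execute(DAILY_PRODUCT_REVENUE).fetchall()
for row in rows:
    print(row)
```

With the sample data, the PENDING order's item is dropped before aggregation. Product 10 on 2013-07-25 therefore totals 40.0 + 20.0 = 60.0.

In Spark, the same query string should carry over largely unchanged. First register orders and order_items as tables or temporary views (for example with `DataFrame.createOrReplaceTempView`), then pass the query to `spark.sql(...)`.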