Apache Spark using SQL

This course is designed to help you learn Spark SQL as part of Data Engineering.

About Spark SQL

Spark SQL is one of the popular SQL frameworks in the Big Data landscape. It is an open source SQL engine built on Spark’s distributed computing framework.

Here are some of the common uses of Spark SQL.

  • Implement transformation rules as part of Data Engineering or Data Processing Pipelines.

  • Run ad hoc queries on data stored in storage systems such as HDFS, Amazon S3, Azure Blob Storage, etc.

  • Connect BI tools such as Tableau, Power BI, etc. and run reports.

You can sign up for our state-of-the-art 10-node cluster/labs to learn Spark SQL using our unique integrated LMS. You will be able to learn the same way as demonstrated in the course.

Course Details

This course is designed to walk through the SQL capabilities of Spark SQL. As part of this course, you will learn the following topics.

  • Getting Started

  • Basic Transformations

  • Basic DDL and DML

  • DML (Contd) and Partitioning

  • Predefined Functions

  • Windowing Functions

Desired Audience

Here is the desired audience for this course.

  • Experienced application developers who want to understand key aspects of Spark SQL.

  • Data Engineers and Data Warehouse Developers who want to understand key aspects of Spark SQL to build batch or streaming pipelines.

  • Testers who want to improve their scripting abilities to validate data in files, tables, etc.

Prerequisites

Here are the prerequisites for signing up for the course.

Logistics

  • Computer with decent configuration

    • At least 4 GB RAM

    • 8 GB RAM is highly desired

  • Chrome Browser

  • High Speed Internet

Desired Skills

  • Engineering or Science Degree

  • Ability to use a computer

  • Knowledge of, or working experience with, databases is highly desired