Apache Spark using SQL

This course is primarily designed to help you learn Spark SQL as part of Data Engineering.

About Spark SQL

Spark SQL is one of the most popular SQL frameworks in the Big Data landscape. It is an open source SQL engine built on top of Spark’s distributed computing framework.

Here are some common uses of Spark SQL.

Implement transformation rules as part of Data Engineering or Data Processing pipelines.
Run ad hoc queries on data stored in storage systems such as HDFS, S3, Azure Blob Storage, etc.
Connect BI tools such as Tableau, Power BI, etc. and run reports.

You can sign up for our 10-node, state-of-the-art cluster/labs to learn Spark SQL using our integrated LMS (learning management system). You will be able to follow along exactly as demonstrated in the course.

Course Details

This course is primarily designed to walk you through the SQL capabilities of Spark SQL. As part of this course, you will learn the following topics.

Getting Started
Basic Transformations
Basic DDL and DML
DML (Continued) and Partitioning
Predefined Functions
Windowing Functions

Desired Audience

This course is intended for the following audience.

Experienced application developers who want to understand the key aspects of Spark SQL.
Data Engineers and Data Warehouse Developers who want to understand the key aspects of Spark SQL to build batch or streaming pipelines.
Testers who want to improve their scripting skills to validate data in files, tables, etc.

Prerequisites

Here are the prerequisites to meet before signing up for the course.

Logistics

Computer with a decent configuration
- At least 4 GB RAM
- 8 GB RAM is highly recommended
Chrome Browser
High Speed Internet

Desired Skills

Engineering or Science Degree
Ability to use a computer
Knowledge of or working experience with databases is highly desirable