Get up and running with Apache Spark quickly. This practical, hands-on course shows Python users how to work with Apache PySpark to leverage the power of Spark for data science.
Syllabus
Introduction
- Apache PySpark
- What you should know
- The Apache Spark ecosystem
- Why Spark?
- Spark origins and Databricks
- Spark components
- Partitions, transformations, lazy evaluations, and actions
- Set up the lab environment
- Download a dataset
- Importing
- The DataFrame API
- Working with DataFrames
- Schemas
- Working with columns
- Working with rows
- Challenge
- Solution
- Built-in functions
- Working with dates
- User-defined functions
- Working with joins
- Challenge
- Solution
- RDDs
- Working with RDDs
- Next steps
Taught by
Jonathan Fernandes