Overview
This course shows how to analyze data with PySpark using Resilient Distributed Datasets (RDDs). It covers the fundamentals of RDDs and lazy evaluation, and how to apply transformations and actions in PySpark. Everything is demonstrated in a Databricks notebook, working through lambda functions, line lengths, parallelize, transform functions, flatMap, distinct, and filter, before closing with a recap. The course is designed for anyone who wants to master Databricks and Apache Spark, with a specific focus on PySpark and RDDs.
Syllabus
Introduction
RDDs
Lazy Evaluation
Transformations
Actions
Demo
Lambda Functions
Line Lengths
Parallelize
Transform Functions
flatMap
Distinct
Filter
Recap
Taught by
Bryan Cafferky