
Pluralsight

Scalable Data Processing with Python

via Pluralsight

Overview


Processing large datasets efficiently requires techniques that scale, yet performance tuning is where many practitioners struggle. In this course, Scalable Data Processing with Python, you’ll learn to process and manage large-scale data using PySpark and Dask. First, you’ll explore the fundamentals of scalability, including parallel, distributed, and batch processing. Next, you’ll discover how to use PySpark to process massive datasets with transformations, caching, and optimizations. Finally, you’ll learn how to use Dask for parallel computation, optimizing execution with task graphs and lazy evaluation. By the end of this course, you’ll have the skills to process large datasets efficiently and to address the performance challenges that arise at scale.
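
To give a rough feel for two ideas named in the overview, lazy evaluation over a task graph and batch processing in parallel, here is a minimal sketch using only the Python standard library. It is not taken from the course and does not use the PySpark or Dask APIs; the names `Lazy`, `chunked`, and `parallel_sum_of_squares` are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Lazy:
    """Hypothetical node in a tiny task graph: a function plus its inputs."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve upstream nodes first, then run this node's function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*resolved)


def chunked(values, size):
    """Split a list into batches of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, batch_size=3, workers=4):
    # Building the graph does no work yet; each batch becomes one lazy task.
    tasks = [Lazy(lambda b: sum(x * x for x in b), batch)
             for batch in chunked(values, batch_size)]
    # Work happens only here. A thread pool keeps the sketch simple; for
    # CPU-bound work in CPython, threads are limited by the GIL, which is
    # one reason to reach for process-based or distributed tools.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda t: t.compute(), tasks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(10)))
print(total)
```

The point of the sketch is the split between describing the work (constructing `Lazy` nodes) and executing it (calling `compute`). Deferring execution in this way is what gives a scheduler room to plan and parallelize the work before anything runs.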

Taught by

Yasir Khan
