Data Science and Engineering with Spark

University of California, Berkeley via edX XSeries

This course may be unavailable.


The Data Science and Engineering with Spark XSeries, created in partnership with Databricks, teaches students how to perform data science and data engineering at scale using Spark, a cluster computing system well suited to large-scale machine learning tasks. It also presents an integrated view of data processing by highlighting the components of data analysis pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. Students gain hands-on experience building and debugging Spark applications. The series also covers the internals of Spark and of distributed machine learning algorithms, giving students intuition for working with big data and developing code for a distributed environment.

This XSeries requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Familiarity with basic machine learning concepts and exposure to algorithms, probability, linear algebra and calculus are prerequisites for two of the courses in this series.


Courses under this program:
Course 1: Big Data Analysis with Apache Spark
Learn how to apply data science techniques using parallel programming in Apache Spark to explore big data.

Course 2: Distributed Machine Learning with Apache Spark
Learn the underlying principles required to develop scalable machine learning pipelines and gain hands-on experience using Apache Spark.

Course 3: Introduction to Apache Spark
Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals.


Taught by

Jon Bates, Ameet Talwalkar and Anthony D. Joseph

