Distributed Machine Learning with Apache Spark

University of California, Berkeley via edX

This course may be unavailable.


Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

Taught by

Ameet Talwalkar and Jon Bates


4.0 rating, based on 5 Class Central reviews

  • Martijn Onderwater

    Martijn Onderwater completed this course, spending 6 hours a week on it, and found the course difficulty to be medium.

    I really enjoyed taking this course! Initially, I was a bit annoyed with the various registrations that need to be filled out prior to the course (at edX, at Piazza, databricks, and one of the notebooks). But after this, everything was smooth sailin…

  • Alvaro Martin Orive completed this course.

  • Stephane Mysona completed this course.

  • Lars Ahlfors completed this course.

  • Adam Hjerpe completed this course.
