Distributed Machine Learning with Apache Spark

University of California, Berkeley via edX

5 Reviews 96 students interested

Overview

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.
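
As a rough illustration of what "distributed" can mean for a model such as linear regression, the sketch below computes a least-squares gradient as a map step (one partial gradient per data partition) followed by a reduce step (summing the partials). It is plain Python rather than Spark code, and it is not taken from the course labs: the function names, the toy data, the step size, and the iteration count are all assumptions made for illustration. On a cluster, the same map and reduce steps would run across worker machines instead of over an in-memory list.

```python
from functools import reduce


def partition_gradient(partition, weights):
    """Map step: sum of (prediction - label) * features over one partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def fit_linear_regression(partitions, num_features, step=0.3, iterations=200):
    """Batch gradient descent where each iteration is one map + reduce pass."""
    n = sum(len(p) for p in partitions)
    weights = [0.0] * num_features
    for _ in range(iterations):
        partials = map(lambda p: partition_gradient(p, weights), partitions)
        grad = reduce(add_vectors, partials)
        weights = [w - step * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data drawn from y = 1 + 2x. Each point carries a constant
# 1.0 feature for the intercept, and the points are split into two partitions.
partitions = [
    [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0)],
    [((1.0, 2.0), 5.0), ((1.0, 3.0), 7.0)],
]
weights = fit_linear_regression(partitions, num_features=2)
print(weights)  # should be close to the intercept 1 and slope 2
```

Only the small per-partition gradient vectors are combined in the reduce step, so the raw data never has to be gathered in one place. That is the property that lets this style of computation scale to large datasets.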

Taught by

Ameet Talwalkar and Jon Bates

Reviews for edX's Distributed Machine Learning with Apache Spark
4.0 Based on 5 reviews

  • 5 stars 20%
  • 4 stars 60%
  • 3 stars 20%
  • 2 stars 0%
  • 1 star 0%

Martijn O
4.0 3 years ago
Martijn completed this course, spending 6 hours a week on it, and found the course difficulty to be medium.
I really enjoyed taking this course! Initially, I was a bit annoyed with the various registrations that need to be filled out prior to the course (at edX, at Piazza, Databricks, and one of the notebooks). But after this, everything was smooth sailing. The teachers explain the concepts well and they speak clearly. During the course, we touched on various subjects that I am interested in: machine learning, Spark, MapReduce, Python, and MLlib. The lab exercises were at the right level for me (I have a solid background in math and software development) and took me about six hours each. I only got stuck at points where I had not read the instructions properly, and the active forum helped me through that. All in all, I can recommend this course to others!

/Martijn

--and of course: many thanks to the people at Berkeley for providing the class--
Alvaro O
5.0 3 years ago
Alvaro completed this course.
Stephane M
3.0 3 years ago
Stephane completed this course.
Lars A
4.0 3 years ago
Lars completed this course.
Adam H
4.0 2 years ago
Adam completed this course.