Distributed Machine Learning with Apache Spark

Overview

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to bleeding-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

Taught by

Ameet Talwalkar and Jon Bates

Reviews

4.0 rating, based on 5 Class Central reviews

Martijn Onderwater

I really enjoyed taking this course! Initially, I was a bit annoyed with the various registrations that need to be filled out prior to the course (at edX, at Piazza, at Databricks, and for one of the notebooks). But after this, everything was smooth sailing. The teachers explain the concepts well and they speak clearly. During the course, we touched on various subjects that I am interested in: machine learning, Spark, MapReduce, Python, and MLlib. The lab exercises were at the right level for me (I have a solid background in math and software development) and took me about six hours each. I only got stuck at points where I had not read the instructions properly, and the active forum helped me through that. All in all, I can recommend this course to others!

/Martijn

--and of course: many thanks to the people at Berkeley for providing the class--
