Spark for Machine Learning & AI

via LinkedIn Learning


Discover the powerful Apache Spark platform for machine learning. Learn about preprocessing data, applying algorithms to a variety of machine learning problems, and more.

Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data. In this course, discover how to work with this powerful platform for machine learning. Instructor Dan Sullivan discusses MLlib, the Spark machine learning library, which provides tools for data scientists and analysts who would rather find solutions to business problems than code, test, and maintain their own machine learning libraries. He shows how to use DataFrames to organize data, and he covers data preparation and the most commonly used types of machine learning algorithms: clustering, classification, regression, and recommendations. By the end of the course, you will have experience loading data into Spark, preprocessing data as needed to apply MLlib algorithms, and applying those algorithms to a variety of machine learning problems.


  • Welcome
1. Introduction to Spark and MLlib
  • Introduction to Spark
  • Steps in the machine learning process
  • Install Spark
  • Organizing data in DataFrames
  • Components of Spark MLlib
2. Data Preparation and Transformation
  • Introduction to preprocessing
  • Normalize numeric data
  • Standardize numeric data
  • Bucketize numeric data
  • Tokenize text data
  • TF-IDF
  • Summary of preprocessing
3. Clustering
  • Introduction to clustering
  • K-means clustering
  • Hierarchical clustering
  • Summary of clustering techniques
4. Classification
  • Introduction to classification
  • Preprocessing the Iris data set
  • Naive Bayes classification
  • Multilayer perceptron classification
  • Decision trees classification
  • Summary of classification algorithms
5. Regression
  • Introduction to regression
  • Preprocessing regression data
  • Linear regression
  • Decision tree regression
  • Gradient-boosted tree regression
  • Summary of regression algorithms
6. Recommendations
  • Understand recommendation systems
  • Collaborative filtering
  • Tips for using Spark MLlib

Taught by

Dan Sullivan
