

Machine Learning with Apache Spark

IBM via Coursera


Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals, then use Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos. Gain hands-on experience with Spark Structured Streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML. In practical labs, you'll use SparkML for regression, classification, and clustering to build prediction and classification models. You will also connect to Spark clusters, analyze datasets with Spark SQL, perform ETL tasks, and create ML models using SparkML and scikit-learn. Finally, you will demonstrate your skills in a final assignment. This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine learning. Prior knowledge of Big Data, Hadoop, Spark, Python, and ETL is highly recommended.


  • Get Started with Machine Learning
    • In this module, you will learn about machine learning techniques that enable computers to perform tasks without being explicitly programmed. You will explore the lifecycle of machine learning models and understand the crucial role of data engineering in machine learning projects. The module covers supervised and unsupervised learning techniques, including classification, regression, and clustering. You will also gain insights into Generative AI and its potential to revolutionize multiple industries, enhance people's lives, and generate new, previously unimaginable data and experiences.
  • Machine Learning with Apache Spark
    • This module introduces Spark and gives an overview of its key features and applications in data engineering. You will learn how to connect to a Spark cluster using Skills Network Labs (SN Labs) and explore regression (mileage prediction), classification (diabetes classification), and clustering (clustering load data) using SparkML, including how to construct each of these models. The module also covers GraphFrames on Apache Spark and guides you through hands-on labs.
  • Data Engineering for Machine Learning using Apache Spark
    • This module begins with Apache Spark Structured Streaming and its role in processing streaming data with Spark SQL, and introduces key terms associated with Structured Streaming. It then covers the Extract, Transform, Load (ETL) process, with hands-on experience moving data from a source to a destination whose data format or structure may differ. You will also gain practical experience with feature extraction and transformation using Spark's feature extractors and transformers. The module then explores machine learning pipelines in Spark, demonstrating how they work and the benefits they offer. Lastly, you will learn the concept of model persistence and its role in machine learning.
  • Final Project
    • In this module, you will apply the data engineering skills and techniques you have acquired throughout the course. The course concludes with a final project and assignments that let you demonstrate your proficiency. You will step into the role of a data engineer at a renowned aeronautics consulting company known for its expertise in handling large datasets. The company's data scientists rely on you to carry out ETL (Extract, Transform, Load) tasks and build machine learning pipelines. Although they are experts in machine learning, they depend on your specialized knowledge to handle various algorithms and data formats, and your work is vital to the smooth execution of their tasks.

Taught by

Skills Network and Ramesh Sannareddy


4.5 rating on Coursera, based on 57 ratings
