Apache Spark Essential Training: Big Data Engineering

Overview

Learn how to integrate Apache Spark with other big data technologies and build an end-to-end project that solves a real-world business problem.

Syllabus

Introduction

Driving big data engineering with Apache Spark
Course prerequisites
Setting up the exercise files

1. Data Engineering Concepts

What is data engineering?
Data engineering vs. data analytics vs. data science
Data engineering functions
Batch vs. real-time processing
Data engineering with Spark

2. Spark Capabilities for ETL

Spark architecture review
Parallel processing with Spark
Spark execution plan
Stateful stream processing
Spark analytics and ML

3. Batch Processing Pipelines

Batch processing use case: Problem statement
Batch processing use case: Design
Setting up the local DB
Uploading stock to a central store
Aggregating stock across warehouses

4. Real-Time Processing Pipelines

Real-time use case: Problem
Real-time use case: Design
Generating a visits data stream
Building a website analytics job
Executing the real-time pipeline

5. Data Engineering with Spark: Best Practices

Batch vs. real-time options
Scaling extraction and loading operations
Scaling processing operations
Building resiliency

6. End-to-End Exercise Project

Project exercise requirements
Solution design
Extracting long last actions
Building a scorecard

Conclusion

More about Apache Spark

Taught by

Kumaran Ponnambalam
