Apache Spark Essential Training: Big Data Engineering

via LinkedIn Learning

Overview

Learn how to integrate Apache Spark with other big data technologies and build an end-to-end project that solves a real-world business problem.

Syllabus

Introduction
  • Driving big data engineering with Apache Spark
  • Course prerequisites
  • Setting up the exercise files
1. Data Engineering Concepts
  • What is data engineering?
  • Data engineering vs. data analytics vs. data science
  • Data engineering functions
  • Batch vs. real-time processing
  • Data engineering with Spark
2. Spark Capabilities for ETL
  • Spark architecture review
  • Parallel processing with Spark
  • Spark execution plan
  • Stateful stream processing
  • Spark analytics and ML
3. Batch Processing Pipelines
  • Batch processing use case: Problem statement
  • Batch processing use case: Design
  • Setting up the local DB
  • Uploading stock to a central store
  • Aggregating stock across warehouses
4. Real-Time Processing Pipelines
  • Real-time use case: Problem
  • Real-time use case: Design
  • Generating a visits data stream
  • Building a website analytics job
  • Executing the real-time pipeline
5. Data Engineering with Spark: Best Practices
  • Batch vs. real-time options
  • Scaling extraction and loading operations
  • Scaling processing operations
  • Building resiliency
6. End-to-End Exercise Project
  • Project exercise requirements
  • Solution design
  • Extracting long last actions
  • Building a scorecard
Conclusion
  • More about Apache Spark

Taught by

Kumaran Ponnambalam
