Introduction to Designing Data Lakes on AWS

Amazon Web Services via edX


Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that can be hard to rectify. This course covers the foundations of what a data lake is, how to ingest and organize data into it, and the data processing that optimizes performance and cost when consuming data at scale. It is intended for professionals (architects, system administrators, and DevOps) who need to design and build an architecture for secure and scalable data lake components. Students will learn about the use cases for a data lake and contrast them with a traditional infrastructure of servers and storage.


Week 1: Hello World, I mean, Hello Data Lakes!

  • Video: Meet the Instructors
  • Video: Introduction to Week 1
  • Video: Why Data Lakes?
  • Video: Characteristics of a Data Lake
  • Video: Data Lake Components
  • Reading: Data Lake Characteristics and Components
  • Video: Comparison of a Data Lake to a Data Warehouse
  • Reading: Data Lakes and Data Warehouses
  • Video: Discussing sample Data Lake Architectures
  • Quiz/Assessment: Week 1 quiz

Week 2: AWS data related services

  • Video: Introduction to Week 2
  • Video: AWS Data Lake related services
  • Video: Amazon S3
  • Video: AWS Glue Data Catalog
  • Reading: S3 and Glue Data Catalog
  • Video: AWS Services used for data movement
  • Reading: Kinesis, API Gateway, etc.
  • Video: AWS Services for Data processing
  • Video: AWS Services for Analytics
  • Video: AWS Services used for Predictive Analytics and Machine Learning
  • Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
  • Video: Introduction to AWS Lake Formation
  • Reading: Lake Formation
  • Lab: Get familiar with AWS Services and create your first simple data lake

Week 3: Ingesting the rivers

  • Video: Introduction to Week 3
  • Video: Use the right tool for the job
  • Video: Understanding Data Structure and when to process data
  • Video: Data Streaming ingestion with Amazon Kinesis Services
  • Video: Diving Deep on Amazon Kinesis
  • Demo: Batch Data Ingestion with AWS Transfer Family
  • Reading: Batch Data Ingestion with AWS Services
  • Video: Data Cataloging
  • Demo: Using Glue Crawlers
  • Reading: The importance of data cataloging
  • Video: Reviewing the ingestion part of some Data Lake architectures
  • Lab: Ingesting Web Logs

Week 4: Processing and Analyzing data that sits in the Data Lake

  • Video: Introduction to Week 4
  • Video: Data prep and AWS Glue jobs
  • Video: File optimizations
  • Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
  • Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
  • Video: Introduction to Data Lake security
  • Reading: Security and compliance
  • Video: The power of data visualization
  • Video: Introduction to Amazon QuickSight
  • Demo: Amazon QuickSight
  • Reading: Data visualization, Amazon QuickSight
  • Video: Registry of Open Data on AWS
  • Lab: Create an end-to-end Data Lake with AWS Services
  • Video: Course wrap-up!

Taught by

Rafael Lopes and Morgan Willis

