Introduction to Designing Data Lakes on AWS

Overview

Designing a Data Lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that can be hard to rectify later. This course covers the foundations of what a Data Lake is and how to ingest and organize data into it, then dives into the data processing that can be done to optimize performance and cost when consuming the data at scale. The course is for professionals (architects, system administrators, and DevOps engineers) who need to design and build an architecture for secure and scalable Data Lake components. Students will learn about the use cases for a Data Lake and contrast them with a traditional infrastructure of servers and storage.

Syllabus

Week 1: Hello World, I mean, Hello Data Lakes!

Video: Meet the Instructors
Video: Introduction to Week 1
Video: Why Data Lakes?
Video: Characteristics of a Data Lake
Video: Data Lake Components
Reading: Data Lake Characteristics and Components
Video: Comparison of a Data Lake to a Data Warehouse
Reading: Data Lakes and Data Warehouses
Video: Discussing sample Data Lake Architectures
Quiz/Assessment: Week 1 quiz

Week 2: AWS data-related services

Video: Introduction to Week 2
Video: AWS Data Lake related services
Video: Amazon S3
Video: AWS Glue Data Catalog
Reading: S3 and Glue Data Catalog
Video: AWS Services used for data movement
Reading: Kinesis, API Gateway, etc.
Video: AWS Services for Data processing
Video: AWS Services for Analytics
Video: AWS Services used for Predictive Analytics and Machine Learning
Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
Video: Introduction to AWS Lake Formation
Reading: Lake Formation
Lab: Get familiar with AWS Services and create your first simple data lake

Week 3: Ingesting the rivers

Video: Introduction to Week 3
Video: Use the right tool for the job
Video: Understanding Data Structure and when to process data
Video: Data Streaming ingestion with Amazon Kinesis Services
Video: Diving Deep on Amazon Kinesis
Demo: Batch Data Ingestion with AWS Transfer Family
Reading: Batch Data Ingestion with AWS Services
Video: Data Cataloging
Demo: Using Glue Crawlers
Reading: The importance of data cataloging
Video: Reviewing the ingestion part of some Data Lake architectures
Lab: Ingesting Web Logs

Week 4: Processing and Analyzing data that sits in the Data Lake

Video: Introduction to Week 4
Video: Data prep and AWS Glue jobs
Video: File optimizations
Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
Video: Introduction to Data Lake security
Reading: Security and compliance
Video: The power of data visualization
Video: Introduction to Amazon QuickSight
Demo: Amazon QuickSight
Reading: Data visualization, Amazon QuickSight
Video: Registry of Open Data on AWS
Lab: Create an end-to-end Data Lake with AWS Services
Video: Course wrap-up!