In this class, Introduction to Designing Data Lakes on AWS, we will help you understand how to create and operate a data lake in a secure and scalable way, without previous knowledge of data science! Starting with why you may want a data lake, we will look at the data lake value proposition, characteristics, and components.
Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that can be hard to rectify later. In this course we will cover the foundations of what a data lake is, how to ingest and organize data into the data lake, and dive into the data processing that can be done to optimize performance and costs when consuming the data at scale. This course is for professionals (architects, system administrators, and DevOps engineers) who need to design and build an architecture for secure and scalable data lake components. Students will learn about the use cases for a data lake and contrast them with a traditional infrastructure of servers and storage.
Welcome to the course! In Week 1, you'll discover why you may want a data lake, its characteristics and components, and how it compares to other data scenarios, such as databases and data warehouses.
In Week 2, you'll build on your knowledge of what data lakes are and why they may be a solution for your needs. You'll explore AWS services that can be used in data lake architectures, like Amazon S3, AWS Glue, Amazon Athena, Amazon Elasticsearch Service, AWS Lake Formation, Amazon Rekognition, Amazon API Gateway, and other services used for data movement, processing, and visualization.
In Week 3, you'll explore the specifics of data cataloging and ingestion, and learn about services like AWS Transfer Family, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Snow Family, AWS Glue crawlers, and others. You'll also discover when to process data: before, after, or while it is being ingested. Given a scenario, you'll be able to identify when to process the data and match the most appropriate AWS services to it.
In Week 4, you'll dive deeper into data optimization and data processing. Demos of best practices will show you how to optimize your dataset for performance and cost, just by using the right tool for the job! You will also discover data security, data visualization tools, and AWS datasets you can use to experiment and get started.