Data Engineering, Serverless ETL & BI on Amazon Cloud

via Udemy

Overview

Data warehousing & ETL on AWS Cloud

What you'll learn:
  • Set up a data warehouse on AWS from scratch using Redshift
  • Understand AWS Athena and when to use it
  • Store data in S3 data lakes using the Parquet columnar file format, and optimize data scans with Athena
  • Automate ETL processes using serverless components such as AWS Glue, Data Pipeline, and Lambda functions
  • Centralize data using Redshift Spectrum
  • Trigger and automate Glue jobs using Lambda functions
  • Pull data into QuickSight, the BI reporting and visualization offering from AWS
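
As a rough illustration of the Parquet and Athena item above, the sketch below builds an Athena CTAS (CREATE TABLE AS SELECT) statement that rewrites an existing table as partitioned Parquet, so that queries filtering on the partition column scan less data. It is not taken from the course material: the function name, table names, column names, and the S3 location are all hypothetical placeholders.

```python
def build_parquet_ctas(source_table, target_table, s3_location, columns, partition_column):
    """Return an Athena CTAS statement that rewrites a table as partitioned Parquet.

    All table, column, and bucket names passed in are illustrative assumptions.
    """
    # Athena expects partition columns to come last in the SELECT list.
    select_list = ", ".join(list(columns) + [partition_column])
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{s3_location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        f")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(
        build_parquet_ctas(
            source_table="raw_orders",
            target_table="orders_parquet",
            s3_location="s3://example-bucket/orders_parquet/",
            columns=["order_id", "amount"],
            partition_column="order_date",
        )
    )
```

The generated statement could be pasted into the Athena query editor. Because Athena bills by data scanned, a columnar, partitioned layout like this is one common way to keep ad-hoc queries cheap.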

AWS can seem intimidating and overwhelming because of its vast ecosystem, but this course makes it easier for anyone who wants hands-on experience setting up a data warehouse in Redshift or building a BI infrastructure from scratch.

Data scientists, data analysts, and business analysts will soon be expected (if they are not already) to be all-rounders who can handle the technical side of data ingestion, engineering, and warehousing.

Anyone with a basic understanding of how the cloud works can benefit from this course because:

- It is designed around the end-to-end life cycle of a typical data engineering project

- It provides practical solutions to real-world use cases

This course covers:

  • Setting up a data warehouse in AWS Redshift from scratch

  • Basic data warehousing concepts

  • Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing

  • AWS Athena for ad-hoc analysis (and when to use Athena)

  • AWS Data Pipeline to sync incremental data

  • Lambda functions to trigger and automate ETL and data syncing processes

  • QuickSight setup, analyses, and dashboards
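
To make the Lambda and Glue items above more concrete, here is a minimal sketch of a Lambda handler that starts a Glue job when a file lands in S3. It is an assumption-laden illustration, not the course's own code: the job name `example-etl-job` and the `--source_path` argument are hypothetical, and the optional `glue_client` parameter is added only so the handler can be exercised locally without AWS credentials.

```python
import urllib.parse

# Hypothetical Glue job name; replace with the name of a job you have created.
GLUE_JOB_NAME = "example-etl-job"


def lambda_handler(event, context, glue_client=None):
    """Start a Glue job run for the S3 object described in the triggering event."""
    if glue_client is None:
        # Imported lazily so the module loads even where boto3 is not installed.
        import boto3

        glue_client = boto3.client("glue")

    # S3 event notifications carry the bucket name and a URL-encoded object key.
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])

    # "--source_path" is an illustrative job argument, not a built-in Glue option.
    response = glue_client.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={"--source_path": f"s3://{bucket}/{key}"},
    )
    return {"job_run_id": response["JobRunId"]}
```

In a real setup, the Lambda function's execution role would also need permission to start the Glue job, and the S3 bucket would need an event notification pointing at the function.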

Prerequisites for this course are:

  • Python and SQL (an absolute must)

  • PySpark (you should know how to write basic PySpark scripts)

  • Willingness to explore, learn, and put in the extra effort to succeed

  • An active AWS account

Important note: This course uses the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free tier limits, which should be more than enough for the practice this course requires.

This course also uses the AWS console in the browser to create clusters and set up jobs, so no bash scripting is involved. You can use any operating system for the lab sessions.

This course is not code-heavy: only about 35% of it involves coding, and the rest is execution, understanding, and chaining different components together. Its whole purpose is to make you aware of, and comfortable with, all the tools and features it uses.

Some tips:

  • Try watching the videos at 1.2x speed

  • Every time you work on a new component or feature, research other tools meant for the same purpose and see how they differ and in what respects, for example Redshift/Athena vs. Snowflake or BigQuery, and QuickSight vs. Power BI vs. MicroStrategy


Taught by

Siddharth Raghunath

Reviews

4.2 rating at Udemy based on 664 ratings
