Overview
This course teaches learners about key data reliability challenges and how Delta Lake brings reliability to data lakes at scale. Participants will understand how Delta Lake fits within an Apache Spark™ environment and how to use it to improve data reliability. The teaching method combines instructor-led sessions with hands-on interactive activities. The course is intended for data engineers and practitioners looking to enhance data reliability and performance in their organizations.
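Delta Lake's reliability guarantees come from an ACID transaction log (the `_delta_log` directory), in which each commit records JSON "add" and "remove" file actions and the current table state is recovered by replaying commits in order. The toy model below sketches that replay idea in plain Python; it is illustrative only, and the real log format and protocol are considerably richer.

```python
# Toy model of the idea behind Delta Lake's _delta_log: each commit is a
# list of "add"/"remove" file actions, and the live table state is the
# result of replaying all commits in order. Illustrative sketch only.

commits = [
    [{"action": "add", "path": "part-000.parquet"}],
    [{"action": "add", "path": "part-001.parquet"}],
    # A compaction commit: replace the two small files with one larger one.
    [
        {"action": "remove", "path": "part-000.parquet"},
        {"action": "remove", "path": "part-001.parquet"},
        {"action": "add", "path": "part-002.parquet"},
    ],
]

def replay(commits):
    """Replay add/remove actions to compute the live set of data files."""
    live = set()
    for commit in commits:
        for act in commit:
            if act["action"] == "add":
                live.add(act["path"])
            elif act["action"] == "remove":
                live.discard(act["path"])
    return live

print(sorted(replay(commits)))  # only the compacted file remains
```

Because commits are applied atomically in order, a reader always sees a consistent snapshot; this is the property the course's History and Vacuum lessons build on.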
Syllabus
Introduction
Data Lakes
Typical Data Lake Project
Who uses Delta
Getting started
Data
Download Data
Parquet Table
Stop Streaming
Initializing Streaming
Working with Parquet
Using Delta Lake
Streaming Job
Multiple Streaming Queries
Counting Continuously
Schema Evolution
Merged Schema
Summary
History
Vacuum
Mods
Merge
Update Data
Define DataFrame
Merge Syntax
Random Data
For Each Batch
Summarize
Community
Question
Thank you
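The Merge lessons above cover Delta Lake's MERGE statement (`MERGE INTO target USING updates ON ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`), which performs an upsert: matched rows are updated and unmatched rows are inserted. The pure-Python sketch below illustrates only those semantics on dictionaries; it is not the Delta Lake API, and the table contents are made up for illustration.

```python
# Toy illustration of the upsert semantics behind Delta Lake's MERGE:
# rows matched on the key are updated, unmatched rows are inserted.
# Pure-Python sketch of the semantics, not the Delta Lake API.

target = {1: {"id": 1, "value": "a"}, 2: {"id": 2, "value": "b"}}
updates = [{"id": 2, "value": "B"}, {"id": 3, "value": "c"}]

def merge(target, updates, key="id"):
    """WHEN MATCHED THEN UPDATE, WHEN NOT MATCHED THEN INSERT."""
    merged = dict(target)
    for row in updates:
        merged[row[key]] = row  # update if key exists, insert otherwise
    return merged

result = merge(target, updates)
print(sorted(result))  # keys after the upsert: [1, 2, 3]
```

In Delta Lake itself the same operation runs as a single atomic commit against the table, which is what makes it safe to apply in streaming jobs via `foreachBatch`, as the For Each Batch lesson discusses.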
Taught by
Databricks