Building Makemore - Activations & Gradients, BatchNorm

Andrej Karpathy via YouTube

Overview

The course delves into the internals of Multilayer Perceptrons (MLPs), focusing on the statistics of forward-pass activations and backward-pass gradients and on why properly scaling the weights at initialization matters. It covers diagnostic tools and visualizations for judging the health of a deep network, introduces Batch Normalization as a key innovation that makes deep networks easier to train, and previews upcoming topics such as residual connections and the Adam optimizer. Practical exercises are included to deepen understanding, and the teaching mixes theoretical explanation, code demonstrations, and visualizations. The course is intended for anyone interested in deep learning, neural networks, and improving how deep networks are trained.
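
To give a flavor of the two core ideas, scaling the initial weights by roughly gain / sqrt(fan_in) ("Kaiming init") and normalizing hidden pre-activations with Batch Normalization, here is a minimal sketch in the spirit of the lecture. It is not the lecture's actual code: the layer sizes, the random minibatch of character indices, and the 5/3 tanh gain are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

# Assumed sizes for illustration (not necessarily the lecture's exact values)
vocab_size, block_size, n_embd, n_hidden = 27, 3, 10, 200
fan_in = n_embd * block_size

# "Kaiming init": scale hidden weights by gain / sqrt(fan_in) so pre-activations
# keep roughly unit variance; the gain is about 5/3 for tanh.
C  = torch.randn((vocab_size, n_embd))
W1 = torch.randn((fan_in, n_hidden)) * (5/3) / fan_in**0.5
W2 = torch.randn((n_hidden, vocab_size)) * 0.01   # small output weights -> sensible initial loss
b2 = torch.zeros(vocab_size)

# Batch-norm scale and shift parameters
bngain = torch.ones((1, n_hidden))
bnbias = torch.zeros((1, n_hidden))

# A fake minibatch of character-index contexts (assumed data for this sketch)
Xb = torch.randint(0, vocab_size, (32, block_size))
Yb = torch.randint(0, vocab_size, (32,))

emb = C[Xb].view(Xb.shape[0], -1)   # embed characters and flatten the context
hpreact = emb @ W1                  # linear layer (a bias here is redundant before batch norm)
# Batch normalization: standardize over the batch dimension, then scale and shift
hpreact = bngain * (hpreact - hpreact.mean(0, keepdim=True)) / (hpreact.std(0, keepdim=True) + 1e-5) + bnbias
h = torch.tanh(hpreact)             # well-scaled inputs keep tanh out of its flat, saturated region
logits = h @ W2 + b2
loss = F.cross_entropy(logits, Yb)
print(loss.item())                  # close to -log(1/27) ~ 3.3 at initialization, as expected
```

With this kind of initialization the starting loss matches what a uniform guess over the vocabulary would give, and the tanh units receive inputs that are neither collapsed to zero nor pushed into saturation, which is exactly what the lecture's visualizations are designed to check.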

Syllabus

intro
starter code
fixing the initial loss
fixing the saturated tanh
calculating the init scale: “Kaiming init”
batch normalization
batch normalization: summary
real example: resnet50 walkthrough
summary of the lecture
just kidding: part2: PyTorch-ifying the code
viz #1: forward pass activations statistics
viz #2: backward pass gradient statistics
the fully linear case of no non-linearities
viz #3: parameter activation and gradient statistics
viz #4: update:data ratio over time (see the sketch after this syllabus)
bringing back batchnorm, looking at the visualizations
summary of the lecture for real this time
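
As a concrete example of the diagnostics listed above, the update:data ratio (viz #4) compares the size of each gradient step to the size of the parameters it updates. The following is a hypothetical sketch using plain SGD on made-up tensors, not the lecture's code; the tensor shapes and toy loss are assumptions for illustration.

```python
import torch

# Hypothetical toy setup: two parameter tensors trained with plain SGD
params = [torch.randn(30, 200, requires_grad=True),
          torch.randn(200, 27, requires_grad=True)]
lr = 0.1
ud = []   # per step, a list of log10(update/data) ratios, one per parameter

for step in range(100):
    loss = sum((p ** 2).sum() for p in params)   # stand-in loss just to produce gradients
    for p in params:
        p.grad = None
    loss.backward()
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad
        # Ratio of the update's scale to the parameter's scale, on a log10 axis;
        # a common rule of thumb is that around 1e-3 (i.e. -3) is healthy.
        ud.append([(lr * p.grad.std() / p.data.std()).log10().item() for p in params])
```

Plotting each column of ud over the training steps reproduces, in spirit, the "update:data ratio over time" chart used in the lecture to judge whether the learning rate is too small or too large.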

Taught by

Andrej Karpathy
