

Visually Explaining Mixture of Experts LLMs like DeepSeek and Mixtral - How to Code

Neural Breakdown with AVB via YouTube

Overview

Dive into a comprehensive video tutorial that visually explains Mixture of Experts (MoE) Transformers, the architecture behind cutting-edge LLMs like DeepSeek V3 and Mixtral 8x22B. Learn essential concepts including dense MoEs, sparse MoEs, top-k routing, noisy routing, expert capacity, Switch Transformers, and auxiliary load-balancing losses. Follow along with visual explanations that clarify complex concepts, complemented by practical code snippets for implementation. The tutorial progresses from basic intuition about MoEs through Transformer fundamentals, then explores advanced routing mechanisms, techniques for preventing router collapse, and analyses of real-world implementations like Mixtral and DeepSeek. Perfect for anyone wanting to understand both the theory and the practical implementation of state-of-the-art LLM architectures.
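
To ground the routing terminology before the syllabus, here is a minimal PyTorch sketch of a sparse MoE feed-forward layer with top-k routing: a small router scores every token against each expert, only the top-k experts run for that token, and their outputs are mixed by the renormalized router weights. This is an illustration under stated assumptions, not the tutorial's code; the class names, dimensions, and the simple per-expert dispatch loop are all hypothetical.

```python
# Minimal sketch of a sparse Mixture-of-Experts feed-forward layer with
# top-k routing. Illustrative only; names and sizes are assumptions, not
# taken from the video or from any production MoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A standard Transformer feed-forward block used as one expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)  # one routing logit per expert
        self.k = k

    def forward(self, x):
        # x: (batch, seq, d_model) -> flatten to a list of tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                       # (n_tokens, n_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # keep only k experts per token
        weights = F.softmax(topk_vals, dim=-1)             # renormalize over the chosen k

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which tokens routed to expert e, and in which of their k slots?
            token_ids, slot = (topk_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # this expert received no tokens in this batch
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape(x.shape)


# Usage: an 8-expert layer where each token is processed by only 2 experts.
moe = SparseMoE(d_model=64, d_hidden=256, n_experts=8, k=2)
y = moe(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

Because only k of the n_experts feed-forward blocks run per token, the layer adds parameters without proportionally increasing per-token compute, which is the central trade-off the video explores.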

Syllabus

0:00 - Intro
1:52 - Mixture of Experts Intuition
4:53 - Transformers 101
9:20 - Dense MOEs
14:50 - Sparse MOEs
16:34 - Router Collapse and Top-K Routing
19:20 - Noisy TopK, Load Balancing
20:56 - Routing Analysis by Mixtral
22:30 - Auxiliary Losses & DeepSeek (see the loss sketch after this syllabus)
24:05 - Expert Capacity
26:07 - 6 Points to Remember
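
For the load-balancing and auxiliary-loss chapters (19:20 and 22:30), one widely used formulation is the Switch Transformer auxiliary loss: with f_i the fraction of tokens dispatched to expert i and P_i the mean router probability placed on expert i, the loss is proportional to N * sum_i f_i * P_i, which the Switch Transformer paper notes is minimized under uniform routing. The sketch below is a hedged illustration of that idea; the function and variable names are assumptions, not the video's code.

```python
# Sketch of a Switch-Transformer-style auxiliary load-balancing loss, used to
# discourage router collapse by rewarding an even spread of tokens over experts.
# Illustrative only; names are assumptions, not the tutorial's code.
import torch
import torch.nn.functional as F


def load_balancing_loss(router_logits: torch.Tensor, topk_idx: torch.Tensor, n_experts: int):
    """router_logits: (n_tokens, n_experts); topk_idx: (n_tokens, k) hard assignments."""
    probs = F.softmax(router_logits, dim=-1)                      # router probabilities
    # f_i: fraction of tokens hard-assigned to expert i by the top-k routing
    dispatch = F.one_hot(topk_idx, n_experts).float().sum(dim=1)  # (n_tokens, n_experts)
    f = dispatch.mean(dim=0)
    # P_i: mean router probability placed on expert i (the differentiable part)
    p = probs.mean(dim=0)
    # Scaled dot product; per the Switch Transformer paper, this term is
    # minimized when tokens are routed uniformly across the experts.
    return n_experts * torch.sum(f * p)


# Usage: add the auxiliary term to the main training loss with a small coefficient.
logits = torch.randn(32, 8)            # router logits for 32 tokens, 8 experts
topk = logits.topk(2, dim=-1).indices  # top-2 assignments per token
aux = load_balancing_loss(logits, topk, n_experts=8)
```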

Taught by

Neural Breakdown with AVB

