Sesame AI and RVQs - The Network Architecture Behind Viral Speech Models

Neural Breakdown with AVB via YouTube

Overview

This 19-minute technical video from Neural Breakdown with AVB walks through the architecture of the Sesame Conversational Speech Model, a speech-to-speech system designed for expressive speech, intelligent responses, and natural interaction. It covers how the Mimi Encoder tokenizes audio with split RVQ (Residual Vector Quantization), why semantic and acoustic codes both matter for audio comprehension, and how the Autoregressive Transformer Backbone and Audio Decoder work, step by step. The video references the Moshi, SoundStream, HuBERT, and SpeechTokenizer research papers. Additional resources include supplementary materials on Patreon, related videos on transformers, and guides to fine-tuning open-source LLMs.
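
For readers new to Residual Vector Quantization, the core idea can be sketched in a few lines of Python: each stage picks the nearest entry in its codebook, subtracts it, and hands the leftover (the residual) to the next stage. This is only an illustrative toy and not code from the video or from Sesame or Mimi. The codebooks, the 2-dimensional frame, and the function names `rvq_encode` and `rvq_decode` are all made up for the example, and it shows plain RVQ rather than the split semantic/acoustic variant the video discusses.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_i, best_d = 0, float("inf")
    for i, entry in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes what the previous stages missed."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct the vector by summing the chosen entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
frame = [1.2, 0.9]  # stand-in for one audio frame embedding

codes = rvq_encode(frame, codebooks)
approx = rvq_decode(codes, codebooks)
print(codes, approx)
```

In this toy, the first stage alone would reconstruct the frame as `[1.0, 1.0]`; adding the second stage's code moves the reconstruction closer to the input, which is why stacking codebooks lets a tokenizer trade more tokens per frame for higher fidelity.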

Syllabus

Sesame AI and RVQs - the network architecture behind VIRAL speech models

Taught by

Neural Breakdown with AVB
