Audio Classification with Transformers - Speech Commands Recognition and Real-Time Speech-to-Text

Eran Feit via YouTube

Overview

This tutorial shows how to build a real-time speech command classification system with the Wav2Vec2 model and the Speech Commands dataset hosted on Hugging Face. It covers exploring and visualizing audio data, extracting features, preprocessing audio inputs, training a transformer model, tracking accuracy, and running real-time audio classification. By the end, you will have a custom speech recognition model that identifies spoken commands such as "up," "down," "left," and "right." The 47-minute guide runs from installation and dataset exploration through model training and real-time inference, and the complete code is available through the provided link. It is aimed at developers interested in audio processing, speech recognition, and practical uses of transformer models for audio classification.
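As a rough illustration of the preprocessing step mentioned above, the sketch below pads or truncates a waveform to a fixed length and scales it to zero mean and unit variance. It is not taken from the video's code: the function names, the one-second target length, and the pure-Python implementation are assumptions made for this example, and in practice a library feature extractor would normally handle this step.

```python
import math

SAMPLE_RATE = 16_000      # Speech Commands clips are sampled at 16 kHz
TARGET_LEN = SAMPLE_RATE  # assumption: the model sees one second of audio


def pad_or_truncate(samples, target_len=TARGET_LEN):
    """Return a copy of `samples` that is exactly `target_len` values long."""
    clipped = list(samples[:target_len])
    # Short clips are right-padded with silence (zeros).
    return clipped + [0.0] * (target_len - len(clipped))


def normalize(samples, eps=1e-7):
    """Scale a waveform to zero mean and (approximately) unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(var + eps)  # eps avoids division by zero on silence
    return [(x - mean) / scale for x in samples]


def preprocess(samples):
    """Hypothetical helper: fix the length, then normalize."""
    return normalize(pad_or_truncate(samples))
```

For example, `preprocess([0.5, -0.5] * 4000)` takes a half-second clip and returns 16,000 normalized samples ready to be batched.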

Syllabus

00:00 Introduction
03:56 Installation
07:22 Discover the dataset
10:00 Load the dataset
17:33 Build and train the model
31:29 Test the model predictions
41:44 Bonus: Real-time audio classification
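
The bonus chapter on real-time classification can be pictured as a sliding window over incoming audio. The sketch below is one possible arrangement, not the video's implementation: `stream_predictions` and `dummy_classify` are hypothetical names, the one-second window is an assumption, and the energy threshold merely stands in for a trained model.

```python
from collections import deque

SAMPLE_RATE = 16_000
WINDOW = SAMPLE_RATE  # assumption: classify the most recent one second


def stream_predictions(chunks, classify, window=WINDOW):
    """Yield one label per incoming chunk once a full window is buffered."""
    buffer = deque(maxlen=window)  # keeps only the newest `window` samples
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) == window:
            yield classify(list(buffer))


def dummy_classify(samples):
    """Hypothetical stand-in for a trained model: thresholds mean energy."""
    energy = sum(x * x for x in samples) / len(samples)
    return "speech" if energy > 0.01 else "silence"


if __name__ == "__main__":
    quiet = [0.0] * 8000  # half a second of silence
    loud = [0.5] * 8000   # half a second of constant signal
    for label in stream_predictions([quiet, quiet, loud, loud], dummy_classify):
        print(label)
```

In a live setup the `chunks` iterable would come from a microphone callback, and `classify` would wrap the fine-tuned model's forward pass.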

Taught by

Eran Feit
