Overview
This tutorial demonstrates how to build a real-time speech command classification system using the Wav2Vec2 model and Hugging Face's Speech Commands dataset. It covers exploring and visualizing audio data, extracting features, preprocessing audio inputs, training a transformer model, tracking accuracy, and implementing real-time audio classification. By following along, you will develop a custom speech recognition model capable of identifying spoken commands such as "up," "down," "left," and "right." The 47-minute guide covers everything from installation and dataset exploration to model training and real-time inference, with the complete code available through the provided link. It is aimed at developers interested in audio processing, speech recognition, and practical applications of transformer models to audio classification tasks.
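The preprocessing step mentioned above typically amounts to forcing every clip to a fixed length at 16 kHz and normalizing it before it reaches the model. As a rough sketch (the one-second clip length and normalization mirror the Speech Commands setup and the default behavior of a Wav2Vec2 feature extractor, but the exact settings in the video may differ):

```python
import numpy as np

SAMPLE_RATE = 16_000        # Wav2Vec2 expects 16 kHz mono audio
CLIP_SAMPLES = SAMPLE_RATE  # one-second clips, as in Speech Commands

def preprocess(waveform) -> np.ndarray:
    """Pad or truncate a mono waveform to exactly one second,
    then apply zero-mean / unit-variance normalization."""
    wav = np.asarray(waveform, dtype=np.float32)
    if wav.shape[0] < CLIP_SAMPLES:
        # Zero-pad short recordings at the end
        wav = np.pad(wav, (0, CLIP_SAMPLES - wav.shape[0]))
    else:
        # Truncate long recordings
        wav = wav[:CLIP_SAMPLES]
    return (wav - wav.mean()) / np.sqrt(wav.var() + 1e-7)

# Example: a half-second clip gets padded to 16,000 samples
short_clip = np.random.randn(8_000).astype(np.float32)
out = preprocess(short_clip)
print(out.shape)  # (16000,)
```

The normalized array can then be fed to the feature extractor and model in batches during training.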
Syllabus
00:00 Introduction
03:56 Installation
07:22 Discover the dataset
10:00 Load the dataset
17:33 Build and train the model
31:29 Test the model's predictions
41:44 Bonus: Real-time audio classification
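The prediction step in the syllabus reduces to mapping the model's output logits to a command label. A minimal sketch of that mapping (the label list and order here are assumptions for illustration, not the dataset's actual label indices):

```python
import numpy as np

# Hypothetical label order; the real index-to-label mapping
# comes from the trained model's configuration.
LABELS = ["up", "down", "left", "right"]

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D logits vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def predict(logits) -> tuple[str, float]:
    """Return the most likely command and its probability."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    idx = int(probs.argmax())
    return LABELS[idx], float(probs[idx])

label, confidence = predict([0.1, 2.3, -1.0, 0.4])
print(label)  # "down"
```

For the real-time bonus section, the same function would run on each preprocessed chunk captured from the microphone, with a confidence threshold to suppress silence and background noise.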
Taught by
Eran Feit