Stanford Seminar - Audio Research: Transformers for Applications in Audio, Speech and Music

Stanford University via YouTube

Overview

This seminar covers applications of Transformers in audio, speech, and music. Learning outcomes include understanding how Transformers are used for music and audio, synthesizing raw audio, improving Transformer models, and applying generative and contrastive learning to audio representations. Topics include working with spectrograms, raw audio synthesis techniques, and combining Vector Quantization with auto-encoders and Transformers. The format is a seminar-style talk that presents research findings and methodologies. It is intended for people interested in audio processing, machine learning, and artificial intelligence applications in the audio domain.

Syllabus

Introduction.
Transformers for Music and Audio: Language Modelling to Understanding to Synthesis.
The Transformer Revolution.
Models getting bigger...
What are spectrograms?
Raw Audio Synthesis: Difficulty; Classical FM synthesis; Karplus-Strong.
Baseline: Classic WaveNet.
Improving Transformer Baseline: Major bottleneck of Transformers.
Results & Unconditioned Setup: Evaluation criterion: comparing WaveNet and Transformers on next-sample prediction, with top-5 accuracy out of 256 possible states as an error metric. Why this setup? 1. Application agnostic. 2. Suits training setup.
A Framework for Generative and Contrastive Learning of Audio Representations.
Acoustic Scene Understanding.
Recipe of doing.
Turbocharging best of two worlds: Vector Quantization, a powerful and under-utilized algorithm; combining VQ with auto-encoders and Transformers.
Turbocharging best of two worlds: Learning clusters from vector quantization; use long-term dependency learning with that cluster-based representation for Markovian assumption; the better we become in prediction, the better the summarization is.
Audio Transformers: Transformer Architectures for Large Scale Audio Understanding - Adieu Convolutions (Stanford University, March 2021).
Wavelets on Transformer Embeddings.
Methodology + Results.
What does it learn -- the front end.
Final Thoughts.
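
The evaluation criterion named in the syllabus (top-5 accuracy on next-sample prediction over 256 possible states) can be illustrated with a minimal sketch. The listing does not say how the seminar computes it, so this is an assumption: each prediction is taken to be a list of 256 scores, one per state, and a prediction counts as correct when the true state is among the five highest-scoring ones. The function name `top_k_accuracy` and its parameters are hypothetical, chosen for illustration only.

```python
def top_k_accuracy(scores, targets, k=5):
    """Fraction of predictions whose true state is among the k highest scores.

    scores  : list of rows, each row holding one score per possible state
              (256 states per row in the setup the syllabus describes; assumed).
    targets : list of true state indices, one per row.
    """
    if len(scores) != len(targets) or not targets:
        raise ValueError("scores and targets must be non-empty and equal in length")
    hits = 0
    for row, target in zip(scores, targets):
        # Rank state indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


# Usage: a model that assigns equal scores to every state carries no
# information, so its top-5 accuracy should sit near 5/256 on random targets.
if __name__ == "__main__":
    import random

    random.seed(0)
    n_states = 256
    flat_scores = [[0.0] * n_states for _ in range(1000)]
    random_targets = [random.randrange(n_states) for _ in range(1000)]
    print(top_k_accuracy(flat_scores, random_targets, k=5))
```

A metric of this kind depends only on the predicted distribution over states, not on any downstream task, which is consistent with the "application agnostic" motivation listed in the syllabus.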

Taught by

Stanford Online
