Stanford Seminar - Audio Research: Transformers for Applications in Audio, Speech and Music

Overview

This course teaches learners about the applications of Transformers in audio, speech, and music. The learning outcomes include understanding the use of Transformers for music and audio, synthesizing raw audio, improving Transformer models, and using generative and contrastive learning of audio representations. The course covers skills such as working with spectrograms, raw audio synthesis techniques, and combining Vector Quantization with auto-encoders and Transformers. The course is taught as a seminar, with a focus on presenting research findings and methodologies. It is intended for individuals interested in audio processing, machine learning, and artificial intelligence applications in the audio domain.

Syllabus

Introduction.
Transformers for Music and Audio: Language Modelling to Understanding to Synthesis.
The Transformer Revolution.
Models getting bigger ....
What are spectrograms.
Raw Audio Synthesis: Difficulty; Classical FM Synthesis; Karplus-Strong.
Baseline: Classic WaveNet.
Improving Transformer Baseline: Major bottleneck of Transformers.
Results & Unconditioned Setup: Evaluation criterion: comparing WaveNet and Transformers on next-sample prediction, with top-5 accuracy out of 256 possible states as the error metric. Why this setup? 1. Application agnostic. 2. Suits the training setup.
A Framework for Generative and Contrastive Learning of Audio Representations.
Acoustic Scene Understanding.
Recipe of doing.
Turbocharging, best of two worlds: Vector Quantization, a powerful and under-utilized algorithm. Combining VQ with auto-encoders and Transformers.
Turbocharging, best of two worlds: Learning clusters from vector quantization. Use long-term dependency learning with that cluster-based representation for a Markovian assumption. The better we become at prediction, the better the summarization is.
Audio Transformers: Transformer Architectures for Large Scale Audio Understanding - Adieu Convolutions (Stanford University, March 2021).
Wavelets on Transformer Embeddings.
Methodology + Results.
What does it learn -- the front end.
Final Thoughts.
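
The evaluation criterion named in the syllabus (top-5 accuracy on next-sample prediction over 256 possible states) can be illustrated with a minimal sketch. This is not the seminar's code: the function name top_k_accuracy, the random scores, and the random targets are hypothetical stand-ins for a model's per-sample outputs and the true quantized audio values.

```python
import random


def top_k_accuracy(scores, targets, k=5):
    """Fraction of predictions whose true state is among the k highest-scoring states.

    scores:  list of rows, one per predicted sample, each with one score per state
    targets: list of true state indices, one per predicted sample
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k states with the highest scores in this row.
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Hypothetical data: random scores over 256 states (assumption, not real model output).
random.seed(0)
num_states = 256
scores = [[random.random() for _ in range(num_states)] for _ in range(1000)]
targets = [random.randrange(num_states) for _ in range(1000)]

# With uninformative scores the expected value is the chance level, 5/256 (about 0.02).
print(top_k_accuracy(scores, targets))
```

A metric of this kind depends only on the predicted distribution over the next sample, which is consistent with the syllabus describing the setup as application agnostic.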