Fine-tune Text to Speech Models: CSM-1B and Orpheus TTS

Trelis Research via YouTube

Classroom Contents

  1. 00:00 Introduction to End-to-End Audio + Text Models like GPT-4o and Llama 4?
  2. 01:04 End-to-End Multimodal Models and Their Capabilities
  3. 02:36 Traditional Approaches to Text-to-Speech
  4. 03:06 Token-Based Approaches and Their Advantages
  5. 03:25 Detailed Look at Orpheus and CSM-1B Models
  6. 06:58 Training and Inference with Token-Based Models
  7. 12:53 Hierarchical Tokenization for High-Quality Audio
  8. 14:11 Kyutai’s Moshi Model for Text + Speech
  9. 23:41 Sesame’s CSM-1B Model Architecture
  10. 25:13 Orpheus TTS Architecture by Canopy Labs
  11. 27:34 Inference and Cloning with CSM-1B
  12. 40:13 Context-Aware Text to Speech with CSM-1B
  13. 48:21 Orpheus Inference and Cloning - FREE Colab
  14. 55:09 Orpheus Voice Cloning Setup
  15. 01:01:20 Orpheus Fine-tuning: Full Fine-tuning and LoRA Fine-tuning
  16. 01:09:55 Running Full Fine-tuning
  17. 01:19:33 Running LoRA Fine-tuning
  18. 01:25:20 Inference and Comparison
  19. 01:29:27 Inference with Cloning and Fine-tuning
  20. 01:35:48 The Future of Token-Based Multimodal Models
