Merging and MoErging for Compositional Generalization
Scalable Parallel Computing Lab, SPCL @ ETH Zurich via YouTube
Overview
Join this 56-minute talk by Colin Raffel at SPCL_Bcast #55, recorded on February 27, 2025, exploring model merging techniques for compositional generalization. Learn about different perspectives on combining the parameters of individual single-task models into a single multitask model. Discover research findings that question whether merging actually delivers meaningful compositional generalization, and explore "MoErging," an alternative approach that keeps the single-task models separate and learns to route between them instead of merging their parameters. This presentation from ETH Zurich's Scalable Parallel Computing Lab offers valuable insights for researchers and practitioners interested in multitask training and model composition. Visit the SPCL website for more talks in this broadcast series.
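To make the contrast concrete, here is a minimal, hedged sketch of the two ideas the talk discusses: "merging" averages the parameters of single-task models into one model, while "MoErging" keeps the models separate and combines them via routing weights. The function names, the toy parameter vectors, and the uniform/score-based weighting are illustrative assumptions, not the speaker's actual implementation.

```python
# Toy illustration (assumed, not from the talk): each "model" is just a
# flat list of parameters so the example stays self-contained.

def merge_models(models, weights=None):
    """Merging: element-wise weighted average of all models' parameters."""
    n = len(models)
    weights = weights or [1.0 / n] * n  # default: uniform average
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]

def route_models(models, scores):
    """MoErging-style routing: models stay separate; a router's scores
    decide how much each model contributes per input."""
    total = sum(scores)
    probs = [s / total for s in scores]  # normalize router scores
    dim = len(models[0])
    return [sum(p * m[i] for p, m in zip(probs, models)) for i in range(dim)]

# Two toy single-task "models".
model_a = [1.0, 0.0, 2.0]
model_b = [3.0, 4.0, 0.0]

merged = merge_models([model_a, model_b])        # → [2.0, 2.0, 1.0]
routed = route_models([model_a, model_b], [3.0, 1.0])  # mostly model_a → [1.5, 1.0, 1.5]
```

The key difference the talk draws: merging commits to one fixed parameter set, while routing can adapt the mixture per input, which is why it may compose tasks more flexibly.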
Syllabus
[SPCL_Bcast] Merging and MoErging for compositional generalization
Taught by
Scalable Parallel Computing Lab, SPCL @ ETH Zurich