Overview
Discover how to train and serve massive AI models efficiently in this 31-minute conference talk presented by Shashank Kapadia from Walmart at All Things Open AI 2025. Learn techniques for partitioning large models across GPUs and distributing data for optimal throughput, with detailed explanations of practical setup configurations and performance benchmarks. Explore the critical tradeoffs between latency and resource usage, and gain insights into customizing parallelization strategies for AI tasks beyond transformers, including computer vision. Come away with a clear understanding of how to design and deploy parallelized workflows that balance accuracy, speed, and infrastructure costs, enabling more efficient scaling of AI solutions in real-world production environments.
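The data-parallelism idea the talk covers, replicating the model on each GPU, sharding the batch, and averaging gradients across replicas, can be sketched without any framework. The following is a minimal, framework-free illustration (not code from the talk; the function names and the toy linear model are invented for this sketch), showing that averaging equal-sized per-shard gradients reproduces the full-batch gradient:

```python
# Hypothetical sketch of synchronous data parallelism: every "worker" holds a
# full copy of the weights, computes a gradient on its own shard of the batch,
# and an all-reduce averages the gradients so all replicas apply the same update.

def grad_mse_linear(w, xs, ys):
    """Gradient of mean squared error for the toy model y ~ w * x on one shard."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_step(w, xs, ys, num_workers, lr=0.01):
    """One SGD step with the batch sharded evenly across num_workers replicas."""
    shard = len(xs) // num_workers
    shards = [(xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
              for i in range(num_workers)]
    # Each worker computes a local gradient on its shard (in parallel on real GPUs).
    local_grads = [grad_mse_linear(w, sx, sy) for sx, sy in shards]
    # All-reduce: average the gradients so every replica stays in sync.
    g = sum(local_grads) / num_workers
    return w - lr * g
```

With equal shard sizes, one data-parallel step is numerically identical to a single-worker full-batch step; in a real stack (e.g. PyTorch `DistributedDataParallel`), the all-reduce runs over NCCL across GPUs instead of a Python `sum`.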
Syllabus
Scaling Large Models with Model & Data Parallelism: Techniques, Tradeoffs, and Best Practices
Taught by
All Things Open