Overview
In this 52-minute keynote address from USENIX FAST '25, Dr. Seetharami Seelam from IBM Research shares valuable insights from developing and operating two generations of cloud-native AI supercomputers. Explore how IBM's Vela systems serve as the foundation for the company's AI initiatives, addressing critical challenges in scaling, performance, and high availability. Learn about innovative solutions implemented across compute, network, and storage components that enable efficient execution of diverse AI workloads. Gain practical knowledge from IBM's two years of experience managing these systems on a cloud-native platform, and discover Dr. Seelam's perspective on future directions for hardware-middleware integration in next-generation AI system design. The presentation highlights how AI supercomputers in public clouds facilitate rapid, cost-effective development and deployment of advanced AI models, particularly as demand grows for generative AI and foundation models.
Syllabus
FAST '25 - Keynote Address: Insights Gained from Delivering Two Generations of AI Supercomputers...
Taught by
USENIX