
Improving Performance of AI Cluster by Tuning AI Ethernet Switch Features in SONiC
Open Compute Project via YouTube
Overview

This presentation by Nanda Ravindran, VP of Technical Sales at Edgecore Networks, explores how to optimize AI cluster performance by tuning Ethernet switch features in SONiC. Learn about the unique challenges posed by AI cluster fabrics, whose traffic is low-entropy and high-density, with frequent elephant flows. Discover techniques for minimizing network interference and latency while maximizing throughput over lossless Ethernet fabrics. The 19-minute talk demonstrates how AI fabric features, including RoCEv2 (RDMA over Converged Ethernet v2), ECN (Explicit Congestion Notification), PFC (Priority Flow Control), and DLB (Dynamic Load Balancing), can be fine-tuned to achieve optimal performance in AI clusters while leveraging open solutions such as OCP hardware and SONiC.
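Why low-entropy traffic with a few elephant flows is hard on a fabric can be illustrated with a simplified model. This is not taken from the talk. It assumes static ECMP hashing where each flow lands uniformly at random on one of several equal-cost uplinks and all flows are the same size. Under those assumptions, the sketch computes the chance that no two elephant flows share an uplink and the expected number of uplinks left idle. The function names are hypothetical.

```python
from math import perm


def prob_no_collision(flows: int, links: int) -> float:
    """Probability that `flows` equal-size flows, each hashed uniformly at
    random onto one of `links` equal-cost uplinks (an assumed, simplified model
    of static ECMP), all land on distinct uplinks."""
    if flows > links:
        return 0.0  # pigeonhole: at least two flows must share an uplink
    # Ordered ways to place flows on distinct links / all possible placements
    return perm(links, flows) / links ** flows


def expected_idle_links(flows: int, links: int) -> float:
    """Expected number of uplinks that receive no flow at all under the same
    uniform-hashing assumption: each link is missed by every flow with
    probability (1 - 1/links) ** flows."""
    return links * (1 - 1 / links) ** flows


if __name__ == "__main__":
    # Example: 8 elephant flows spread over 8 uplinks
    print(f"P(no collision)     = {prob_no_collision(8, 8):.4f}")
    print(f"E[idle uplinks]     = {expected_idle_links(8, 8):.2f}")
```

With 8 flows on 8 uplinks, the chance of a collision-free placement is about 0.24%, and on average roughly 2.75 uplinks carry nothing while others carry two or more flows. In this simplified model that imbalance is what congestion-aware, per-flowlet schemes such as DLB aim to reduce, though the actual behavior depends on the switch implementation and traffic mix.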
Syllabus
Improving Performance of AI Cluster by Tuning AI Ethernet Switch Features in SONiC
Taught by
Open Compute Project