Overview
Discover how Crusoe validates network infrastructure for AI workloads in this 39-minute Tech Field Day presentation. Learn about their proactive approach to testing the frontend network and the inter-VM/host data transfers that feed GPU clusters, which lets them find and resolve issues before customers encounter them. Explore Crusoe's vertically aligned AI infrastructure, powered by sustainable energy sources including wind, solar, and geothermal, with a major project in Abilene, Texas. Understand their AI cloud platform, which offers infrastructure as a service through virtual machines alongside managed AI solutions. Follow their mission to build an enterprise-scale, purpose-built AI cloud, with a focus on data center networks, software-defined networking, and GPU-to-GPU fabrics optimized with NVIDIA reference architectures. See how Crusoe partners with Keysight on rigorous testing for performance and stability, particularly under stateful traffic and high connection rates, simulating various workloads to identify breaking points and prevent noisy-neighbor issues in multi-tenant environments. Gain insights into their use of CyPerf as a traffic generator to understand how open-source Open vSwitch (OVS) and NVIDIA's stack behave, and learn about future plans including Blackwell platforms, telemetry advancements, and storage optimization. Presented by Gavin McKee, Cloud Network Infrastructure Architect AI/ML/HPC at Crusoe, recorded in Santa Clara on April 25, 2025.
Syllabus
Building Trust at Scale: How Crusoe Validates Network Infrastructure for AI Workloads with Keysight
Taught by
Tech Field Day