Overview
The course aims to teach participants how to discover scalability bugs in large distributed systems using the ScaleCheck approach. The learning outcomes include understanding scalability bugs, employing program analysis techniques to find potential causes of bugs, and testing implementation code at real scales on a commodity PC. The course covers topics such as Naive Packing, Single Process Cluster, Per-Node Services, Global Event Driven Architecture, and Colocation Factor. The teaching method involves a presentation of concepts and examples. This course is intended for individuals interested in large-scale testing, scalability, and bug detection in storage systems.
Syllabus
Intro
ScaleCheck: A Single-Machine Approach for Discovering Scalability Bugs in Large Distributed Systems
An Example: Cassandra Bug #3831
The "Flapping" Bug(s)
Outline introduction
Naive Packing (NP)
Single Process Cluster (SPC): Deploy nodes as threads in a single process
Per-Node Services: Frequent design pattern
Global Event Driven Architecture (GEDA): One global event handler per service
Finding New Bugs
Colocation Factor
Limitations and Future Work: Focus on scale-dependent CPU/processing time
Taught by
USENIX