Cloud computing systems today, whether open-source or used inside companies, are built using a common set of core techniques, algorithms, and design philosophies – all centered around distributed systems. Learn about such fundamental distributed computing "concepts" for cloud computing.
Some of these concepts include: clouds, MapReduce, key-value/NoSQL stores, classical distributed algorithms, widely-used distributed algorithms, scalability, trending areas, and much, much more!
Know how these systems work from the inside out. Get your hands dirty using these concepts with provided homework exercises. In the programming assignments, implement some of these concepts in template code (programs) provided in the C++ programming language. Prior experience with C++ is required.
The course also features interviews with leading researchers and managers, from both industry and academia.
This course builds on the material covered in the Cloud Computing Concepts, Part 1 course.
Week 1: Course Orientation and Classical Distributed Algorithms Continued
Lesson 1: To coordinate machines in a distributed system, this module first looks at classical algorithms for electing a leader, including the Ring algorithm and Bully algorithm. We also cover how Google’s Chubby and Apache Zookeeper solve leader election. Lesson 2: This module covers solutions to the problem of mutual exclusion, which is important for correctness in distributed systems with shared resources. We cover classical algorithms, including Ricart-Agrawala’s algorithm and Maekawa’s algorithm. We also cover Google’s Chubby support for mutual exclusion.
Week 2: Concurrency and Replication Control
Lesson 1: Transactions are an important component of many cloud systems today. This module presents building blocks to ensure transactions work as intended, from Remote Procedure Calls (RPCs), to serial equivalence for transactions, to optimistic and pessimistic approaches to concurrency control, to deadlock avoidance/prevention. Lesson 2: This module covers how replication – maintaining copies of the same data at different locations – is used to provide many nines of availability in distributed systems, as well as different techniques for replication and for ensuring transactions commit correctly in spite of replication.
Week 3: Emerging Paradigms
Lesson 1: We study the emerging area of stream processing, touching on key design aspects of Apache Storm. Lesson 2: We study how enormous graphs can be processed in clouds. Lesson 3: We study various types of networks/graphs that are both natural and artificial, and their surprising commonalities. Lesson 4: This module presents classical scheduling algorithms that have been used in operating systems since the inception of computers. We then cover two popular scheduling algorithms for Hadoop.
Week 4: Classical Systems
Lesson 1: When files and directories are stored/accessed over the network, it is called a distributed file system. This module covers the working of distributed file systems like NFS and AFS. Lesson 2: This module covers Distributed Shared Memory systems, their techniques, and pros/cons. Lesson 3: This module looks at the area of sensor networks, starting from what’s inside a sensor mote and how networks of them work.
Week 5: Real-Life Behaviors
Lesson 1: This module is a primer on basic security concepts, not just applied to distributed systems, but also more generally. We study various policies and mechanisms, including encryption, authentication, and authorization. Lesson 2: This module presents case studies of real datacenter outages, and attempts to draw lessons on how to prevent them and how to better prepare for them.