Explore all talks and presentations from SREcon, covering the latest insights, research, and trends in site reliability engineering from leading practitioners.
Explore strategies for building distributed service ownership in software teams, focusing on documentation, telemetry, and empowering teams to drive improvements in system reliability and performance.
Explore common challenges in system reliability and strategies for success. Gain insights into managing complexity and time pressure in production environments.
Explore how modeling-driven techniques and TLA+ can enhance postmortem analysis, uncover root causes, and improve system design in distributed database architectures.
Unraveling a complex Kubernetes incident: from DNS suspicions to kernel-level insights, culminating in a surprising three-line code fix. Learn the debugging techniques used and the unexpected system behaviors uncovered along the way.
Strategies for effective incident response coordination, focusing on follower roles and organizational preconditions, with insights for improving communication and collaboration during software outages.
Explore sociotechnical engineering strategies for SREs to impact reliability beyond infrastructure, addressing team struggles, burnout, and priorities to enhance overall system performance.
Uncover why SLO adoption efforts fail, learn to calculate and demonstrate the value of SLOs, and understand key differences between measurement methods to gain better insight into system reliability.
Explore the attributes that affect engineers' confidence in handover communications during software operations, based on research and interviews. Highlights the importance of effective information transfer in various scenarios.
Explore the 1979 NORAD nuclear near-miss incident, its causes, and implications for modern distributed systems maintenance and operation. Learn from this historical event to improve current practices.
Explore the transition from incident management to incident analysis, highlighting the distinct skills required and the value of post-incident learning for driving meaningful organizational change.
Explore how SREs can align mental models with system reality using resilience stress testing and decision trees. Learn practical tools for documenting, visualizing, and improving complex software systems.
Insights into SRE management: priorities, decision-making, and career advancement for individual contributors (ICs). Learn to recognize effective leadership and navigate the SRE management landscape.
Discover how Google and Major League Hacking collaborate to create diverse SRE talent through the SRE Fellowship, offering underrepresented groups immersive training and career opportunities in site reliability engineering.
Learn effective alert triage through guided experience in real-world scenarios. Discover how "Alert Triage Hour of Power" fosters camaraderie and system understanding in production environments.
Explore how SREs in government navigated regulatory constraints to rapidly deploy critical software services during the COVID-19 pandemic, showcasing their adaptability and incident response skills.