Overview
In this Conf42 SRE 2025 talk, Jena Abraham discusses Site Reliability Engineering (SRE) principles for monitoring rapidly growing infrastructure. Learn how a reliability framework addresses the key SRE challenges that arise as infrastructure growth outpaces traditional monitoring capabilities. The 12-minute presentation covers automated anomaly detection, strategies for managing increasingly complex systems, the integration of AI with observability tools, and a phased implementation approach.
Syllabus
00:00 Introduction to Modern Reliability Framework
00:29 Infrastructure Growth and Challenges
01:31 Key SRE Challenges
02:52 Core Components of the Reliability Framework
03:58 Automated Anomaly Detection
06:34 Managing Complex Systems
09:19 AI and Observability
10:45 Implementation Phases
11:59 Conclusion
Taught by
Conf42