Develop a Site Reliability Engineering (SRE) strategy

Microsoft via Microsoft Learn

Details

Provider: Microsoft Learn
Pricing: Free online course
Languages: English
Duration & workload: 7 hours 43 minutes
Sessions: On-demand
Level: Beginner

Overview

Module 1: Learn about SRE, an engineering discipline that helps you sustainably achieve the appropriate level of reliability in your systems, services, and products.
In this module, you will:
- Gain a basic understanding of Site Reliability Engineering (SRE)
- Learn how to get started with this valuable operations practice
Module 2: Respond to incidents and activities in your infrastructure through alerting capabilities in Azure Monitor.
In this module, you'll:
- Configure alerts on events in your Azure resources based on metrics, log events, and activity log events.
- Learn how to use action groups in response to an alert, and how to use alert processing rules to override action groups when necessary.
Module 3: Learn how to capture trace output from your Azure web apps, view a live log stream, and download log files for offline analysis.
In this module, you will:
- Enable application logging on an Azure Web App
- View live application logging activity with the log streaming service
- Retrieve application log files from an application with Kudu or the Azure CLI
Module 4: Learn how to manage site reliability.
After completing this module, you'll be able to:
- Describe how site reliability engineering (SRE) empowers software developers to own the ongoing daily operation of their applications in production.
- Describe how Application Insights analyzes the performance of your web application and can warn you about potential problems.
- List the processes that you can implement to monitor site reliability.
- Build a "just culture" that balances safety and accountability.
Module 5: Cloud Admin course from Dr. Majd Sakr at Carnegie Mellon University. Discover what cloud elasticity means and different ways to scale your cloud resources.
In this module, you will:
- Describe common load patterns and how they drive the need to scale
- Enumerate the strategies and considerations in scaling cloud applications
- Discuss the advantages of auto-scaling and the mechanisms used to achieve it
- Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it
- List the primary benefits of serverless computing and explain the concept of serverless functions
This content is provided in partnership with Dr. Majd Sakr and Carnegie Mellon University.
Module 6: Carnegie Mellon University's Cloud Developer course. Learn how developers write programs that run on the cloud, including how to deploy, be fault-tolerant, load balance, scale, and deal with latency.
In this module, you will:
- Evaluate different considerations when programming applications that run on clouds
- Evaluate different considerations when deploying applications on clouds
- Compare and contrast proactive and reactive measures for fault tolerance in cloud applications
- Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it
- Enumerate the strategies and considerations in scaling cloud applications
- Make the case for minimizing tail latency and discuss strategies to reduce it
- Describe strategies to optimize the total operational cost of using cloud services
In partnership with Dr. Majd Sakr and Carnegie Mellon University.
Module 7: Learn how to troubleshoot inbound network connectivity for Azure Load Balancer.
In this module, you will:
- Identify common Azure Load Balancer inbound connectivity issues.
- Identify steps to resolve issues when virtual machines aren't responding to the health probe.
Module 8: Learn how to monitor the health of your Azure VMs by using Azure Metrics Explorer and metric alerts.
In this module, you will:
- Identify metrics and diagnostic data that you can collect for virtual machines
- Configure monitoring for a virtual machine
- Use monitoring data to diagnose problems

Syllabus

Module 1: Introduction to Site Reliability Engineering (SRE)
- Introduction to Site Reliability Engineering
- What is SRE and why does it matter?
- SRE in context
- Key SRE principles and practices: virtuous cycles
- Key SRE principles and practices: The human side of SRE
- Getting started with SRE
- Summary
Module 2: Improve incident response with alerting on Azure
- Introduction
- Explore the different alert types that Azure Monitor supports
- Use metric alerts for alerts about performance issues in your Azure environment
- Exercise - Use metric alerts to alert on performance issues in your Azure environment
- Use log alerts to alert on events in your application
- Use activity log alerts to alert on events within your Azure infrastructure
- Use action groups and alert processing rules to send notifications when an alert is fired
- Exercise - Use an activity log alert and an action group to notify users about events in your Azure infrastructure
- Summary
Module 3: Capture Web Application Logs with App Service Diagnostics Logging
- Introduction
- Enable and configure App Service application logging
- Exercise - Enable and configure App Service application logging using the Azure portal
- View live application logging with the log streaming service
- Exercise - View live application logging with the log streaming service using Azure CLI
- Retrieve application log files
- Exercise - Retrieve application log files using Azure CLI and Kudu
- Summary
Module 4: Manage site reliability
- Introduction
- What is reliability engineering?
- What is Application Insights?
- Perform ongoing tuning to reduce meaningless alerts
- Analyze alerts to establish a baseline
- Blameless postmortems
- Knowledge check
- Summary
Module 5: Scale your cloud resources with elasticity
- Introduction
- Compute load patterns
- Scaling compute resources
- Automated scaling on the cloud
- Load balancing
- Serverless computing
- Summary
Module 6: Build applications on the cloud
- Introduction
- Programming the cloud
- Deploy applications on the cloud
- Build fault-tolerant cloud services
- Load balancing
- Scale resources
- How to deal with tail latency
- Economics for cloud applications
- Summary
Module 7: Troubleshoot inbound network connectivity for Azure Load Balancer
- Introduction
- Troubleshoot Azure Load Balancer
- Diagnose issues by reviewing configurations and metrics
- Exercise - Set up your environment
- Exercise - Identify and resolve inbound network connectivity issues
- Summary
Module 8: Monitor the health of your Azure virtual machine by using Azure Metrics Explorer and metric alerts
- Introduction
- Monitor the health of the virtual machine
- Exercise - Set up a VM with boot diagnostics
- View VM metrics
- Configure the Azure Diagnostics extension
- Exercise - Configure the Azure Diagnostics extension
- Diagnostic data case studies
- Exercise - Use diagnostic data
- Summary

Tags

Azure Monitor fundamentals

Monitor and back up resources for Azure administrators

Cloud computing basics for developers

Cloud administration basics

Manage cloud resources

Manage security operations in Azure