Improve your reliability with modern operations practices

Microsoft via Microsoft Learn

Overview

  • Module 1: Discover a map for navigating reliability challenges and sustainably achieving the appropriate level of reliability in your systems, services, and products.
  • By the end of this module, you will be able to:

    • Express why reliability is crucial to your success
    • Describe modern operations practices that offer tools you can use to work on your reliability challenges
    • Explain the Dickerson hierarchy of reliability and the map it provides for approaching reliability challenges
  • Module 2: Learn how to use monitoring to help you sustainably achieve the appropriate level of reliability in your systems, services, and products.
  • In this module you will:

    • Learn how to increase your operational awareness as a precursor to reliability work
    • Expand your understanding of reliability itself
    • Change the way you frame your thinking about monitoring to make it more impactful
    • Gain a basic understanding of the applicable monitoring platform and tools available on Azure
    • Learn a practice from site reliability engineering that can immediately start to create an impact on reliability
    • Learn to craft actionable alerts to make your operational practices sustainable
  • Module 3: Learn the incident response fundamentals necessary to help you sustainably achieve the appropriate level of reliability in your systems, services, and products.
  • In this module you will:

    • Learn the importance of effective incident response
    • Understand the lifecycle of an incident so you know where to apply your efforts
    • Learn the building blocks for constructing an incident response process that lets you respond with urgency
    • Begin to track your incidents effectively using Azure DevOps tools
    • Explore ways to automate your incident tracking for a speedy and consistent response
    • Understand the guidelines around communication that allow incident response to be more efficient
    • Explore Azure tools that can significantly shorten your remediation times during an incident
  • Module 4: Learn about post-incident reviews, a practice necessary to help you sustainably achieve the appropriate level of reliability in your systems, services, and products.
  • In this module you will:

    • Discover the importance of learning from incidents
    • Understand the aspects of complex systems that make learning from failure important
    • Learn when and how to conduct a post-incident review
    • Understand the purpose and goals of a post-incident review
    • Learn the components that go into a good post-incident review
    • Explore the Azure tools that can assist with getting started with post-incident reviews
    • Become aware of common traps to avoid
    • Identify helpful practices to conduct a better review
  • Module 5: Learn about deployment practices that can help you sustainably achieve the appropriate level of reliability in your systems, services, and products.
  • In this module you will:

    • Learn what software deployment is and the different kinds of deployments you might use
    • Discover the significant benefits of switching from an "epic deployment" model to a "continuous deployment" model
    • Explore the components of continuous deployment
    • Take a close look at pipelines and how they are implemented in Azure Pipelines
    • Learn several strategies for deploying to production that can help you avoid incidents
    • Examine practices that can minimize risk when rolling out new software or a new version of existing software
  • Module 6: Learn about capacity planning and scaling practices that can help you sustainably achieve the appropriate level of reliability in your systems, services, and products.
  • In this module you will:

    • Learn about scalability and the scalability/reliability relationship
    • Understand the role of capacity planning in preparing for growth
    • Learn basic concepts and fundamental terms related to scaling
    • Eliminate single points of failure
    • Understand the different kinds of growth and how to respond to them
    • Measure capacity in the cloud
    • Catch issues with service limits and quotas before they emerge using Azure tools
    • Understand important steps to take before beginning work on scaling
    • List techniques for making an application more scalable, including decoupling, queues, in-memory caching, and database sharding
    • Learn about the Azure tools that make it possible to take your application or service global

Syllabus

  • Module 1: Improve your reliability with modern operations practices: An introduction
    • Introduction
    • Why reliability matters
    • Modern operations
    • The Dickerson hierarchy of reliability
    • Summary
  • Module 2: Improve your reliability with modern operations practices: Monitoring
    • Introduction
    • Operational awareness
    • Expanding our understanding of reliability
    • Changing the frame
    • Azure monitoring tools
    • Log analytics and KQL queries
    • Service level indicators (SLIs) and service level objectives (SLOs)
    • Actionable alerts
    • Summary
  • Module 3: Improve your reliability with modern operations practices: Incident response
    • Introduction
    • Importance of incident response
    • Characteristics and lifecycle of an incident
    • Foundations of incident response
    • Incident tracking
    • Communication and collaboration
    • Remediation
    • Summary
  • Module 4: Improve your reliability with modern operations practices: Learning from failure
    • Introduction
    • Why learn from incidents?
    • What is a post-incident review?
    • Characteristics and components of a good post-incident review
    • The post-incident review process
    • Common traps to avoid
    • Helpful practices for learning from failure
    • Summary
  • Module 5: Improve your reliability with modern operations practices: Deployment
    • Introduction
    • What is software deployment?
    • The continuous delivery deployment model
    • Test automation and the delivery pipeline
    • Deployment strategies
    • Summary
  • Module 6: Improve your reliability with modern operations practices: Capacity planning and scaling
    • Introduction
    • What is scalability?
    • Prepare for growth
    • Capacity planning considerations
    • Make applications scalable
    • Go global
    • Summary
