Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight

Microsoft via Microsoft Learn

Overview

  • Module 1: Introduction to the Open Source Analytics Offering
  • At the end of this module, you will understand:

    • What HDInsight is
    • How HDInsight works
    • When to use HDInsight
  • Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
  • At the end of this module, you will understand:

    • The correct HDInsight configuration options
    • Decision criteria for selecting the correct HDInsight configuration option
    • How to analyze a scenario and map it to an HDInsight configuration option
    • Cost optimization strategies for HDInsight clusters
  • Module 3: Creating and configuring an HDInsight cluster
  • In this module you will:

    • Create an HDInsight Spark Cluster
    • Execute queries on an HDInsight Spark Cluster
    • Monitor an HDInsight Spark Cluster
    • Learn how to fix common provisioning issues
  • Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
    • Introduction
    • Use HDInsight HBase clusters
    • Describe HBase Architecture Patterns
    • Exercise - Provisioning an HDInsight HBase cluster
    • Exercise - Run benchmarks in HBase
    • Understand HBase Best Practices
    • Summary
    • Knowledge Check
  • Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
  • At the end of this module you will understand:

    • When to use Apache Spark and Kafka with HDInsight
    • How Spark Structured Streaming works
    • The architecture of a Kafka and Spark solution
    • How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook
    • How to replicate data to a secondary cluster
  • Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
  • In this module you will:

    • Identify appropriate scenarios for deploying HDInsight Interactive Query clusters
    • Learn about architectural patterns
    • Deploy a cluster for your real estate app and query the data
    • Learn how to integrate Apache Spark and Hive LLAP queries using the Hive Warehouse Connector
    • Create a large-scale interactive query dashboard to evaluate real estate values and locations
  • Module 7: Manage enterprise security in HDInsight
    1. Introduction
    2. Describe HDInsight security areas
    3. Implement network security
    4. Understand operating system security
    5. Manage application/middleware security
    6. Implement data access security
    7. Knowledge Check
    8. Summary

Syllabus

  • Module 1: Introduction to the Open Source Analytics Offering
    • Introduction
    • What is HDInsight?
    • How does HDInsight work?
    • When to use HDInsight
    • Knowledge check
    • Summary
  • Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
    • Introduction
    • HDInsight configuration options
    • Decision criteria for selecting the correct HDInsight configuration option
    • Analyze a scenario and map it to an HDInsight configuration option
    • Cost optimization strategies for HDInsight clusters
    • Knowledge check
    • Summary
  • Module 3: Creating and configuring an HDInsight cluster
    • Introduction
    • Creating an HDInsight cluster
    • Exercise - Create an HDInsight cluster via the Azure portal
    • Opening a Jupyter Notebook on an HDInsight Spark cluster
    • Exercise - Execute queries on an HDInsight Spark cluster
    • Enable monitoring of HDInsight jobs
    • Common provisioning issues
    • Exercise - Monitor an HDInsight cluster
    • Summary
    • Knowledge check
  • Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
    • Introduction
    • Describe Apache HBase
    • Explain HDInsight HBase cluster architecture and application patterns
    • Improve the write and read performance of HBase clusters
    • Determine migration and high availability strategies in HDInsight HBase
    • Use Apache Phoenix on HDInsight HBase
    • Determine HDInsight HBase cluster performance
    • Perform benchmarking in HBase
    • Knowledge check
    • Summary
  • Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
    • Introduction
    • Use HDInsight Spark and Kafka
    • Stream data with Apache Kafka
    • Describe Spark structured streaming
    • Create a Kafka and Spark architecture
    • Exercise - Provision HDInsight to perform advanced streaming data transformations
    • Exercise - Create the Kafka producer
    • Exercise - Stream Kafka data to a Jupyter notebook and window the data
    • Replicate data to a secondary cluster
    • Knowledge check
    • Summary
  • Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
    • Introduction
    • When should you use HDInsight Interactive Query?
    • HDInsight interactive queries
    • Exercise - Provision HDInsight to perform ad hoc analytics
    • Exercise - Upload and query data in HDInsight
    • Integrate Apache Spark and Hive LLAP queries
    • Create a large-scale interactive query dashboard for evaluating real estate trends
    • Summary
    • Knowledge check
  • Module 7: Manage enterprise security in HDInsight
    • Introduction
    • Describe HDInsight security areas
    • Implement network security
    • Understand operating system security
    • Manage application/middleware security
    • Implement data access security
    • Knowledge check
    • Summary
