Learning Hadoop

Overview

Learn about Hadoop, the key file systems used with it, its processing engine (MapReduce), and its many libraries and programming tools.

Hadoop is indispensable when it comes to processing big data: it is as necessary to understanding your information as servers are to storing it. This course is your introduction to Hadoop; key file systems used with Hadoop; its processing engine, MapReduce; and its many libraries and programming tools. Developer and big-data consultant Lynn Langit shows how to set up a Hadoop development environment, run and optimize MapReduce jobs, code basic queries with Hive and Pig, and build workflows to schedule jobs. Plus, learn about the depth and breadth of Apache Spark libraries available for use with a Hadoop cluster, as well as options for running machine learning jobs on a Hadoop cluster.

Syllabus

Introduction

Getting started with Hadoop
What you should know before watching this course
Using cloud services

1. Why Change?

Limits of relational database management systems
Introducing CAP (consistency, availability, partitioning)
Understanding big data

2. What Is Hadoop?

Introducing Hadoop
Understanding Hadoop distributions
Understanding the difference between HBase and Hadoop
Exploring the future of Hadoop

3. Understanding Hadoop Core Components

Understanding Java Virtual Machines (JVMs)
Exploring HDFS and other file systems
Introducing Hadoop cluster components
Introducing Hadoop Spark
Exploring the Apache and Cloudera Hadoop distributions
Using the public cloud to host Hadoop: AWS or GCP

4. Setting up Hadoop Development Environment

Understanding the parts and pieces
Hosting Hadoop locally with the Cloudera developer distribution
Setting up the Cloudera Hadoop developer virtual machine
Adding Hadoop libraries to your test environment
Picking your programming language and IDE
Use GCP Dataproc for development

5. Understanding MapReduce 1.0

Understanding MapReduce 1.0
Exploring the components of a MapReduce job
Working with the Hadoop file system
Running a MapReduce job using the console
Reviewing the code for a MapReduce WordCount job
Running and tracking Hadoop jobs

6. Tuning MapReduce

Tuning by physical methods
Tuning a Mapper
Tuning a Reducer
Using a cache for lookups

7. Understanding MapReduce 2.0/YARN

Understanding MapReduce 2.0
Coding a basic WordCount in Java using MapReduce 2.0
Exploring advanced WordCount in Java using MapReduce 2.0

8. Understanding Hive

Introducing Hive and HBase
Understanding Hive
Revisiting WordCount using Hive
Understanding more about HQL query optimization
Using Hive in GCP Dataproc

9. Understanding Pig

Introducing Pig
Understanding Pig
Exploring use cases for Pig
Exploring Pig tools in GCP Dataproc

10. Understanding Workflows and Connectors

Introducing Oozie
Building a workflow with Oozie
Introducing Sqoop
Importing data with Sqoop
Introducing ZooKeeper
Coordinating workflows with ZooKeeper

11. Using Spark

Introducing Apache Spark
Running a Spark job to calculate Pi
Running a Spark job in a Jupyter Notebook

12. Hadoop Today

Understanding machine learning options
Understanding data lakes
Visualizing Hadoop systems

Next Steps

Next steps with Hadoop

Taught by

Lynn Langit

