Cleaning and Preparing Data Course

via Treehouse

This course may be unavailable.

Overview

We rely on data to answer important questions, whether we are trying to make the best business decisions or determine the effectiveness of a new medical treatment. But our analyses are only as accurate as the data we are using, and incorrect or “dirty” data can lead to incorrect conclusions and assumptions. Data preparation, also called “cleaning” or “scrubbing”, is an important part of ensuring our analyses are accurate and useful.

What you'll learn

  • Cleaning and scrubbing data
  • Potential problems within datasets
  • Understanding your dataset
  • Handling bad data

Syllabus

“Clean” and “Dirty” Data

Welcome! In this stage, you will learn why a properly cleaned dataset is important and some of the problems you may encounter when cleaning a dataset. We will also take our first look at the data we will be using throughout this course.

6 steps
  • What is Data Cleaning? (3:51)
  • Types of Bad Data (5:18)
  • Data Preparation Basics (7 questions)
  • Understanding Your Dataset (2:15)
  • Exploring Your Dataset (7:15)
  • Understanding Your Dataset (7 questions)

Handling Bad Data

Now that we know a little bit about our dataset and the data cleaning process, we will take a closer look at some common issues using our example dataset. Sometimes these issues can be fixed, while other times it’s best to remove the data from our analyses. We can even write programs to help us automate some of the data preparation process, saving time and effort.

10 steps
  • Simple Data Issues (8:37)
  • Sensible Column Names and Values (6:12)
  • Fixing or Excluding Data (3:39)
  • Simple Fixes and Exclusions Review (11 questions)
  • Missing Data (12:31)
  • Fixes and Exclusions for Complex Issues (5 questions)
  • Duplicated Data (9:02)
  • Infeasible and Extreme Data (8:51)
  • Automating Data Preparation (8:08)
  • Automating Data Preparation (5 questions)
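
The kinds of issues named in this stage (missing, duplicated, and infeasible values) can be illustrated with a small, automated cleaning pass. The sketch below is not taken from the course: the sample records, the column names, the `to_int` and `clean` helpers, and the `max_age` cutoff are all hypothetical, and it simply excludes bad rows, which is only one of the options the stage describes (the other being to fix them).

```python
from typing import Optional

# Hypothetical records; the course's actual dataset is not shown in this listing.
rows = [
    {"id": 1, "age": "34", "height_cm": "170"},
    {"id": 2, "age": "", "height_cm": "165"},      # missing age
    {"id": 1, "age": "34", "height_cm": "170"},    # exact duplicate of the first row
    {"id": 3, "age": "212", "height_cm": "180"},   # infeasible age
    {"id": 4, "age": "51", "height_cm": "158"},
]


def to_int(value: str) -> Optional[int]:
    """Return the value as an int, or None if it is blank or not a number."""
    try:
        return int(value.strip())
    except ValueError:
        return None


def clean(records, max_age=120):
    """Drop exact duplicates, rows with missing values, and infeasible ages.

    max_age is an assumed plausibility limit chosen for this example.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Duplicated data: skip a row identical to one already seen.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)

        # Missing data: exclude rows whose numeric fields cannot be parsed.
        age = to_int(rec["age"])
        height = to_int(rec["height_cm"])
        if age is None or height is None:
            continue

        # Infeasible data: exclude ages outside a plausible range.
        if not 0 <= age <= max_age:
            continue

        cleaned.append({"id": rec["id"], "age": age, "height_cm": height})
    return cleaned


cleaned_rows = clean(rows)
```

With the sample rows above, only the records with `id` 1 and 4 remain. Wrapping the rules in a function is what makes the step repeatable when new data arrives.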

Selecting Relevant Data

While it may seem like more data is always better, usually we only want to look at the information that’s relevant to the question we are trying to answer. In this stage, we will look at different ways of choosing the most applicable data.

6 steps
  • Making Your Dataset Smaller (2:11)
  • Choosing the Right Features (8:45)
  • Selecting the Right Data (6 questions)
  • Automated Feature Selection (5:55)
  • Cleaning and Preparing Data (1:31)
  • Automating Feature Selection (5 questions)
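
As a rough illustration of automated feature selection, one simple approach is to drop columns that never vary, since a constant column cannot help distinguish one record from another. This listing does not say which selection methods the course covers, so treat the sketch below as an assumption: the `features` data, the `select_by_variance` helper, and the `threshold` parameter are all hypothetical.

```python
from statistics import pvariance

# Hypothetical numeric features, one list of values per column.
features = {
    "age": [34, 51, 29, 42],
    "height_cm": [170, 158, 181, 165],
    "survey_version": [2, 2, 2, 2],  # constant, so it carries no information
}


def select_by_variance(columns, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


selected = select_by_variance(features)
```

Here `selected` keeps `age` and `height_cm` and drops `survey_version`. A variance filter only removes uninformative columns; deciding which of the remaining features are relevant to the question still takes judgment.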

Taught by

Alyssa Batula
