Meaningful Predictive Modeling

Overview

Class Central Tips

This course will help us to evaluate and compare the models we have developed in previous courses. So far we have developed techniques for regression and classification, but how low should the error of a classifier be (for example) before we decide that the classifier is "good enough"? Or how do we decide which of two regression algorithms is better? By the end of this course you will be familiar with diagnostic techniques that allow you to evaluate and compare classifiers, as well as performance measures that can be used in different regression and classification scenarios. We will also study the training/validation/test pipeline, which can be used to ensure that the models you develop will generalize well to new (or "unseen") data.

Syllabus

Week 1: Diagnostics for Data

For this first week, we will go over the syllabus, download all course materials, and get your system up and running for the course. We will also introduce the basics of diagnostics for the results of supervised learning.

Week 2: Codebases, Regularization, and Evaluating a Model

This week, we will learn how to create a simple bag of words for analysis. We will also cover regularization and why it matters when building a model. Lastly, we will evaluate a model with regularization, focusing on classifiers.

Week 3: Validation and Pipelines

This week, we will learn about validation and how to implement it in tandem with training and testing. We will also cover how to implement a regularization pipeline in Python and introduce a few guidelines for best practices.

Final Project

In the final week of this course, you will continue building on the project from the first and second courses of Python Data Products for Predictive Analytics with simple predictive machine learning algorithms. Find a dataset, clean it, and perform basic analyses on the data. Evaluate your model, validate your analyses, and make sure you aren't overfitting the data.