

Model Building and Validation

AT&T via Udacity


This course teaches you how to answer questions about the real world with data, starting from scratch. Machine learning is only a small part of this process. Model building involves setting up ways to collect data, understanding which aspects of the data matter for the questions you are asking, and finding a statistical, mathematical, or simulation model that helps you gain understanding and make predictions.

All of these steps are equally important, and model building is a crucial skill in every field of science. The process stays true to the scientific method, so what you learn from your models helps you both understand whatever you are investigating and make predictions that hold up when tested.

We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.

Why Take This Course?

Many of you may have already taken a course in machine learning or data science or are familiar with machine learning models.

In this course we will take a more general approach, walking through the questioning, modeling and validation steps of the model building process.

The goal is to get you to practice thinking in depth about a problem and coming up with your own solutions. Many examples we will attempt may not have one correct answer but will require you to work through the problems applying the methods we hope to illustrate throughout this class.


Lesson 1 - Introduction to the QMV Process

Learn about the Question, Modeling, and Validation (QMV) process of data analysis. Understand the basics behind each step and apply the QMV process to analyze how Udacity employees choose candies!

Lesson 2 - Question Phase

We will drill down into the questioning phase of the QMV process. We’ll teach you how to turn a vague question into a precise one that can be analyzed with statistics and machine learning. You will also analyze a Twitter dataset and try to predict when a person will tweet next!

Lesson 3 - Modeling Phase

Building upon Lesson 2, you will learn how to build rigorous mathematical, statistical, and machine learning models so you can make accurate predictions. You will look through the recently released U.S. Medicare dataset for anomalous transactions.

Lesson 4 - Validation Phase

So how do you tell if your model is doing well? In this lesson, we will teach you some of the fundamental metrics that you can use to grade the performance of the models that you’ve built. You will analyze the AT&T connected cars data set and see if you can tell which driver is which by analyzing their driving patterns.
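
The listing does not say which metrics the lesson covers. As a rough illustration of what grading a model can look like, the sketch below computes accuracy, precision, and recall for a binary classifier in plain Python. The function names and the toy labels are assumptions made for this example, not course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Of the items predicted positive, the fraction that really are."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """Of the truly positive items, the fraction the model found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
```

These numbers are only meaningful when computed on data the model did not see during training.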

Final Project - Identify Hacking Attempts from Network Flow Logs

You will create a program that examines log data of net flow traffic, and produces a score, from 1 to 10, describing the degree to which the logs suggest a brute force attack is taking place on a server.
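
The listing does not describe how the score should be computed. Purely as a sketch of the shape such a program could take, the example below counts connection attempts per source address to a single port and maps the busiest source onto a 1 to 10 scale logarithmically. The simplified `(src_ip, dst_port)` record, the default port, and the `saturation` threshold are all hypothetical; real flow logs carry many more fields, and a real solution would likely use them.

```python
import math
from collections import Counter


def brute_force_score(flows, port=22, saturation=1000):
    """Return an integer score from 1 (quiet) to 10 (likely brute force).

    flows: iterable of (src_ip, dst_port) tuples (a hypothetical,
    heavily simplified stand-in for real flow records).
    saturation: attempt count at which the score reaches 10 (assumed).
    """
    # Count how many flows each source sent to the watched port.
    attempts = Counter(src for src, dst_port in flows if dst_port == port)
    if not attempts:
        return 1
    peak = max(attempts.values())
    # Log scale: 1 attempt maps to 1, `saturation` or more maps to 10.
    fraction = min(math.log(peak) / math.log(saturation), 1.0)
    return 1 + round(9 * fraction)


# Usage with invented records: one noisy source, one ordinary one.
flows = [("10.0.0.5", 22)] * 400 + [("10.0.0.9", 443)] * 30
print(brute_force_score(flows))
```

A logarithmic mapping is one design choice among many; it keeps the score responsive at low attempt counts while capping very large bursts at 10.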

Taught by

Rishi Pravahan and Don Dini



2.0 rating, based on 6 reviews


  • Gregory J Hamel (Life Is Study) completed this course and found the course difficulty to be medium.

    Model Building and Validation is an advanced data science course provided by AT&T through the Udacity MOOC platform. The course is listed as "advanced" because it assumes prior knowledge of machine learning, statistics, linear algebra and calculus. Despite...
  • Anonymous

    Anonymous completed this course.

    Don Dini seems to be completely clueless in what he does. For example, when at the end of the second lesson he estimates a model with k-nearest neighbours, he modifies the problem in a very strange way and generates almost 1.5 million new datapoints from an initial population of several thousand observations. These new datapoints differ from the initial ones only slightly. So KNN works equally well on the test set and on the training set, because Don randomly splits them only after generating 1.5 million almost identical datapoints. So it is a kind of regressing sin(x) on cos(x), only somewhat more obscure.
  • Anonymous

    Anonymous is taking this course right now.

    The content of this course is sometimes completely inaccurate from the standpoint of probability theory and statistics. In the second lesson the instructor does some very strange curve fitting to get maximum likelihood estimates for the parameters of something he believes to be a probability density function. IMHO, neither did his fitting attempts have anything to do with maximum likelihood, nor (by the end of the lesson) did the fitted curve estimate any kind of probability density function.
  • Anonymous

    Anonymous completed this course.

    Don't bother. This course is really more about the authors demoing what they can do rather than actually explaining or teaching anything. Very little value for the time invested. You'll find better material for this subject on Coursera.
  • Fetty Fitriyanti Lubis completed this course.

  • Rafael Prados

    Rafael Prados completed this course.
