Model Building and Validation

Overview

This course will teach you how to start from scratch in answering questions about the real world using data. Machine learning is only a small part of this process. The model building process involves setting up ways of collecting data, paying attention to what in the data is important for answering the questions you are asking, and finding a statistical, mathematical, or simulation model that helps you gain understanding and make predictions.

All of these steps are equally important, and model building is a crucial skill to acquire in every field of science. The process stays true to the scientific method, so what you learn through your models is useful both for understanding whatever you are investigating and for making predictions that hold up when tested.

We will take you on a journey through building various models. This process involves asking questions, gathering and manipulating data, building models, and ultimately testing and evaluating them.

Syllabus

Introduction to the QMV Process

Learn about the Question, Modeling, and Validation (QMV) process of data analysis. Understand the basics behind each step. Apply the QMV process to analyze how Udacity employees choose candies!

Question Phase

Learn how to turn a vague question into a statistical one that can be analyzed with statistics and machine learning. Analyze a Twitter dataset and try to predict when a person will tweet next!

Modeling Phase

Build rigorous mathematical, statistical, and machine learning models to make accurate predictions. Look through the recently released U.S. Medicare dataset for anomalous transactions.
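
The Modeling Phase mentions searching for anomalous transactions. As a minimal sketch of one common screening approach (not necessarily the one the course uses), the snippet below scores amounts with the modified z-score of Iglewicz and Hoaglin, which relies on the median and the median absolute deviation (MAD) so that the outliers themselves do not inflate the scale. The flat list of payment amounts, the 3.5 cutoff, and the sample values are illustrative assumptions, not the structure of the Medicare data.

```python
from statistics import median


def robust_z_scores(amounts: list[float]) -> list[float]:
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # At least half the values are identical, so there is no spread to scale by;
        # this sketch simply reports no anomalies in that case.
        return [0.0 for _ in amounts]
    return [0.6745 * (a - med) / mad for a in amounts]


def flag_anomalies(amounts: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose robust z-score magnitude exceeds the threshold."""
    return [i for i, z in enumerate(robust_z_scores(amounts)) if abs(z) > threshold]


# Hypothetical payment amounts; one of them is wildly out of line.
payments = [120.0, 95.5, 110.0, 102.3, 98.7, 4999.0, 105.2, 99.9]
print(flag_anomalies(payments))
```

The median and MAD are used instead of the mean and standard deviation because a single huge payment would otherwise stretch the standard deviation enough to hide itself.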

Validation Phase

Learn fundamental metrics to grade the performance of your models. Analyze the AT&T connected cars data set. See if you can tell the drivers apart by analyzing their driving patterns.

Identify Hacking Attempts from Network Flow Logs

Create a program that examines log data and scores the likelihood that a brute force attack is taking place on a server.
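
One way to approach a project like this is to start from a simple heuristic before training any model. The sketch below is hypothetical throughout: the `FlowRecord` layout, the focus on port 22, the 60-second window, and the `1 - exp(-count / scale)` squashing are all assumptions, not the course's specification. For each source IP it finds the largest number of failed logins inside any sliding time window and maps that count to a score between 0 and 1.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    # Hypothetical record layout; real network flow logs will differ.
    timestamp: float  # seconds since epoch
    src_ip: str
    dst_port: int
    login_failed: bool


def max_events_in_window(times: list[float], window: float) -> int:
    """Largest number of events that fall inside any span of `window` seconds."""
    times = sorted(times)
    best, start = 0, 0
    for end, t in enumerate(times):
        # Advance the left edge until the span fits inside the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def brute_force_scores(records, port=22, window=60.0, scale=10.0):
    """Map each source IP with failed logins to a score in [0, 1); higher is more attack-like."""
    failures = defaultdict(list)
    for r in records:
        if r.dst_port == port and r.login_failed:
            failures[r.src_ip].append(r.timestamp)
    return {
        ip: 1.0 - math.exp(-max_events_in_window(ts, window) / scale)
        for ip, ts in failures.items()
    }


# Synthetic example: one source hammers SSH, another mistypes a password once.
records = [FlowRecord(1000.0 + i, "203.0.113.7", 22, True) for i in range(30)]
records += [
    FlowRecord(1005.0, "198.51.100.2", 22, True),
    FlowRecord(1010.0, "198.51.100.2", 22, False),
]
scores = brute_force_scores(records)
print(scores)
```

A hand-tuned score like this can serve as a baseline against which a learned model is validated.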

Taught by

Rishi Pravahan and Don Dini

Reviews

2.0 rating, based on 6 Class Central reviews


Gregory J Hamel (Life Is Study) @greg


Model Building and Validation is an advanced data science course provided by AT&T through the Udacity MOOC platform. The course is listed as "advanced" because it assumes prior knowledge of machine learning, statistics, linear algebra and calculus. Despite the stated prerequisites, math doesn't play a large role, so you will still be able to understand most of the content even if your only preparation is Udacity's intro to machine learning. The course spans 4 lessons that detail the process of extracting value from data through questioning, modeling and validation. Lesson 1 is a general introduction to the QMV process, and each of the following lessons digs into one component of QMV in more detail. The course somewhat oversells its length: none of the lessons takes more than a few hours, yet the course is listed at an estimated 8 weeks with 6 hours of study per week. Admittedly, I did not do the final project, which involves creating a fraud detection model and could take a significant chunk of time.

Model Building and Validation follows the same formula as other Udacity courses, with each lesson taking the form of a series of short lecture videos interspersed with quizzes. The lecturers are easy to understand and the video quality is generally good, although the videos and course materials have some glitches that need to be ironed out. I won't grade the course too harshly on bugs, since all courses are buggy at the very beginning, and they will likely be fixed in the near future.

As for the content itself, the simple idea of framing a data analysis as a tree to track and organize the decisions you make along the way is probably the most useful thing you'll take away from this course. The course also does a good job of getting students to think about some of the high-level decisions that must be made when conducting a data analysis. The content gets rockier when it delves into specifics after lesson 1, particularly in the modeling lesson. The lectures occasionally dive too quickly into the low-level details of machine learning techniques that students may not have seen before. Additionally, the validation section focuses much more on model evaluation metrics, such as ROC curves, the confusion matrix and the metrics derived from it, than on validation itself.

Model Building and Validation is a good course that provides a nice framework for approaching data analysis, but it gets bogged down in some machine learning specifics that don't add much to the overarching theme.

I give Model Building and Validation 3.5 out of 5 stars: Good.
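
The review mentions ROC curves, the confusion matrix, and the metrics derived from it. The sketch below computes them for binary labels encoded as 0/1, using only the standard library. The labels, scores, and the 0.5 threshold are made-up values for illustration. AUC is computed through its rank interpretation: the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counting one half.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall (true positive rate) = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print((tp, fp, fn, tn), precision_recall(tp, fp, fn), roc_auc(y_true, scores))
```

The confusion matrix and precision/recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why they are usually reported together.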
Anonymous

Don Dini seems to be completely clueless about what he is doing. For example, at the end of the second lesson, when he fits a k-nearest neighbours model, he modifies the problem in a very strange way and generates almost 1.5 million new data points from an initial population of several thousand observations. These new data points differ only slightly from the originals. As a result, KNN works equally well on the test and the training set, because Don randomly splits the data only after generating 1.5 million almost identical data points. So it is a kind of regressing sin(x) on cos(x), only somewhat more obscure.
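
The leakage described in this review can be reproduced on synthetic data. In the sketch below the labels are pure coin flips, so an honest evaluation should land near 50% accuracy. Generating near-duplicate copies before a random split lets a 1-nearest-neighbour classifier find a sibling of almost every test point in the training set. The 400 points, 5 copies per point, and noise level of 0.001 are arbitrary assumptions; this is not the course's dataset or code.

```python
import math
import random


def nearest_neighbor_accuracy(train, test):
    """Accuracy of a 1-NN classifier; each item is ((x, y), label)."""
    correct = 0
    for point, label in test:
        _, predicted = min(train, key=lambda item: math.dist(item[0], point))
        correct += predicted == label
    return correct / len(test)


def jitter_copies(points, copies, noise, rng):
    """Hypothetical augmentation: several slightly perturbed copies of every point."""
    return [
        ((x + rng.gauss(0, noise), y + rng.gauss(0, noise)), label)
        for (x, y), label in points
        for _ in range(copies)
    ]


rng = random.Random(0)
# Labels are random, so no model can genuinely beat ~50% accuracy.
originals = [((rng.random(), rng.random()), rng.randint(0, 1)) for _ in range(400)]

# Leaky: augment first, then split at random; near-copies end up on both sides.
augmented = jitter_copies(originals, copies=5, noise=0.001, rng=rng)
rng.shuffle(augmented)
cut = int(0.8 * len(augmented))
leaky_acc = nearest_neighbor_accuracy(augmented[:cut], augmented[cut:])

# Honest: split the original observations first, then augment each side separately.
shuffled = originals[:]
rng.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
honest_acc = nearest_neighbor_accuracy(
    jitter_copies(shuffled[:cut], 5, 0.001, rng),
    jitter_copies(shuffled[cut:], 5, 0.001, rng),
)
print(f"leaky split:  {leaky_acc:.2f}")
print(f"honest split: {honest_acc:.2f}")
```

The general rule the sketch illustrates is to split by original observation (or group) before any augmentation, so that no information about a test point reaches the training set.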
Anonymous

The content of this course is sometimes completely inaccurate from the standpoint of probability theory and statistics. In the second lesson, the instructor does some very strange curve fitting to get maximum likelihood estimates for the parameters of something he believes to be a probability density function. IMHO, neither did his fitting attempts have anything to do with maximum likelihood, nor (by the end of the lesson) did the fitted curve estimate any kind of probability density function.
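
For contrast with curve fitting, maximum likelihood picks the parameters that maximize the log-likelihood of the observed samples themselves. The sketch below assumes, purely for illustration, that the gaps between a user's tweets (the Question Phase task) follow an exponential distribution with rate λ; the review does not say which density the lesson used. For that model the log-likelihood is n log λ - λ Σ xᵢ, which is maximized at λ̂ = n / Σ xᵢ, the reciprocal of the sample mean. The gap values are hypothetical.

```python
import math


def exponential_log_likelihood(rate: float, gaps: list[float]) -> float:
    """Log-likelihood of i.i.d. exponential gaps: n*log(rate) - rate*sum(gaps)."""
    return len(gaps) * math.log(rate) - rate * sum(gaps)


def exponential_mle(gaps: list[float]) -> float:
    """Closed-form maximum likelihood estimate of the rate: 1 / sample mean."""
    return len(gaps) / sum(gaps)


# Hypothetical gaps between consecutive tweets, in hours.
gaps = [0.5, 2.0, 1.5, 4.0, 0.25, 3.75]
rate_hat = exponential_mle(gaps)

# Sanity check: no rate on a coarse grid beats the closed-form estimate.
grid = [0.01 * k for k in range(1, 301)]
best_on_grid = max(grid, key=lambda r: exponential_log_likelihood(r, gaps))
print(rate_hat, best_on_grid)
```

Note that the estimate comes directly from the samples; no histogram is binned and no curve is fitted by least squares, and the fitted exponential density integrates to 1 by construction.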
Anonymous

Don't bother. This course is really more about the authors demoing what they can do than actually explaining or teaching anything. Very little value for the time invested. You'll find better material on this subject on Coursera.
Fetty Fitriyanti Lubis
Rafael Prados

The Nuts and Bolts of Machine Learning

Data Analysis with Python

Launching Machine Learning: Delivering Operational Success with Gold Standard ML Leadership

Modeling Data in the Tidyverse

Data Analysis with R

Regression Analysis: Simplify Complex Data Relationships
