In the final course of the statistical modeling for data science program, learners will study a broad set of more advanced statistical modeling tools. These include generalized linear models (GLMs), which provide an introduction to classification through logistic regression; nonparametric modeling, including kernel estimators and smoothing splines; and semiparametric generalized additive models (GAMs). Emphasis will be placed on a firm conceptual understanding of these tools. Attention will also be given to the ethical issues raised by using complex statistical models.
This course can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
Logo adapted from a photo by Vincent Ledvina on Unsplash.
An Introduction to Generalized Linear Models Through Binomial Regression
In this module, we will introduce generalized linear models (GLMs) through the study of binomial data. In particular, we will motivate the need for GLMs; introduce the binomial regression model, including the most common binomial link functions; correctly interpret the binomial regression model; and consider various methods for assessing the fit and predictive power of the binomial regression model.
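To make "link function" concrete, here is a minimal sketch in Python (the course itself works in R) of three commonly used binomial links: logit, probit, and complementary log-log. Each maps a probability p in (0, 1) to an unbounded linear predictor eta, and its inverse maps eta back to a probability. The function names and the coefficients in the usage lines are hypothetical, chosen only for illustration. The last lines illustrate the usual interpretation of a logistic regression coefficient: under the logit link, a one-unit increase in x multiplies the odds by exp(beta1).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist(0.0, 1.0)

# Logit link: log-odds of p, and its inverse (the logistic function).
def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    # Numerically stable form that avoids overflow for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Probit link: inverse of the standard normal CDF, and the CDF as its inverse.
def probit(p):
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

# Complementary log-log link (asymmetric around p = 0.5).
def cloglog(p):
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical fitted logistic model: eta = beta0 + beta1 * x.
beta0, beta1 = -1.0, 0.5
p_at_2 = inv_logit(beta0 + beta1 * 2)
p_at_3 = inv_logit(beta0 + beta1 * 3)
odds_ratio = odds(p_at_3) / odds(p_at_2)  # equals exp(beta1) up to rounding
```

Any of the three links gives a valid binomial GLM. The logit is usually preferred because its coefficients have this direct odds-ratio reading.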
Models for Count Data
In this module, we will consider how to model count data. When the response variable is a count of some phenomenon, and that count is thought to depend on a set of predictors, we can use Poisson regression as a model. We will describe Poisson regression in some detail and apply it to real data. Then, we will describe situations in which Poisson regression is not appropriate and briefly present solutions for those situations.
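As a rough illustration of what "Poisson regression as a model" means, the sketch below fits log(mu_i) = b0 + b1 * x_i by maximum likelihood using Newton-Raphson. For the canonical log link this coincides with iteratively reweighted least squares. It is written in plain Python with a single predictor so the 2x2 linear algebra stays visible. The function name `fit_poisson` and the data are hypothetical, and this is not the course's code; in R one would normally call `glm(..., family = poisson)`.

```python
import math

def fit_poisson(x, y, tol=1e-10, max_iter=100):
    """Fit log(mu_i) = b0 + b1 * x_i by maximum likelihood (Newton-Raphson)."""
    n = len(x)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T W X, with W = diag(mu) for the log link
        a = sum(mu)
        b = sum(mi * xi for mi, xi in zip(mu, x))
        c = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = a * c - b * b
        # Newton step: solve the 2x2 system I * delta = U
        d0 = (c * u0 - b * u1) / det
        d1 = (a * u1 - b * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts observed at predictor values 0..4.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(x, y)
rate_ratio = math.exp(b1)  # multiplicative change in expected count per unit of x
```

Because the fitted coefficients act on the log of the mean, exp(b1) is read as a rate ratio. A quick check of a correct fit is that the fitted means sum to the observed total count, which is the intercept's score equation.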
Introduction to Nonparametric Regression
In this module, we will introduce the concept of a nonparametric regression model. We will contrast this notion with the parametric models that we have studied so far. Then, we’ll study particular nonparametric regression models: kernel estimators and splines. Finally, we will introduce additive models as a blend of parametric and nonparametric methods.
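One common kernel estimator is the Nadaraya-Watson estimator. It estimates E[Y | X = x0] as a weighted average of the observed responses, where the weights come from a kernel centered at x0. The minimal Python sketch below assumes a Gaussian kernel and a user-chosen bandwidth. The function names and data are hypothetical, and the course may present kernel estimators with different kernels or notation. Unlike a parametric model, no functional form for the regression curve is assumed; the bandwidth alone controls how smooth the estimate is.

```python
import math

def gaussian_kernel(u):
    # Standard normal density used as the smoothing kernel.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, x, y, bandwidth):
    """Kernel estimate of E[Y | X = x0]: a locally weighted average of the y_i."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in x]
    total = sum(weights)  # with a tiny bandwidth and distant x0 this can underflow to 0
    return sum(w * yi for w, yi in zip(weights, y)) / total

# Hypothetical data evaluated on a small grid with two bandwidths.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
wiggly = [nadaraya_watson(g, x, y, bandwidth=0.5) for g in x]
smooth = [nadaraya_watson(g, x, y, bandwidth=1e6) for g in x]
```

A small bandwidth makes the curve follow the data closely. A very large bandwidth gives every point nearly equal weight, so the estimate flattens toward the overall mean of y. This is the bias-variance trade-off that bandwidth selection must balance.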
Introduction to Generalized Additive Models
Some models, such as linear regression, are easily interpretable but inflexible: they fail to capture many real-world relationships accurately. Other models, such as neural networks, are quite flexible but very difficult to interpret. Generalized additive models (GAMs) strike a balance between flexibility and interpretability. In this module, we will further motivate GAMs, learn the basic mathematics of fitting them, and implement them on simulated and real data in R.