Overview
Predictive analytics has a longstanding tradition in medicine. Developing better prediction models is a critical step in the pursuit of improved health care: we need these tools to guide our decision-making on preventive measures and individualized treatments. To use and develop these models effectively, we must understand them better. In this course, you will learn how to make accurate prediction tools and how to assess their validity. First, we will discuss the role of predictive analytics for prevention, diagnosis, and effectiveness. Then, we look at key concepts such as study design, sample size, and overfitting.
Furthermore, we comprehensively discuss important modeling issues such as missing values, non-linear relations, and model selection. The importance of the bias-variance tradeoff and its role in prediction is also addressed. Finally, we look at various ways to evaluate a model: through performance measures, and by assessing both internal and external validity. We also discuss how to update a model to a specific setting.
Throughout the course, we illustrate the concepts introduced in the lectures using R. You need not install R on your computer to follow the course: you will be able to access R and all the example datasets within the Coursera environment. We do, however, make references to further packages that you can use for certain types of analyses; feel free to install and use them on your computer.
Furthermore, each module may also contain practice quiz questions. You will pass these regardless of whether you provide a right or wrong answer. You will learn the most by first thinking about the answers yourself and then checking them against the correct answers and explanations provided.
This course is part of the Master's program in Population Health Management at Leiden University (currently in development).
Syllabus
- Welcome to Leiden University
- Welcome to the course Predictive Analytics! We are excited to have you in class and look forward to your contributions to the learning community. To begin, we recommend taking a few minutes to explore the course site. Review the material we will cover each week, and preview the assignments you will need to complete in order to pass the course. Click Discussions to see forums where you can discuss the course material with fellow students taking the class. If you have questions about course content, please post them in the forums to get help from others in the course community. For technical problems with the Coursera platform, visit the Learner Help Center. Good luck as you get started, and we hope you enjoy the course!
- Prediction for prevention, diagnosis, and effectiveness
- In this module, we discuss the role of predictive analytics for prevention, diagnosis, and effectiveness. We begin with a brief introduction to predictive analytics, which we follow by differentiating between population-based and targeted interventions. We then explain why and when it may be beneficial to test for a diagnosis, and how analytic tools can help inform these decisions. Finally, we focus on the balance between benefits and harms of a certain treatment, and how we can predict the benefit for an individual.
- Modeling concepts
- In this module, we will present some key concepts in prediction modeling. First, we weigh the strengths and weaknesses of various study designs. Second, we stress the importance of an appropriate sample size for reliable inference. Then, we discuss the issues of overfitting a prediction model and regression to the mean. Finally, we will guide you through the popular bootstrap procedure, showing how it can be used to assess parameter variability.
- Model development
- In this module, we focus on model development. First, we turn our attention to the missing values problem. We discuss well-known missingness mechanisms, and methods to deal with missing values appropriately. Second, we learn about methods to deal with non-linearity in a dataset. We then address the topic of model selection, focusing on the limitations of traditional stepwise selection procedures. Last, we talk about how introducing bias in exchange for lower variance can improve prediction quality. This can be done by using advanced methods, such as LASSO and Ridge regression.
- Model validation and updating
- In this final module, we learn about assessing the quality of a prediction model. First, we extensively discuss standard performance measures for both binary and continuous outcomes. Second, we explore different ways of validating a prediction model. We look at how to assess both the internal and the more relevant external validity of a model. Next, we will look at how to update a model and make it applicable to a specific medical setting. We conclude with an interview in which we discuss the potential of predictive analytics more broadly, taking the island of Aruba as an example.
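To give a feel for what performance measures for a binary outcome can look like, here is a minimal sketch in Python (the course itself uses R). The syllabus does not name specific measures; the c-statistic (concordance) and the Brier score are assumed here as two common choices. The function names and the toy data are hypothetical and do not come from the course material.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Concordance: share of (event, non-event) pairs in which the event
    received the higher predicted probability. Ties count as one half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: observed outcomes (1 = event) and predicted risks.
outcomes = [0, 0, 1, 1, 0, 1]
predicted = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print("c-statistic:", round(c_statistic(outcomes, predicted), 3))
print("Brier score:", round(brier_score(outcomes, predicted), 3))
```

The c-statistic only reflects how well the predictions rank patients (discrimination), whereas the Brier score also penalizes predicted risks that are far from the observed outcomes, so the two are usually reported together rather than as substitutes.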
Taught by
Ewout W. Steyerberg and David van Klaveren