Welcome to Linear Regression in R for Public Health!
Public Health has been defined as “the art and science of preventing disease, prolonging life and promoting health through the organized efforts of society”. Knowing what causes disease and what makes it worse are clearly vital parts of this. This requires the development of statistical models that describe how patient and environmental factors affect our chances of getting ill. This course will show you how to create such models from scratch. It begins by introducing the concepts of correlation and linear regression, then walks you through importing and examining your data, and finally shows you how to fit models. Using the example of respiratory disease, these models will describe how patient and other factors affect outcomes such as lung function.
Linear regression is one of a family of regression models, and the other courses in this series will cover two further members. Regression models have many things in common with each other, though the mathematical details differ.
This course will show you how to prepare the data, assess how well the model fits the data, and test its underlying assumptions – vital tasks with any type of regression.
You will use the free and versatile software package R, used by statisticians and data scientists in academia, governments and industry worldwide.
Introduction to Linear Regression
Before jumping ahead to run a regression model, you need to understand a related concept: correlation. This week you’ll learn what it means and how to generate Pearson’s and Spearman’s correlation coefficients in R to assess the strength of the association between a risk factor or predictor and the patient outcome. Then you’ll be introduced to linear regression and the concept of model assumptions, a key idea underpinning so much of statistical analysis.
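As a rough illustration of the correlation step, the sketch below computes both coefficients with base R's cor() and cor.test(). The data are simulated, and the variable names age and lung_function are hypothetical placeholders rather than columns from the course data set.

```r
# Simulated data only: 'age' and 'lung_function' are hypothetical variables,
# not taken from the course data set.
set.seed(42)
age <- runif(100, min = 40, max = 80)
lung_function <- 4 - 0.03 * age + rnorm(100, sd = 0.4)

# Pearson's coefficient measures the strength of a linear association
pearson <- cor(age, lung_function, method = "pearson")

# Spearman's coefficient is Pearson's coefficient applied to the ranks,
# so it measures monotonic rather than strictly linear association
spearman <- cor(age, lung_function, method = "spearman")

# cor.test() adds a hypothesis test and, for Pearson, a confidence interval
pearson_test <- cor.test(age, lung_function, method = "pearson")

print(c(pearson = pearson, spearman = spearman))
print(pearson_test$conf.int)
```

Spearman's coefficient is usually the safer choice when the relationship looks curved or the data contain outliers, since it depends only on the ordering of the values.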
Linear Regression in R
You’ll be introduced to the COPD data set that you’ll use throughout the course and will run basic descriptive analyses. You’ll also practise running correlations in R. Next, you’ll see how to run a linear regression model, firstly with one and then with several predictors, and examine whether model assumptions hold.
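The workflow described here might look something like the following sketch, which fits a model with one predictor, then with two, and finally draws two common residual plots. It uses simulated data in place of the COPD data set; the names lung_function, age and pack_years are assumptions made for illustration only.

```r
# Simulated stand-in for the course data; all variable names are hypothetical.
set.seed(1)
n <- 200
age <- runif(n, min = 40, max = 80)
pack_years <- rexp(n, rate = 1 / 20)
lung_function <- 4 - 0.02 * age - 0.01 * pack_years + rnorm(n, sd = 0.3)
dat <- data.frame(lung_function, age, pack_years)

# Simple linear regression: one predictor
fit_one <- lm(lung_function ~ age, data = dat)

# Multiple linear regression: several predictors
fit_multi <- lm(lung_function ~ age + pack_years, data = dat)

print(summary(fit_multi))   # coefficients, standard errors, R-squared
print(confint(fit_multi))   # 95% confidence intervals for the coefficients

# Residuals are the basis for checking model assumptions
res <- resid(fit_multi)

# Plots are drawn only in an interactive session (for example in RStudio)
if (interactive()) {
  # Residuals vs fitted values: look for curvature or a funnel shape
  plot(fitted(fit_multi), res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  # Normal Q-Q plot: points should lie close to the reference line
  qqnorm(res)
  qqline(res)
}
```

A random scatter around zero in the first plot and a roughly straight line in the second are consistent with the linearity, constant variance and normality assumptions, although they do not prove that the assumptions hold.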
Multiple Regression and Interaction
Now you’ll see how to extend the linear regression model to include binary and categorical variables as predictors and learn how to check the correlation between predictors. Then you’ll see how predictors can interact with each other and how to incorporate the necessary interaction terms into the model and interpret them. Different kinds of interactions exist and can be challenging to interpret, so we will take it slowly with worked examples and opportunities to practise.
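To make the idea of categorical predictors and interaction terms more concrete, here is a minimal sketch on simulated data. The variables sex (binary), smoking (three categories) and age are hypothetical, and the effect sizes are invented purely so that the example has something to estimate.

```r
# Simulated data; variable names and effect sizes are hypothetical.
set.seed(7)
n <- 300
age <- runif(n, min = 40, max = 80)
sex <- factor(sample(c("female", "male"), n, replace = TRUE))
# The first factor level ("never") is the reference category
smoking <- factor(sample(c("never", "ex", "current"), n, replace = TRUE),
                  levels = c("never", "ex", "current"))

lung_function <- 4 - 0.02 * age +
  0.5 * (sex == "male") -
  0.1 * (smoking == "ex") - 0.3 * (smoking == "current") -
  0.01 * age * (sex == "male") +      # the age slope differs by sex
  rnorm(n, sd = 0.3)
dat <- data.frame(lung_function, age, sex, smoking)

# Main effects only: lm() creates the dummy variables for factors itself
fit_main <- lm(lung_function ~ age + sex + smoking, data = dat)

# 'age * sex' expands to age + sex + age:sex (the interaction term)
fit_int <- lm(lung_function ~ age * sex + smoking, data = dat)
print(coef(fit_int))

# Age slope in each group: the reference group uses the 'age' coefficient,
# the other group adds the interaction coefficient to it
slope_female <- coef(fit_int)[["age"]]
slope_male <- coef(fit_int)[["age"]] + coef(fit_int)[["age:sexmale"]]
print(c(female = slope_female, male = slope_male))

# F-test comparing the models with and without the interaction
print(anova(fit_main, fit_int))
```

Once an interaction is in the model, the coefficient for age on its own describes the slope in the reference group only (here, females), which is one reason interaction terms need careful interpretation.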
Model Building
The last part of the course looks at how to build a regression model when you have a choice of which predictors to include. It describes commonly used automated procedures for model building and shows you why they are so problematic. Lastly, you’ll have the chance to fit some models using a more defensible and robust approach.
This was my first formal foray into R, and I highly appreciate the hands-on approach of this course. Although I had been aware of and using linear regression for various research endeavors prior to this course, the material has given me a more technical perspective on the advantages, assumptions and limitations of this statistical technique. It is extremely difficult to get decent scores in the quiz without actually trying R, so be prepared to devote more time and effort than you normally would in a MOOC.