Welcome to Introduction to Statistics & Data Analysis in Public Health!
This course will teach you the core building blocks of statistical analysis (types of variables, common distributions, hypothesis testing), but, more than that, it will enable you to take a data set you've never seen before, describe its key features, get to know its strengths and quirks, run some vital basic analyses, and then formulate and test hypotheses based on means and proportions. You'll then have a solid grounding to move on to more sophisticated analysis and take the other courses in the series. You'll learn the popular, flexible and completely free software R, used by statistics and machine learning practitioners everywhere. The course is hands-on: you'll first learn how to phrase a testable hypothesis using examples of medical research as reported by the media. Then you'll work through a data set on fruit and vegetable eating habits, with data that are realistically messy, because that's what public health data sets are like in reality. There will be mini-quizzes with feedback along the way to check your understanding. The course will sharpen your ability to think critically and not take things for granted: in this age of uncontrolled algorithms and fake news, these skills are more important than ever.
Some formulae are given to aid understanding, but this is not one of those courses where you need a mathematics degree to follow it. You will need only basic numeracy (for example, we will not use calculus) and familiarity with graphical and tabular ways of presenting results. No knowledge of R or programming is assumed.
Introduction to Statistics in Public Health
-Statistics has played a critical role in public health research and practice, and you’ll start by looking at two examples: one from eighteenth-century London and the other from the United Nations. The first task in carrying out a research study is to define the research question and express it as a testable hypothesis. With examples from the media, you’ll see what does and does not work in this regard, and you’ll have a chance to define a research question from some real news stories.
Types of Variables, Common Distributions and Sampling
-This module will introduce you to some of the key building blocks of statistical analysis: types of variables, common distributions and sampling. You’ll see the difference between “well-behaved” data distributions, such as the normal and the Poisson, and the real-world distributions that are common in public health data sets.
Introduction to R and RStudio
-Now it’s time to get started with the powerful and completely free statistical software R and its popular interface RStudio. With the example of fruit and vegetable consumption, you’ll learn how to download R, import the data set and run essential descriptive analyses to get to know the variables.
Hypothesis Testing in R
-Having learned how to define a research question and a testable hypothesis earlier in the course, you’ll now learn how to carry out hypothesis tests in R and interpret the results. Because medical knowledge is derived from samples of patients, random and other kinds of variation mean that what you measure in a sample, such as the average body mass index (BMI), is not necessarily the same as in the population as a whole. It’s essential to convey this uncertainty when presenting your estimate of average BMI. This involves calculating p values and confidence intervals, fundamental concepts in statistical analysis. You’ll see how to do this for averages and for proportions.