Statistics and R

Harvard University via edX


This course teaches the R programming language in the context of statistical data and statistical analysis in the life sciences.

We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals, all while analyzing data with R code. We provide R programming examples in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. By using R scripts to analyze data, you will learn the basics of conducting reproducible research.

Given the diversity in our students' educational backgrounds, we have divided the course materials into seven parts. You can take the entire series or individual courses that interest you. If you are a statistician, you should consider skipping the first two or three courses; similarly, if you are a biologist, you should consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. We start with simple calculations and descriptive statistics. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

These courses make up two Professional Certificates and are self-paced:

Data Analysis for Life Sciences:

  • PH525.1x: Statistics and R for the Life Sciences
  • PH525.2x: Introduction to Linear Models and Matrix Algebra
  • PH525.3x: Statistical Inference and Modeling for High-throughput Experiments
  • PH525.4x: High-Dimensional Data Analysis

Genomics Data Analysis:

  • PH525.5x: Introduction to Bioconductor
  • PH525.6x: Case Studies in Functional Genomics
  • PH525.7x: Advanced Bioconductor

This class was supported in part by NIH grant R25GM114818.

Taught by

Michael Love and Rafael Irizarry


3.5 rating, based on 21 Class Central reviews

  • Adelyne Chan completed this course, spending 5 hours a week on it and found the course difficulty to be medium.

    A wonderfully presented course which is a part of a larger series of 8 related courses, this course covered the basics of using R and a general overview of statistics. Course material is released every week, but all the quizzes were due about 4 mont…
  • Brandt Pence

    Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.

    (Note, I took this before the reorganization of the courses. I believe the material in the first two-three courses remains the same, so my comments should still be valid here.) This is the first course in the PH525 sequence offered by HarvardX on…
  • Aqsa Anwar
    R programming is a versatile and powerful language for data analysis and statistical computing. Its open-source nature and extensive community support make it a top choice among data scientists and analysts. R offers a vast array of libraries and packages for data manipulation, visualization, and modeling, making it ideal for handling complex datasets. Its syntax, though initially challenging, becomes intuitive with practice. However, it may not be the best choice for large-scale production systems. Overall, R excels in statistical analysis, data exploration, and visualization, making it an essential tool for anyone involved in data science and analytics.
  • Chris Falter

    Chris Falter completed this course.

    Pro: If you watch the videos, read the material, and do the exercises, you will emerge with a working understanding of statistics foundations (normal distribution, Student's t-distribution, Monte Carlo simulations, etc.) and R.

    Con: The instructors were sometimes very sloppy in their explanations; they tended to use hard-to-grasp lingo in the videos and even in the exercises. Between the forums and the exercise explanations, however, I was able to *eventually* understand the exercises that were poorly worded initially.
  • Anonymous

    Anonymous is taking this course right now.

    The instruction videos are very sloppy, and the text book and other resources are not very helpful as well. The exercises have questions on topics that have either not been discussed or very poorly described. Additionally, the language of the questi…
  • Ayse N.

    Ayse N. completed this course.

    The way this course is taught feels pretty sloppy; it is easy to feel lost. They teach one way, and the answers they provide for some exercises are written in a completely different way that they have never taught.

    To be able to understand some things, you need to already know a bit about the topic.

    Also, the way they name variables is quite cringe-worthy: in some places they name a variable "X", and elsewhere another variable is "x"; since R is case sensitive, no need to worry, right?
  • Max Pietsch

    Max Pietsch completed this course, spending 10 hours a week on it and found the course difficulty to be medium.

    I have a background in computer science but none in statistics. I began to get lost in the part about t-tests. This was basic statistical information, so someone with that background would be good to go. To get through the first quarter of the course I had to do a lot of googling for how to work with R, which was fine and helped me learn R.
  • Muhammad Khan

    Muhammad Khan is taking this course right now, spending 2 hours a week on it and found the course difficulty to be medium.

    I'm doing a Master's in Analytics, and this course by Rafael Irizarry is helping me so much in my studies. Amazing course. You get to learn all the necessary tools for data analytics in this. The instructor teaches everything slowly and gradually. Look nowhere else. First take this course for data science and then go for some other course. Highly recommended. I don't know why some people have given this course a low rating.
  • Anonymous

    Anonymous is taking this course right now.

    I took intro to stats two years ago, now I'm facing econometrics so needed to learn R and brush up on stats. So far (into Week 3) it is a good course for that. There is a fair amount of puzzling things out for yourself, but that's probably a good thing, too. It is a tremendous value for the price ;-)
  • Anonymous

    Anonymous completed this course.

    Not suitable for beginners.
    The tutor gave examples that are not used in the exercises, so you will get lost easily.
  • Robert Grutza

    Robert Grutza completed this course, spending 3 hours a week on it and found the course difficulty to be medium.

    The instructor was very good. The material was presented in a logical manner. Nice sample R code with explanations. Etc....
  • Anonymous

    Anonymous completed this course.

    The course is not well organised, and neither instructor explains the material in depth. The videos are almost useless; many terms are used in them without explanation, for example, hypergeometric distribution.
  • Piotr Dziuba

    Piotr Dziuba completed this course.

  • Colin Khein completed this course.

  • Jinwook

    Jinwook completed this course.

  • Davide Madrisan completed this course.

  • Raphael Rivero completed this course.

  • Matteo Ferrara completed this course.

  • Rafael Prados

    Rafael Prados completed this course.

  • James Warren

    James Warren completed this course.
