Statistics and R

Harvard University via edX


Details

Provider: edX
Pricing: Free Online Course (Audit)
Languages: English
Certificate: $219.00 Certificate Available
Duration & workload: 4 weeks, 2-4 hours a week
Sessions: On-Demand
Level: Intermediate
Subtitles: English, Arabic, German, Spanish, French, Hindi, Indonesian, Portuguese, Swahili, Telugu, Turkish, Chinese

Part of

Data Analysis for Life Sciences

Overview

This course teaches the R programming language in the context of statistical data and statistical analysis in the life sciences.

We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals, all while analyzing data with R code. We provide R programming examples in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. By using R scripts to analyze data, you will learn the basics of conducting reproducible research.

Given the diversity in educational background of our students, we have divided the course materials into seven parts. You can take the entire series or individual courses that interest you. If you are a statistician, you should consider skipping the first two or three courses; similarly, if you are a biologist, you should consider skipping some of the introductory biology lectures. Note that the statistics and programming aspects of the class ramp up in difficulty relatively quickly across the first three courses. We start with simple calculations and descriptive statistics. By the third course we will be teaching advanced statistical concepts such as hierarchical models, and by the fourth, advanced software engineering skills such as parallel computing and reproducible research concepts.

These courses make up two Professional Certificates and are self-paced:

Data Analysis for Life Sciences:

PH525.1x: Statistics and R for the Life Sciences
PH525.2x: Introduction to Linear Models and Matrix Algebra
PH525.3x: Statistical Inference and Modeling for High-throughput Experiments
PH525.4x: High-Dimensional Data Analysis

Genomics Data Analysis:

PH525.5x: Introduction to Bioconductor
PH525.6x: Case Studies in Functional Genomics
PH525.7x: Advanced Bioconductor

This class was supported in part by NIH grant R25GM114818.

Taught by

Michael Love and Rafael Irizarry

Reviews

3.5 rating, based on 20 Class Central reviews

4 rating at edX based on 26 ratings


Adelyne Chan @adelyne

A wonderfully presented course which is a part of a larger series of 8 related courses, this course covered the basics of using R and a general overview of statistics. Course material is released every week, but all the quizzes were due about 4 months after the course actually started, which allows flexibility for students. I also liked the way in which R Markdown scripts were provided for each lecture, and working through the scripts (in text form) really reinforced the concepts covered in the video lectures. The exercises usually built on these as well, although I felt that the wording of some of the questions was quite dubious: a lot of the time I spent on this course went into figuring out what the question was asking rather than actually working on getting a solution!
Brandt Pence

(Note, I took this before the reorganization of the courses. I believe the material in the first two-three courses remains the same, so my comments should still be valid here.)

This is the first course in the PH525 sequence offered by HarvardX on the edX platform. The sequence is taught by Rafael Irizarry, a noted computational biologist at Harvard and the Dana-Farber Cancer Institute. The course offers a relatively gentle introduction to biostatistics, and there's little emphasis on genomic analyses here. Topics that are covered include probability, the normal distribution, some inferential statistics (t-tests, confidence intervals, power calculations, association tests, and simulation), and exploratory data analysis.

The introduction to R is rather cursory, and I have to imagine that the homework assignments might be challenging for those unfamiliar with the language, although there is a fair bit of handholding for the most difficult parts. For those that have taken R Programming, as I had, this course will seem very easy. I took it during a self-paced period and finished the entire course in a little more than three days, working only a few hours a night and a bit here and there during free periods at work, and I don't think I spent much more than 10-20 minutes on any of the programming problems.

There is some value to be had here even for those with experience in R, though. The basic introduction of actual statistical tests in this course is likely to give students taking the statistical inference courses in the Data Science and Genomic Data Science specializations a bit of a head start. The section on dplyr, a powerful method of splitting datasets and performing operations on their contents in a more intuitive way than in base R, is also reasonably good. Additional follow-up courses are available for matrix operations and advanced statistics.

Overall, four stars. The actual instruction in programming in R is a bit slim here, but for those with experience with the language but little experience on the statistics side of R (which would describe most everyone currently taking or having recently taken R Programming), there is a lot of value here for little effort. The edX platform is not as nice as Coursera's, especially when it comes to the discussion boards, but this doesn't detract much from the course.
Chris Falter

Pro: If you watch the videos, read the material, and do the exercises, you will emerge with a working understanding of statistics foundations (normal distribution, Student's t-distribution, Monte Carlo simulations, etc.) and R.

Con: The instructors were sometimes very sloppy in their explanations; they tended to use hard-to-grasp lingo in the videos and even in the exercises. Between the forums and the exercise explanations, however, I was able to *eventually* understand the exercises that were poorly worded initially.
Anonymous

The instruction videos are very sloppy, and the textbook and other resources are not very helpful either. The exercises have questions on topics that have either not been discussed or been very poorly described. Additionally, the language of the questions in the exercises is sometimes very ambiguous and left me confused about what was being asked or wondering if there was a question in there at all. I don't think that I'll be continuing with this course. Money down the drain I guess. Do not waste your time by taking this course!

Note: I hold a doctorate degree in biological sciences and have extensive experience in carrying out statistical analysis using both R and Python. I wanted to take this course to have some sort of certification.
Ayse N.

The way this course is taught feels pretty sloppy; it is easy to feel lost. They teach one way, and the answers they provide for some exercises are written in a completely different way that they never taught.

To be able to understand some things, you need to already know a bit about the topic.

Also, the way they name variables is quite cringe-worthy: in some places they name a variable "X", while another variable is "x". Since R is case sensitive, no need to worry, right?
Max Pietsch

I have a background in computer science but none in statistics. I began to get lost in the part about t-tests. This was basic statistical information, so someone with that background would be good to go. To get through the first quarter of the course I had to do a lot of googling for how to work with R, which was fine and helped me learn R.
Muhammad Khan

I'm doing a Masters in Analytics and this course by Rafael Irizarry is helping me so much in my studies. Amazing course. You get to learn all the necessary tools for data analytics in this. The instructor teaches everything slowly and gradually. Look nowhere else. First take this course for data science and then go for some other course. Highly recommended. I don't know why some people have given this course a low rating.
Anonymous

I took intro to stats two years ago, now I'm facing econometrics so needed to learn R and brush up on stats. So far (into Week 3) it is a good course for that. There is a fair amount of puzzling things out for yourself, but that's probably a good thing, too. It is a tremendous value for the price ;-)
Anonymous

Not suitable for beginners.
The tutor gave examples that are not used in the exercises, so you will get lost easily.
Robert Grutza

The instructor was very good. The material was presented in a logical manner. Nice sample R code with explanations. Etc....
Anonymous

The course is not organised well, and the instructor doesn't explain the material in depth. The videos are almost useless; many terms are used in them without explanation, for example: hypergeometric distribution.