This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools, which let one publish a data analysis in a single document that others can easily execute to obtain the same results.
Week 1: Concepts, Ideas, & Structure
This week will cover the basic ideas of reproducible research since they may be unfamiliar to some of you. We also cover structuring and organizing a data analysis to help make it more reproducible. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn't going to ruin the story.
Week 2: Markdown & knitr
This week we cover some of the core tools for developing reproducible documents. We cover the literate programming tool knitr and show how to integrate it with Markdown to publish reproducible web documents. We also introduce the first peer assessment which will require you to write up a reproducible data analysis using knitr.
Week 3: Reproducible Research Checklist & Evidence-based Data Analysis
This week covers what one could call a basic checklist for ensuring that a data analysis is reproducible. While it's not absolutely sufficient to follow the checklist, it provides a necessary minimum standard that would be applicable to almost any area of analysis.
Week 4: Case Studies & Commentaries
This week there are two case studies involving the importance of reproducibility in science for you to watch.
The first 2.5 weeks of lecture material is great. It provides a well-organized overview of how to create reproducible research in R using R Markdown and the knitr package, taking plenty of time to talk about best practices. Thankfully, Roger Peng has added a little box with his face in it as he talks over his slides for many of his videos, which makes the content a lot more engaging than it is in some of the other Johns Hopkins courses that only have voiceovers.
The final 1.5 weeks of lecture video material is not as useful or engaging and seems a bit lazy, in that week 4 takes the form of recordings of lectures given sometime in the past. The videos in the second half of week 3 only have voiceovers, and they have an echo that makes them hard to listen to.
Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.
Reproducible Research is the fifth course in the Data Science specialization, and the last course in what could reasonably be considered the basic R introduction portion of the series. Following this course, students move into Statistical Inference, Regression Models, and Practical Machine Learning, courses which are more about analytical techniques than basic programming skills.
The idea behind Reproducible Research is to inform students about the reproducibility crisis in science and to give them tools to make their analyses reproducible. This is something that has long had importance in programming but has only recently been given much credence in experimental sciences, so this course is relatively timely for those of us in the latter fields. The course covers R Markdown and knitr as a method of producing reports with integrated analyses, and also covers RPubs for communicating results. Overall this is probably the most useful and important course in the first part of the specialization. The course itself is easy, although the two projects (week 2 and week 4) can be very time-consuming depending on how much effort you choose to expend. I went for the option of providing the minimum to meet the stated requirements, and I still received 100% on my peer feedback, although this strategy can be risky, so proceed with caution if you choose to do this.
Overall, four stars. The best course in the first half of the specialization, and it gives a good overview of strategies to disseminate your analyses and results in an understandable and reproducible way. I took this concurrently with Exploratory Data Analysis, but some of the material from that course is useful here, so make sure you have enough time to complete both if you choose this route.
The course is part of a very good 'data science with R' program (I don't know the current name because it changes) available at Coursera.
The program is quite massive: it contains about 8 courses, but it is really thorough and well presented. It is designed with even complete beginners in mind, so you may start it without any prior knowledge.
Anonymous completed this course.
Not much content. Only introduced and taught one main topic: the knitr package in R. Much of the course was spent repetitively advocating for reproducible research with case studies and peer-reviewed assignments. The second peer-reviewed assignment was essentially the same as the first in terms of learning new techniques. Most of my time completing the course (I spent on average 6 hours per week) went into trying to clean and organize the data, and getting unfamiliar R techniques to work.
The instructor's personal preference for knitr over Sweave may be contrary to most statisticians' preferences.
Jason Michael Cherry completed this course, spending 2 hours a week on it and found the course difficulty to be medium.
The course was solid, and gave a good overview on the why and how of making research reproducible. There's an overemphasis on doing work in R Markdown, but the concepts are generally applicable. Background knowledge in using R and running basic stats is necessary, as the course assumes you already have that going in.
Huy completed this course, spending 4 hours a week on it and found the course difficulty to be easy.