In this culminating project, you will deploy the tools and techniques that you've mastered over the course of the specialization. You'll work with a real dataset to perform analyses and prepare a report of your findings.
In this first week, we'll introduce the project and get you oriented to the tasks that you'll be performing over the next several weeks.
Introduction to the Dataset - Andrew Jaffe
This week, we'll really dig into the dataset by providing an introduction from Andrew Jaffe, the lead scientist on the analysis. You should also be looking ahead to Task 2, which is due in Week 4; the alignment will take a long time to perform, so you should start early.
Understand the Problem
The purpose of genomic data science is to answer fundamental questions in biology. Before starting on the data analysis process, the first step is always to understand the scientific question you are trying to answer. Don't forget to stay on top of the alignment task due in Week 4; it will take a long time to accomplish and shouldn't be put off.
Once you have understood the problem, the next step is to obtain the raw data so that you can perform your analysis.
QC the Alignment
Now that you have aligned the data, the next step is to do some quality control to make sure that the data are in good shape.
Get Feature Counts
Now that you have performed alignment and quality control, the next step is to calculate the abundance of every gene in every sample.
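As a rough illustration of what "abundance" means here, the sketch below counts how many aligned reads overlap each gene in one sample. The gene coordinates, the read tuples, and the `count_reads` helper are all hypothetical, and the rule of skipping reads that hit zero or several genes is an assumption for the example, not necessarily what the course tools do.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

def count_reads(alignments, genes=GENES):
    """Count aligned reads per gene for one sample.

    `alignments` is an iterable of (chromosome, start, end) tuples.
    A read is assigned to a gene only when it overlaps exactly one gene;
    reads overlapping no gene or several genes are skipped (an assumption).
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in alignments:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if chrom == g_chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return dict(counts)

# Made-up alignments for a single sample.
sample_reads = [
    ("chr1", 120, 195),  # inside geneA
    ("chr1", 480, 555),  # overlaps the end of geneA
    ("chr1", 600, 675),  # intergenic, not counted
    ("chr1", 900, 975),  # inside geneB
]
print(count_reads(sample_reads))
```

Running the same function over every sample and stacking the results gives a genes-by-samples count table, which is the kind of summary the later analysis steps work from.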
After summarizing your genomic data, the next step is to load the data into R for analysis with Bioconductor.
The next step is to perform a statistical analysis to detect genes that are differentially expressed.
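To make the idea of a per-gene test concrete, here is a toy sketch that computes a log2 fold change and a Welch t statistic for each gene between two groups. The expression values and the `welch_t` helper are invented for illustration. A real analysis would rely on dedicated packages that also handle normalization and multiple-testing correction, so treat this only as a picture of the underlying comparison.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of expression values."""
    se = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical log2-scale expression values, three samples per group.
expression = {
    "geneA": {"fetal": [8.1, 7.9, 8.3], "adult": [5.0, 5.2, 4.9]},
    "geneB": {"fetal": [6.0, 6.2, 5.9], "adult": [6.1, 5.8, 6.1]},
}

results = {}
for gene, groups in expression.items():
    # On the log2 scale, the fold change is a difference of group means.
    log2_fc = mean(groups["fetal"]) - mean(groups["adult"])
    results[gene] = (log2_fc, welch_t(groups["fetal"], groups["adult"]))

for gene, (log2_fc, t_stat) in results.items():
    print(f"{gene}: log2FC={log2_fc:.2f}, t={t_stat:.2f}")
```

In this made-up data, geneA shows a large shift between groups relative to its within-group variability, while geneB does not.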
Gene Set Analysis
In Task 6, we identified genes that are differentially expressed between fetal and adult brain. Now we will examine these results in a wider context.
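One common way to place a gene list in a wider context is an over-representation test: given a predefined gene set, ask whether more of its members appear among the differentially expressed genes than chance would predict. The sketch below uses a hypergeometric tail probability for this. The function name and all the numbers are hypothetical, and the course task may use a different method.

```python
from math import comb

def enrichment_p_value(n_universe, n_in_set, n_de, n_overlap):
    """Probability of seeing at least `n_overlap` gene-set members among
    `n_de` differentially expressed genes drawn from `n_universe` genes,
    of which `n_in_set` belong to the gene set (hypergeometric upper tail).
    """
    upper = min(n_in_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up numbers: 1000 genes tested, 50 in the gene set,
# 100 called differentially expressed, 15 of those in the set.
p = enrichment_p_value(n_universe=1000, n_in_set=50, n_de=100, n_overlap=15)
print(f"p = {p:.3g}")
```

With these numbers, about 5 set members would be expected among the 100 differentially expressed genes by chance, so an overlap of 15 gives a small tail probability.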
Describe Your Analysis
The next step is to document your work. One of the major issues in genomic data science is that there are so many steps in the process. If these steps are not documented well, major problems can result.