PH525x: Data Analysis for Genomics

Overview

*Note - This is an Archived course* This is a past/archived course. At this time, you can only explore this course in a self-paced fashion. Certain features of this course may not be active, but many people enjoy watching the videos and working with the materials. Make sure to check for reruns of this course. The purpose of this course is to enable students to analyze and interpret data generated by modern genomics technology, specifically microarray data and next generation sequencing data. We will focus on applications common in public health and biomedical research: measuring gene expression differences between populations, associated genomic variants to disease, measuring epigenetic marks such as DNA methylation, and transcription factor binding sites. The course covers the necessary statistical concepts needed to properly design experiments and analyze the high dimensional data produced by these technologies. These include estimation, hypothesis testing, multiple comparison corrections, modeling, linear models, principle component analysis, clustering, nonparametric and Bayesian techniques. Along the way, students will learn to analyze data using the R programming language and several packages from the Bioconductor project. Currently, biomedical research groups around the world are producing more data than they can handle. The training and skills acquired by taking this course will be of significant practical use for these groups. The learning that will take place in this course will allow for greater success in making biological discoveries and improving individual and population health. Before your course starts, try the new edX Demo where you can explore the fun, interactive learning environment and virtual labs. Learn more.