This course introduces data analysis methods used in systems biology, bioinformatics, and systems pharmacology research. It covers methods for processing raw data from genome-wide mRNA expression studies (microarrays and RNA-seq), including data normalization, clustering, dimensionality reduction, differential expression, enrichment analysis, and network construction. The course contains practical tutorials on using several bioinformatics tools and setting up data analysis pipelines, and it also covers the mathematics behind the methods applied by these tools and workflows. The course is best suited to beginning graduate students and advanced undergraduates majoring in fields such as biology, statistics, physics, chemistry, computer science, and biomedical and electrical engineering. It should also be useful for wet- and dry-lab researchers who encounter large datasets in their own research. The course presents software tools developed by the Ma’ayan Laboratory (http://labs.icahn.mssm.edu/maayanlab/) at the Icahn School of Medicine at Mount Sinai in New York City, as well as other freely available data analysis and visualization tools. The overarching goal of the course is to enable students to apply the methods presented here to their own data and projects. For students who do not work in the field, the course introduces the research challenges faced in computational systems biology and systems pharmacology.
Course Overview and Introductions
The 'Introduction to Complex Systems' module discusses complex systems and leads to the idea that a cell can be considered a complex system, or a complex agent living in a complex environment, just like us. The 'Introduction to Biology for Engineers' module introduces some central topics in cell and molecular biology for those without a background in the field. It is not a comprehensive treatment of cell and molecular biology. The goal is to provide an entry point that motivates those coming from other disciplines to begin studying biology.
Topological and Network Evolution Models
In the 'Topological and Network Evolution Models' module, several lectures give a historical perspective on network analysis in systems biology. The focus is on in-silico network evolution models. These are simple computational models that, based on a few rules, can create networks with a topology similar to that of the molecular networks observed in biological systems.
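To make the idea of growing a network from a few rules concrete, here is a minimal Python sketch of one well-known evolution model, preferential attachment. The course description does not say which models the lectures cover, so the choice of model, the function names `grow_network` and `degree_counts`, and the parameters `n_nodes` and `m_links` are all assumptions made for illustration. The two rules are: add one node at a time, and link each new node to existing nodes with probability proportional to their current number of links.

```python
import random
from collections import Counter


def grow_network(n_nodes, m_links=2, seed=0):
    """Grow a network by preferential attachment (illustrative sketch only)."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m_links + 1 nodes.
    edges = [(i, j) for i in range(m_links + 1) for j in range(i)]
    # Each node appears in this list once per link it has, so a uniform
    # draw from the list picks nodes in proportion to their degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m_links + 1, n_nodes):
        targets = set()
        while len(targets) < m_links:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def degree_counts(edges):
    """Count how many links each node has."""
    return Counter(node for edge in edges for node in edge)


edges = grow_network(n_nodes=500, m_links=2)
degrees = degree_counts(edges)
# A few hubs collect many links while most nodes keep only a handful.
print(sorted(degrees.values(), reverse=True)[:5])
```

Networks grown this way tend to have a few highly connected hubs and many sparsely connected nodes, which is the kind of topological feature such models are used to explore.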
Types of Biological Networks
The 'Types of Biological Networks' module is about the various types of networks that are typically constructed and analyzed in systems biology and systems pharmacology. This lecture ends with the idea of functional association networks (FANs). Following this lecture are lectures that discuss how to construct FANs and how to use these networks for analyzing gene lists.
Data Processing and Identifying Differentially Expressed Genes
The lectures in the 'Data Processing and Identifying Differentially Expressed Genes' module first discuss data normalization methods. Several lectures are then devoted to the problem of identifying differentially expressed genes, with a focus on understanding the inner workings of the Characteristic Direction, a new method developed by the Ma'ayan Laboratory.
Gene Set Enrichment and Network Analyses
In the 'Gene Set Enrichment and Network Analyses' module, the emphasis is on tools developed by the Ma'ayan Laboratory to analyze gene sets. Several tools will be discussed, including Enrichr, GEO2Enrichr, Expression2Kinases, and DrugPairSeeker. In addition, one lecture will be devoted to a method we developed called enrichment vector clustering, and two lectures will describe the popular gene set enrichment analysis (GSEA) method and an improved method we developed called principal angle enrichment analysis (PAEA).
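As a rough illustration of what analyzing a gene set can involve, the Python sketch below scores the overlap between an input gene list and an annotated gene set with a hypergeometric tail probability, one common way to test for over-representation. It is not the implementation of any tool named above; the function name `overlap_p_value`, the gene labels, and the background size are hypothetical and chosen only for the example.

```python
from math import comb


def overlap_p_value(gene_list, gene_set, background_size):
    """Return (overlap, p), where p is the chance of an overlap at least
    this large if the gene list were drawn at random from the background."""
    query = set(gene_list)
    annotated = set(gene_set)
    overlap = len(query & annotated)
    n_query = len(query)
    n_annotated = len(annotated)
    # Sum the hypergeometric probabilities for overlaps >= the observed one.
    tail = sum(
        comb(n_annotated, i) * comb(background_size - n_annotated, n_query - i)
        for i in range(overlap, min(n_query, n_annotated) + 1)
    )
    return overlap, tail / comb(background_size, n_query)


# Hypothetical gene labels: a 5-gene annotated set in a 20-gene background.
pathway = ["G1", "G2", "G3", "G4", "G5"]
my_genes = ["G1", "G2", "G3", "G17"]
overlap, p_value = overlap_p_value(my_genes, pathway, background_size=20)
print(overlap, round(p_value, 4))
```

A small p-value suggests that the input list shares more genes with the annotated set than random sampling would explain. Real analyses use genome-scale backgrounds and correct for testing many gene sets at once.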
Deep Sequencing Data Processing and Analysis
The lectures in the 'Deep Sequencing Data Processing and Analysis' module cover the basic steps and popular pipelines for analyzing RNA-seq and ChIP-seq data, going from raw data to gene lists to figures. These lectures also cover UNIX/Linux commands and some programming elements of R, a popular, freely available statistical software environment. Note that these lectures were developed and recorded during the Fall of 2013; because the field is advancing rapidly, better tools may now be available.
Principal Component Analysis, Self-Organizing Maps, Network-Based Clustering and Hierarchical Clustering
This module is devoted to various methods of clustering: principal component analysis, self-organizing maps, network-based clustering, and hierarchical clustering. The theory behind these methods is covered in detail, followed by practical demonstrations of the methods using R and MATLAB.
Resources for Data Integration
The lectures in the 'Resources for Data Integration' module are about the various types of networks that are typically constructed and analyzed in systems biology and systems pharmacology. These lectures start with the idea of functional association networks (FANs). They are followed by several lectures that discuss how to construct FANs from various resources, how to use these networks for analyzing gene lists, and how to construct a puzzle that can be used to connect genomic data with phenotypic data.
Crowdsourcing: Microtasks and Megatasks
The final set of lectures presents the idea of crowdsourcing. MOOCs provide the opportunity to work together on projects that are difficult to complete alone (microtasks) or to compete to implement the best algorithms for solving hard problems (megatasks). You will have the opportunity to participate in various crowdsourcing projects, both microtasks and megatasks. These projects are designed specifically for this course.
The final exam consists of multiple-choice questions on topics covered in all modules of the course. Some of the questions may require you to apply analysis methods you learned throughout the course to new datasets.
I am a biology graduate who is now doing a Masters in Bioinformatics and I found this course extremely helpful as it covers a wide scope of topics in a (relatively) short amount of time, providing necessary background and allowing students to go off and explore particular areas of interest. The topics were generally very well explained, but the "necessary but not required" prerequisites are probably quite essential to getting to grips with the course. I also particularly appreciated the plentiful reference materials, directly related to the content, provided at the end of every lecture.
The main thing I didn't like about the course was the rather drawn-out demonstrations of, say, how to run specific algorithms. Unfortunately these made some of the videos very long, with little to hold interest as all the instructor was essentially doing was clicking buttons on a screen. A better approach might have been to supply some sample data which students could download and follow along as the instructor went through with the analysis, as it is generally difficult to gauge level of understanding by just watching someone else do the stuff. This could be provided as an optional extra to those interested, just the same way crowdsourcing tasks were provided, so that those who prefer to just sit back and watch can still do so without their grade being affected.