Data science plays an important role in many industries. As data grows in volume, complexity, and speed, data scientists increasingly depend on scalable machine learning and data mining algorithms and systems to analyze massive amounts of heterogeneous data.
In this course, we study such algorithms and systems in the context of healthcare applications.
In healthcare, large amounts of heterogeneous medical data have become available across healthcare organizations (payers, providers, pharmaceutical companies). This data could be an enabling resource for deriving insights that improve care delivery and reduce waste. The size and complexity of these datasets present great challenges for analysis and for subsequent application in a practical clinical environment.
In this course, we introduce the characteristics of medical data and the data mining challenges they pose. We cover various algorithms and systems for big data analytics, and we study those techniques in the context of concrete healthcare analytic applications such as predictive modeling, computational phenotyping, and patient similarity.
Week 1: Intro to Big Data Analytics / Course Overview
Week 2: Predictive Modeling
Week 3: MapReduce
Week 4/5: Classification Evaluation Metrics / Classification Ensemble Methods / Phenotyping & Clustering
Week 6: Spark
Week 7: Medical Ontology
Week 8: Graph Analysis
Week 9: Dimensionality Reduction
Week 10: Patient Similarity
Week 11: AWS
Week 12: Azure
Week 13: Peer Review for Draft
Week 14: Final Project (code + presentation + final paper)
Week 15: Final Exam Week