
Optimizing for Interpretability in Deep Neural Networks - Mike Wu

Stanford University via YouTube

Overview

This course focuses on optimizing deep neural networks for interpretability. Learning outcomes include understanding methods for making black-box deep models more interpretable and for regularizing deep models so that humans can comprehend and simulate their decisions. Topics covered include distillation, gradients, adversarial examples, and tree regularization. The material is delivered as a talk followed by interactive discussion and Q&A. The intended audience is anyone interested in AI, deep learning, and their applications in healthcare, particularly medical prediction tasks for patients in critical care and patients with HIV.
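
The central quantity behind tree regularization is the average decision-path length of a decision tree distilled from the network's predictions: shorter paths mean a more simulatable surrogate. Below is a minimal sketch of how that quantity can be computed post hoc with scikit-learn; in the actual method a differentiable surrogate approximates it so it can serve as a training penalty. The function name average_path_length and the stand-in predict_fn / fake_model are illustrative assumptions, not part of the original talk.

```python
# Sketch: distill a model's predictions into a decision tree and measure
# how many nodes a typical example visits (average path length).
# Assumes numpy and scikit-learn; `predict_fn` is a hypothetical stand-in
# for the deep model, mapping a feature matrix to hard 0/1 labels.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_path_length(X, predict_fn, max_depth=10):
    """Fit a surrogate tree to the model's predictions on X and return
    the mean number of nodes an example passes through."""
    y_hat = predict_fn(X)                      # labels assigned by the deep model
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_hat)
    node_indicator = tree.decision_path(X)     # per-sample indicator of visited nodes
    path_lengths = node_indicator.sum(axis=1)  # nodes visited per sample
    return float(np.mean(path_lengths))

# Toy usage with a made-up "model": threshold on the first feature.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    fake_model = lambda X: (X[:, 0] > 0).astype(int)
    print("Average path length:", average_path_length(X, fake_model))
```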

Syllabus

Intro
The challenge of interpretability
Lots of different definitions and ideas
Asking the model questions
A conversation with the model
A case for human simulation
Simulatable?
Post-Hoc Analysis
Interpretability as a regularizer
Average Path Length
Problem Setup
Tree Regularization (Overview)
Toy Example for Intuition
Humans are context dependent
Regional Tree Regularization
Example: Three Kinds of Interpretability
MIMIC-III Dataset
Evaluation Metrics
Results on MIMIC-III
A second application: treatment for HIV
Distilled Decision Tree
Caveats and Gotchas
Regularizing for Interpretability

Taught by

Stanford MedAI

