Overview
This course covers model interpretation for neural networks in natural language processing. Learning outcomes include understanding why interpretability matters and exploring explanation techniques such as gradient-based importance scores and extractive rationale generation. The course also teaches how to probe sentence embeddings for linguistic properties and how to evaluate model interpretations. It is taught in a lecture format and is intended for anyone interested in neural networks, natural language processing, and model interpretation.
Syllabus
Intro
Why interpretability?
What is interpretability?
Two broad themes
Source Syntax in NMT
Why neural translations are the right length?
Fine-grained analysis of sentence embeddings
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Issues with probing
Minimum Description Length (MDL) Probes
How to evaluate?
Explanation Technique: gradient-based importance scores
Explanation Technique: Extractive Rationale Generation
Taught by
Graham Neubig