Explore the structure of Transformer embedding spaces in this 48-minute talk by Thimothée Mickus from the Finnish Center for Artificial Intelligence. The talk examines the linear structures present in Transformer embeddings: because of residual connections, each embedding can be expressed as a sum of vector factors. This decomposition sheds light on several phenomena observed in Transformer models, including embedding space anisotropy, the impact of BERT's next sentence prediction objective, and the performance of lower layers on lexical semantic tasks. The talk also compares Transformer embeddings to bag-of-words representations and evaluates the importance of multi-head attention modules. Mickus is a postdoctoral researcher at the University of Helsinki whose work focuses on distributional semantics and neural network-based word vectors.
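To illustrate why residual connections make an embedding a sum of vector factors, here is a minimal sketch in plain Python. It is a toy model, not the formulation used in the talk: `toy_attention` and `toy_feed_forward` are hypothetical stand-ins for real sublayers, and layer normalization and biases are deliberately left out, so an actual Transformer would need extra correction terms.

```python
import math


def toy_attention(x):
    # Hypothetical stand-in for a multi-head attention sublayer (assumption).
    return [0.5 * v for v in reversed(x)]


def toy_feed_forward(x):
    # Hypothetical stand-in for a feed-forward sublayer (assumption).
    return [math.tanh(v) for v in x]


def add(a, b):
    # Element-wise vector addition.
    return [u + v for u, v in zip(a, b)]


def residual_decomposition(x, sublayers):
    """Run x through residual sublayers and record each additive term."""
    terms = [list(x)]  # the input embedding is the first term
    hidden = list(x)
    for sublayer in sublayers:
        update = sublayer(hidden)     # what this sublayer adds to the stream
        terms.append(update)
        hidden = add(hidden, update)  # residual connection: h <- h + f(h)
    return hidden, terms


x = [1.0, -2.0, 0.5]
output, terms = residual_decomposition(x, [toy_attention, toy_feed_forward])

# The final embedding equals the sum of the recorded terms.
reconstructed = [sum(column) for column in zip(*terms)]
assert all(math.isclose(o, r) for o, r in zip(output, reconstructed))
```

Each entry of `terms` can then be inspected on its own, for example to ask how much of the final vector comes from the attention stand-in compared with the input embedding.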
Linear Structures in Transformer Embedding Spaces
Finnish Center for Artificial Intelligence FCAI via YouTube
Overview
Syllabus
Thimothée Mickus: Linear Structures in Transformer Embedding Spaces
Taught by
Finnish Center for Artificial Intelligence FCAI