7–11 Oct 2024
University of Nova Gorica, Lanthieri mansion, Vipava, Slovenia
Europe/Ljubljana timezone

On the representation landscape of large transformer models

9 Oct 2024, 12:20
20m
University of Nova Gorica, Lanthieri mansion, Vipava, Slovenia

Oral presentation (Contributing talks)

Speaker

Alberto Cazzaniga (Area Science Park)

Description

Large transformers have been successfully applied to self-supervised data analysis across various data types, including protein sequences, images, and text. However, our understanding of their inner workings is still limited. We discuss how unsupervised learning techniques can be used to characterize several geometric properties of the representation landscape of these models and how those properties evolve across layers. This geometric perspective yields an explicit strategy for identifying the layers that maximize semantic content, and it uncovers the diverse computational strategies that transformers develop to solve specific tasks. Our findings have several applications, from improving protein homology searches to increasing factual recall in language models, and they offer insight into novel strategies that combine in-context learning and fine-tuning to solve question-answering tasks.
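As a minimal sketch of the kind of per-layer geometric analysis described above, the snippet below extracts hidden representations from every layer of a transformer and estimates their intrinsic dimension with the TwoNN estimator (Facco et al., 2017), one of the unsupervised tools used in this line of work. The model choice ("gpt2"), the toy inputs, and the token pooling are illustrative assumptions, not the authors' actual setup.

```python
# Illustrative sketch only (not the authors' code): per-layer intrinsic
# dimension (ID) of a transformer's hidden representations, in the spirit
# of the geometric analysis of reference (1).
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from transformers import AutoModel, AutoTokenizer

def twonn_id(X: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimator (Facco et al., 2017).

    For each point, take the ratio mu = r2 / r1 of the distances to its
    second and first nearest neighbors; the ID is the slope of the linear
    fit of -log(1 - F(mu)) against log(mu), with F the empirical CDF.
    """
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    valid = dist[:, 1] > 0                  # guard against duplicate points
    mu = np.sort(dist[valid, 2] / dist[valid, 1])
    n = len(mu)
    F = np.arange(1, n + 1) / n
    x = np.log(mu[:-1])                     # drop F = 1, where the log diverges
    y = -np.log(1.0 - F[:-1])
    return float((x @ y) / (x @ x))         # least-squares slope through origin

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token               # gpt2 has no pad token by default
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

texts = [                                   # toy corpus, far too small for a
    "The cat sat on the mat.",              # statistically reliable estimate
    "Proteins fold into three-dimensional structures.",
    "Transformers build layered representations of their inputs.",
    "Large models are trained with self-supervision.",
]
batch = tok(texts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**batch)

mask = batch["attention_mask"].bool()       # ignore padding positions
for layer, h in enumerate(out.hidden_states):
    reps = h[mask].numpy()                  # (n_tokens, hidden_dim)
    print(f"layer {layer:2d}: ID ~ {twonn_id(reps):.2f}")
```

In reference (1) profiles of this kind are computed at far larger scale and are used to locate the layers where semantic information concentrates.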

References:
(1) L. Valeriani, D. Doimo, F. Cuturello, A. Laio, A. Ansuini, A. Cazzaniga, "The geometry of hidden representations of large transformer models", Advances in Neural Information Processing Systems 36 (2023).
(2) F. Ortu, Z. Jin, D. Doimo, M. Sachan, A. Cazzaniga, B. Schölkopf, "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals", Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (2024).
(3) D. Doimo, A. Serra, A. Ansuini, A. Cazzaniga, "The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models", to appear.

Primary author

Alberto Cazzaniga (Area Science Park)
