

Poster

On the Origins of Linear Representations in Large Language Models

Yibo Jiang · Goutham Rajendran · Pradeep Ravikumar · Bryon Aragam · Victor Veitch


Abstract:

An array of recent works has argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a latent variable model to abstract and formalize the concept dynamics of next-token prediction. We use this formalism to prove that linearity arises as a consequence of the loss function and the implicit bias of gradient descent. The theory is further substantiated empirically through experiments.
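To make the notion of a "linear" concept representation concrete, the following minimal sketch (not the authors' code; all data is synthetic and the dimension and noise scale are assumed for illustration) shows the standard operationalization: if a binary concept is linearly encoded, the difference of class-mean representations recovers a concept direction, and projecting onto that direction linearly separates concept-positive from concept-negative inputs.

```python
# Minimal sketch (not the authors' code): what a "linear representation"
# of a concept means in practice. Model representations are simulated with
# synthetic data so the script runs standalone.
import numpy as np

rng = np.random.default_rng(0)
d = 64                               # representation dimension (assumed)
concept_dir = rng.normal(size=d)     # ground-truth latent concept direction
concept_dir /= np.linalg.norm(concept_dir)

# Synthetic "model representations": noise shifted along the concept
# direction, standing in for embeddings of concept-positive (+1) and
# concept-negative (-1) inputs.
n = 200
labels = np.array([1] * n + [-1] * n)
reps = rng.normal(scale=0.5, size=(2 * n, d)) + labels[:, None] * concept_dir

# If the concept is linearly represented, the difference of class means
# recovers (up to scale) the concept direction ...
est_dir = reps[labels == 1].mean(axis=0) - reps[labels == -1].mean(axis=0)
est_dir /= np.linalg.norm(est_dir)
print("cosine(est_dir, concept_dir):", float(est_dir @ concept_dir))

# ... and projecting onto it linearly separates the two classes.
proj = reps @ est_dir
print("linear-probe accuracy:", float(np.mean((proj > 0) == (labels == 1))))
```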
