

Poster

In-Context Learning on Function Classes Unveiled for Transformers

Zhijie Wang · Bo Jiang · Shuai Li


Abstract:

Transformer-based neural sequence models exhibit a remarkable ability to perform in-context learning: given some training examples, a pre-trained model can make accurate predictions on a novel input. This paper studies why transformers can learn different types of function classes in context. We first show by construction that there exists a family of transformers (with different activation functions) that implement approximate gradient descent on the parameters of neural networks, and we provide upper bounds on the number of heads, the hidden dimension, and the number of layers of the transformer. We also show that a transformer can learn linear functions, the indicator of the unit ball, and smooth functions in context by learning neural networks that approximate them. These instances mainly concern a transformer pre-trained on a single task. We further prove that when pre-trained on two tasks, linear regression and classification, a transformer can make accurate predictions on both tasks simultaneously. Our results move beyond linearity in terms of in-context learning instances and provide a comprehensive understanding of why transformers can learn many types of function classes through the bridge of neural networks.
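To make the setup concrete, the following is a minimal sketch (not from the paper) of the in-context learning problem the abstract describes: a prompt consists of demonstration pairs (x_i, f(x_i)) drawn from a function class plus a query point, and the reference learner that the constructed transformer is claimed to approximate runs gradient descent on the parameters of a small neural network fit to those in-context examples. All function names, network sizes, and hyperparameters below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_task(d=8):
    """Draw one random linear function f(x) = w.x from the function class."""
    w = rng.normal(size=d)
    return lambda X: X @ w

def gd_on_two_layer_net(X, y, hidden=32, steps=200, lr=0.05):
    """Fit a two-layer ReLU network to the in-context examples by gradient
    descent on its parameters (the procedure a transformer is shown to
    approximate, per the abstract). Squared loss, full-batch updates."""
    d = X.shape[1]
    W1 = rng.normal(size=(d, hidden)) / np.sqrt(d)
    w2 = rng.normal(size=hidden) / np.sqrt(hidden)
    for _ in range(steps):
        H = np.maximum(X @ W1, 0.0)            # hidden-layer activations
        err = H @ w2 - y                       # residuals on the prompt examples
        grad_w2 = H.T @ err / len(y)
        grad_W1 = X.T @ ((err[:, None] * w2) * (H > 0)) / len(y)
        w2 -= lr * grad_w2
        W1 -= lr * grad_W1
    return lambda Xq: np.maximum(Xq @ W1, 0.0) @ w2

# Build one in-context prompt: 40 demonstrations plus a query point.
f = sample_linear_task()
X_ctx = rng.normal(size=(40, 8))
y_ctx = f(X_ctx)
x_query = rng.normal(size=(1, 8))

predictor = gd_on_two_layer_net(X_ctx, y_ctx)
print("prediction:", predictor(x_query)[0], " target:", f(x_query)[0])
```

Replacing `sample_linear_task` with a sampler for indicators of the unit ball or for smooth functions gives the other in-context instances the abstract mentions; the learner itself is unchanged, which is the sense in which neural networks serve as the bridge between the transformer and these function classes.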
