

Poster

Prompting a Pretrained Transformer Can Be a Universal Approximator

Aleksandar Petrov · Phil Torr · Adel Bibi


Abstract:

Despite the widespread adoption of prompting, prompt tuning, and prefix-tuning of transformer models, our theoretical understanding of these fine-tuning methods remains limited. A key question is whether one can arbitrarily modify the behavior of a pretrained model by prompting or prefix-tuning it. Formally, can prompting and prefix-tuning a pretrained model universally approximate sequence-to-sequence functions? This paper answers in the affirmative and demonstrates that much smaller pretrained models than previously thought can be universal approximators when prefixed. In fact, the attention mechanism is uniquely suited for universal approximation, with prefix-tuning of a single attention head being sufficient to approximate any continuous function. Moreover, any sequence-to-sequence function can be approximated by prefixing a transformer with depth linear in the sequence length. Beyond these density-type results, we also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.
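For readers unfamiliar with the mechanics referenced above, the following is a minimal sketch (not taken from the paper) of prefix-tuning a single attention head: a trainable prefix is prepended to the keys and values of a frozen head, while the queries come only from the actual input. All names and dimensions (`d`, `prefix_len`, `X`, `P`) are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(X, P, Wq, Wk, Wv):
    """Single attention head whose keys/values are augmented with a prefix.

    X:  (seq_len, d)     input token representations
    P:  (prefix_len, d)  trainable prefix (the only tuned parameters)
    Wq, Wk, Wv: (d, d)   frozen pretrained projection matrices
    """
    Q = X @ Wq                           # queries come from the input only
    KV_in = np.concatenate([P, X], 0)    # prefix is prepended for keys/values
    K, V = KV_in @ Wk, KV_in @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])
    return softmax(scores) @ V           # (seq_len, d)

# Example: a 4-token input, a 3-token prefix, model width d = 8.
rng = np.random.default_rng(0)
d, seq_len, prefix_len = 8, 4, 3
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
X = rng.standard_normal((seq_len, d))
P = rng.standard_normal((prefix_len, d))  # in prefix-tuning, only P would be optimized
print(prefix_attention(X, P, Wq, Wk, Wv).shape)  # (4, 8)
```

The key design point this sketch illustrates is that the pretrained weights stay fixed; only the prepended prefix changes the head's output, which is the setting the paper's approximation results concern.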
