

Poster

Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron Havens · Alexandre Araujo · Huan Zhang · Bin Hu


Abstract: Self-attention has been widely used in various machine learning models, such as vision transformers. The standard dot-product self-attention is arguably the most popular structure, and there is growing interest in understanding the mathematical properties of such attention mechanisms. This paper presents a fine-grained local sensitivity analysis of the standard dot-product self-attention. Despite the well-known fact that dot-product self-attention is not (globally) Lipschitz, we develop a new theoretical analysis of Local Fine-grained Attention Sensitivity (LoFAST), quantifying the effect of input feature perturbations on the attention output. Utilizing mathematical techniques from optimization and matrix theory, our analysis reveals that the local sensitivity of dot-product self-attention to $\ell_2$ perturbations can be controlled by several key quantities associated with the attention weight matrices and the unperturbed input. We empirically validate our theoretical findings through several examples, offering new insights for achieving low sensitivity in dot-product self-attention against $\ell_2$ perturbations.
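To make the object of study concrete, the sketch below implements standard single-head dot-product self-attention and empirically probes its local sensitivity by measuring how much the output moves under small random $\ell_2$ input perturbations around an unperturbed input. This is only an illustrative experiment under assumed shapes and random weights; it is not the paper's LoFAST bound, and the names (`self_attention`, `W_Q`, `W_K`, `W_V`, `eps`) are placeholders, not quantities from the paper.

```python
# Minimal sketch (not the paper's LoFAST analysis): single-head dot-product
# self-attention in NumPy, plus an empirical probe of its local sensitivity
# to small l2 perturbations of the input.
import numpy as np

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    """Standard dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d = W_Q.shape[1]
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
n, d = 8, 16                                 # sequence length, embedding dim (assumed)
X = rng.standard_normal((n, d))              # unperturbed input
W_Q, W_K, W_V = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Empirical local sensitivity: ratio of output change to input change
# for small random perturbations with l2 norm eps around X.
eps = 1e-4
Y = self_attention(X, W_Q, W_K, W_V)
ratios = []
for _ in range(100):
    Delta = rng.standard_normal((n, d))
    Delta *= eps / np.linalg.norm(Delta)     # scale perturbation to l2 norm eps
    Y_pert = self_attention(X + Delta, W_Q, W_K, W_V)
    ratios.append(np.linalg.norm(Y_pert - Y) / eps)

print(f"largest observed output-change / input-change ratio: {max(ratios):.3f}")
```

The measured ratio is only a lower estimate of the true local sensitivity at `X`; the paper's theoretical contribution is an analytical bound on this quantity in terms of the attention weight matrices and the unperturbed input.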
