

Poster

Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Jerry Yao-Chieh Hu · Pei-Hsuan Chang · Haozheng Luo · Hong-Yu Chen · Weijian Li · Wei-Po Wang · Han Liu


Abstract:

We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of gigantic transformer-based models. Our main contribution is a novel modern Hopfield energy function with an internal "no-op classification" mechanism. This design enables the identification of rare memory patterns as no-op outliers and thereby facilitates outlier-efficient associative memory retrievals. Methodologically, we show that the one-step approximation of its memory retrieval dynamics is equivalent to an outlier-efficient attention mechanism (Softmax1). This allows us to debut novel outlier-efficient Hopfield layers for deep learning. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed-point convergence and exponential storage capacity. Empirically, we demonstrate the proposed model's efficacy across large-scale transformer-based and Hopfield-based models (including BERT, OPT, and STanHop), benchmarking against state-of-the-art methods such as ClippedSoftmax and Gated_Attention. Notably, OutEffHop achieves average reductions of more than 20% in both the average kurtosis and the maximum infinity norm of model outputs across the three models.
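
Below is a minimal sketch of the Softmax1 operation that the abstract relates to the one-step memory retrieval, assuming the common "softmax-plus-one" formulation exp(x_i) / (1 + sum_j exp(x_j)); the function names, tensor shapes, and attention wrapper are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def softmax_1(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an implicit extra zero logit: exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 in the denominator acts as a "no-op" option: when all logits are
    small, the output weights can sum to less than 1, so the mechanism can attend
    to (effectively) nothing instead of being forced to spread mass over outliers.
    """
    # Shift by max(logits, 0) for numerical stability; the implicit zero logit
    # shifts by the same amount, giving exp(-m) in the denominator.
    m = logits.max(dim=dim, keepdim=True).values.clamp(min=0.0)
    shifted = torch.exp(logits - m)
    return shifted / (torch.exp(-m) + shifted.sum(dim=dim, keepdim=True))

def outlier_efficient_attention(q, k, v):
    """Scaled dot-product attention with softmax_1 in place of softmax (a sketch)."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return softmax_1(scores, dim=-1) @ v
```

In this reading, the "no-op classification" corresponds to the implicit zero logit: attention weights for a query need not sum to one, so rare or irrelevant memory patterns can be assigned negligible mass rather than inflating activation outliers.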
