

Poster

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized BatchNorm

Jialong Guo · Xinghao Chen · Yehui Tang · Yunhe Wang


Abstract: Transformers have become foundational architectures for various fields, including natural language and computer vision. However, deploying transformers on resource-constrained devices is quite challenging due to their high computational cost. This paper investigates the computational bottleneck modules of efficient transformers, i.e., normalization layers and attention modules. Layer normalization is commonly used in transformer architectures but is not computationally friendly because of the statistics calculation required during inference. However, replacing LayerNorm with the more efficient batch normalization in transformers often leads to inferior performance and training collapse. To address this problem, we propose a novel method named PRepBN to progressively replace LayerNorm with re-parameterized BatchNorm during training. During inference, the proposed PRepBN can be simply re-parameterized into a normal BatchNorm and thus fused with linear layers to reduce latency. Moreover, we propose a simplified linear attention (SLA) module that is simple yet effective in achieving strong performance. Extensive experiments on image classification as well as object detection demonstrate the effectiveness of our proposed method. For example, powered by the proposed methods, our SLAB-Swin obtains $83.6\%$ top-1 accuracy on ImageNet with $16.2$ms latency, which is $2.4$ms less than that of Flatten-Swin while achieving $0.1\%$ higher accuracy.
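To make the progressive replacement concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a re-parameterizable BatchNorm branch (`RepBN`, here assumed to be `BN(x) + eta * x`) is blended with LayerNorm by a weight `lam` that decays from 1 to 0 over training, so that inference uses only BatchNorm and can be folded into a single normalization. The class names, the linear decay schedule, and the `eta` residual scale are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class RepBN(nn.Module):
    """BatchNorm plus a learnable residual scale: BN(x) + eta * x.
    After training, the eta branch can be folded into the BN affine
    parameters, leaving a plain BatchNorm (sketch only, not the authors' code)."""

    def __init__(self, dim):
        super().__init__()
        self.bn = nn.BatchNorm1d(dim)
        self.eta = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, channels, tokens)
        return self.bn(x) + self.eta * x

    @torch.no_grad()
    def fuse(self):
        """Fold eta * x into the BN weight/bias so inference uses a single BN."""
        std = torch.sqrt(self.bn.running_var + self.bn.eps)
        self.bn.weight += self.eta * std
        self.bn.bias += self.eta * self.bn.running_mean
        self.eta.zero_()
        return self.bn


class ProgressiveNorm(nn.Module):
    """Blend LayerNorm with RepBN during training, decaying the LayerNorm
    weight lam from 1 to 0 so only the (re-parameterizable) BatchNorm remains."""

    def __init__(self, dim, total_steps):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.repbn = RepBN(dim)
        self.total_steps = total_steps
        # Track the training step as a buffer so it is saved with the model.
        self.register_buffer("step", torch.zeros(1))

    def forward(self, x):  # x: (batch, tokens, channels)
        if self.training:
            # lam decays linearly from 1 (pure LayerNorm) to 0 (pure BatchNorm).
            lam = max(0.0, 1.0 - self.step.item() / self.total_steps)
            self.step += 1
        else:
            lam = 0.0  # inference path is BatchNorm only, so it can be fused
        bn_out = self.repbn(x.transpose(1, 2)).transpose(1, 2)
        return lam * self.ln(x) + (1.0 - lam) * bn_out
```

The further fusion of the resulting BatchNorm into adjacent linear layers mentioned in the abstract is the standard BN-linear folding step and is omitted from this sketch.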
