ICCV Poster MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge

Sabbir Ahmed · Jingtao Li · Weiming Zhuang · Chen Chen · Lingjuan Lyu

Abstract: Vision transformers (ViTs) have become widely popular due to their strong performance across various computer vision tasks. However, deploying ViTs on edge devices remains a persistent challenge because of their high computational demands, caused primarily by the overuse of self-attention layers with quadratic complexity together with the resource-intensive softmax operation. To address this challenge, linear self-attention has emerged as an efficient alternative. Nonetheless, current linear attention methods suffer considerable performance degradation compared to softmax-based quadratic attention. Hence, we propose MixA, a novel mixed attention approach that enhances the efficiency of ViT models while maintaining performance comparable to softmax-based quadratic attention. MixA takes a pretrained ViT model, analyzes the significance of each attention layer, and selectively applies ReLU-based quadratic attention in the critical layers to ensure high model performance. To enhance efficiency, MixA replaces the less critical layers with our novel ReLU-based linear attention module, Stable Lightweight Linear Attention (SteLLA). SteLLA uses theoretically motivated normalization terms that improve the stability of prior ReLU-based linear attention, resulting in better performance (see Figure 1) while achieving significant speedup over softmax-based quadratic attention (see Figure 2). Experiments on three benchmark vision tasks show that MixA can significantly improve the efficiency of ViT models with competitive performance. Notably, MixA can improve the inference speed of the DeiT-T model by 22% on an Apple M1 chip with only ~0.1% accuracy loss.
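To make the quadratic versus linear distinction concrete, the following is a minimal sketch of generic ReLU-based attention in both forms. It is not SteLLA: the abstract does not specify SteLLA's normalization terms, so a simple row-sum normalization is assumed here, and the function names, the `eps` constant, and the use of plain Python lists (to avoid dependencies) are all illustrative choices. The point is only that reordering the matrix products gives the same output without ever forming the N x N score matrix.

```python
# Illustrative sketch only: generic ReLU attention with an ASSUMED row-sum
# normalization. This is not the SteLLA module described in the abstract.

def relu(m):
    """Element-wise ReLU on a list-of-lists matrix."""
    return [[max(0.0, x) for x in row] for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Naive matrix product of a (n x k) and b (k x m)."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def relu_quadratic_attention(q, k, v, eps=1e-6):
    """Builds the full N x N score matrix: O(N^2 * d) time, O(N^2) memory."""
    fq, fk = relu(q), relu(k)
    scores = matmul(fq, transpose(fk))           # N x N
    out = []
    for row in scores:
        denom = sum(row) + eps                   # assumed normalization
        weights = [s / denom for s in row]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

def relu_linear_attention(q, k, v, eps=1e-6):
    """Same result via associativity: O(N * d^2) time, no N x N matrix."""
    fq, fk = relu(q), relu(k)
    kv = matmul(transpose(fk), v)                # d x d_v, shared by all queries
    k_sum = [sum(col) for col in zip(*fk)]       # length d
    out = []
    for qi in fq:
        denom = sum(a * b for a, b in zip(qi, k_sum)) + eps
        numer = [sum(qi[a] * kv[a][c] for a in range(len(qi)))
                 for c in range(len(kv[0]))]
        out.append([x / denom for x in numer])
    return out

if __name__ == "__main__":
    import random
    random.seed(0)
    n, d = 8, 4  # hypothetical token count and head dimension
    q, k, v = ([[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    a = relu_quadratic_attention(q, k, v)
    b = relu_linear_attention(q, k, v)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Under this assumed normalization the two functions agree up to floating-point rounding, so the linear form trades the N x N score matrix for a d x d summary of keys and values. How MixA decides which layers keep the quadratic form, and how SteLLA stabilizes the linear form, is described in the paper itself rather than in this sketch.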
