

Poster

Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Yongxin Li · Mengyuan Liu · You Wu · Xucheng Wang · Xiangyang Yang · Shuiwang Li


Abstract:

Visual tracking has made substantial strides with transformer-based models. However, the slow inference of current trackers limits their practicality on devices with constrained computational resources, especially for real-time unmanned aerial vehicle (UAV) tracking. To address this challenge, we introduce AVTrack, an adaptive computation framework that selectively activates transformer blocks for real-time UAV tracking. The proposed Activation Module (AM) dynamically optimizes the ViT architecture, engaging only the relevant components and thereby improving inference efficiency without significantly compromising tracking performance. In addition, we strengthen the ViT's robustness to the extreme changes in viewing angle common in UAV tracking by learning view-invariant representations through mutual information maximization. Extensive experiments on four tracking benchmarks confirm the effectiveness and versatility of our approach, establishing it as a state-of-the-art solution for visual tracking. Code is released at: https://github.com/Tqybu-hans/AVTrack.
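To make the adaptive-computation idea concrete, below is a minimal PyTorch sketch of per-block gating in a ViT. The gate design here (a single linear layer, here called ActivationModule, pooling tokens to a keep/skip probability, with a soft blend during training and a hard skip at inference) is a hypothetical simplification for illustration, not the paper's actual AM; the threshold, dimensions, and layer choices are assumptions.

import torch
import torch.nn as nn


class ActivationModule(nn.Module):
    """Hypothetical gate: predicts a keep/skip probability for one block."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pool tokens, then squash to a (0, 1) activation probability.
        return torch.sigmoid(self.gate(x.mean(dim=1)))  # (B, 1)


class AdaptiveViT(nn.Module):
    """Toy ViT whose blocks run only when their gate fires (assumed design)."""

    def __init__(self, dim: int = 256, depth: int = 8, heads: int = 8,
                 threshold: float = 0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.gates = nn.ModuleList(ActivationModule(dim) for _ in range(depth))
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block, gate in zip(self.blocks, self.gates):
            p = gate(x)  # (B, 1) activation probability per block
            if self.training:
                # Soft blend so gradients reach the gate during training.
                x = p.unsqueeze(-1) * block(x) + (1 - p).unsqueeze(-1) * x
            elif p.mean() > self.threshold:
                # At inference, a block is skipped outright when its gate is
                # off; skipping blocks is where the speed-up comes from.
                x = block(x)
        return x


tokens = torch.randn(2, 196, 256)  # (batch, patches, dim)
model = AdaptiveViT().eval()
with torch.no_grad():
    out = model(tokens)
print(out.shape)  # torch.Size([2, 196, 256])

The batch-level hard skip at inference is a simplification chosen so a skipped block costs nothing; the view-invariance objective (mutual information maximization across views) would be an additional training loss and is omitted here.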
