

Poster

Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang · Xu Yang · Haokun Chen · Xin Geng


Abstract:

Deep learning has advanced through the combination of large datasets and computational power, leading to the development of extensive pre-trained models such as Vision Transformers (ViTs). However, these models often assume a one-size-fits-all utility and lack the ability to initialize models at elastic scales tailored to the resource constraints of specific downstream tasks. To address these issues, we propose Probabilistic Expansion from LearnGene (PEG) for mixture sampling and elastic initialization of Vision Transformers. Specifically, PEG uses a probabilistic mixture approach to sample Multi-Head Self-Attention layers and Feed-Forward Networks from a large ancestry model into a more compact part termed the learngene. Theoretically, we demonstrate that this learngene approximates the parameter distribution of the original ancestry model, thereby preserving its significant knowledge. PEG then expands the sampled learngene through a non-linear mapping, enabling the initialization of descendant models at elastic scales to suit various resource constraints. Our extensive experiments demonstrate the effectiveness of PEG, which outperforms traditional initialization strategies.
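
To make the two stages described above concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it mixes the blocks of a large ancestry ViT into a compact learngene using random softmax mixture weights, then tiles and non-linearly perturbs that learngene to initialize a descendant of an arbitrary depth. All names (sample_learngene, expand_to_descendant, ancestry_blocks) as well as the specific mixture weights and the tanh-based expansion are illustrative assumptions standing in for the learned components the abstract refers to.

    import copy

    import torch
    import torch.nn as nn


    def sample_learngene(ancestry_blocks, num_learngene_blocks, temperature=1.0):
        """Mix the blocks of a large ancestry ViT into a smaller set of learngene blocks.

        Each learngene block is a convex combination of the ancestry blocks,
        with mixture weights drawn from a softmax over random logits.
        """
        depth = len(ancestry_blocks)
        mix_weights = (torch.randn(num_learngene_blocks, depth) / temperature).softmax(dim=-1)

        learngene = []
        for w in mix_weights:  # one mixture per learngene block
            block = copy.deepcopy(ancestry_blocks[0])
            for name, param in block.named_parameters():
                # Stack the corresponding parameter across all ancestry blocks
                # and take the mixture-weighted average.
                stacked = torch.stack(
                    [dict(b.named_parameters())[name].detach() for b in ancestry_blocks]
                )
                shape = (-1,) + (1,) * (stacked.dim() - 1)
                param.data.copy_((w.view(*shape) * stacked).sum(dim=0))
            learngene.append(block)
        return nn.ModuleList(learngene)


    def expand_to_descendant(learngene, descendant_depth, scale=0.01):
        """Initialize a descendant model of the requested depth from the learngene.

        The residual tanh perturbation below is only a crude, shape-preserving
        stand-in for the learned non-linear mapping described in the paper.
        """
        descendant = []
        for i in range(descendant_depth):
            block = copy.deepcopy(learngene[i % len(learngene)])
            for param in block.parameters():
                param.data.add_(scale * torch.tanh(param.data))
            descendant.append(block)
        return nn.ModuleList(descendant)


    # Hypothetical usage with timm's ViT blocks (assumes model.blocks is an nn.ModuleList):
    # ancestry = timm.create_model("vit_base_patch16_224", pretrained=True)
    # learngene = sample_learngene(list(ancestry.blocks), num_learngene_blocks=4)
    # descendant_blocks = expand_to_descendant(learngene, descendant_depth=8)

Because expand_to_descendant only requires a target depth, the same sampled learngene can initialize descendants of different sizes, which is the elastic-scale property the abstract emphasizes.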
