

Poster

Transferring Knowledge From Large Foundation Models to Small Downstream Models

Shikai Qiu · Boran Han · Danielle Robinson · Shuai Zhang · Yuyang Wang · Andrew Wilson


Abstract:

Transfer learning typically involves loading pre-trained weights as an initialization, followed by fine-tuning on a downstream task. As pre-trained models become ever larger, there is an increasing need for more efficient downstream models, yet this transfer learning procedure commits us to the often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these challenges, we introduce Adaptive Feature Transfer (AFT). Instead of transferring weights, AFT operates purely on features, thereby decoupling the choice of the pre-trained model from the smaller downstream model. AFT (1) enables transfer from multiple pre-trained models, even over multiple modalities, with minimal training overhead and no inference overhead; (2) selectively transfers the information in the pre-trained features most relevant for the downstream task, through a prior that favors low mutual information between the downstream inputs and features given the pre-trained features; (3) performs feature transfer in an efficient kernel formulation that prioritizes the most relevant degrees of freedom. Empirically, AFT delivers a substantial boost over alternatives in transferring from state-of-the-art pre-trained models, across diverse vision, language, and multi-modal datasets.
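To make the feature-level transfer idea concrete, the sketch below shows one plausible reading of the setup described in the abstract: a small downstream model is trained on the task while its features are regularized to be explainable by frozen features from one or more large pre-trained models, so that information in the downstream features not accounted for by the pre-trained features is discouraged. This is a hedged illustration, not the paper's exact objective; the names `aft_weight`, `proj`, and the linear-regression surrogate standing in for the kernel formulation are assumptions for illustration only.

```python
# Hedged sketch of feature-level transfer from frozen pre-trained models to a
# small downstream model. This is NOT the authors' exact AFT objective; the
# linear map `proj` and the weight `aft_weight` are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallDownstreamModel(nn.Module):
    """A small model that exposes its penultimate features."""

    def __init__(self, in_dim=32, feat_dim=64, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = self.backbone(x)
        return self.head(feats), feats


# Stand-ins for frozen feature extractors from large pre-trained models
# (in practice these could be, e.g., a vision transformer and a text encoder).
pretrained_a = nn.Linear(32, 128).requires_grad_(False)
pretrained_b = nn.Linear(32, 256).requires_grad_(False)

model = SmallDownstreamModel()
# Learned map from concatenated pre-trained features to downstream features;
# its residual plays the role of downstream-feature information that is not
# explained by the pre-trained features, which the transfer prior discourages.
proj = nn.Linear(128 + 256, 64)
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(proj.parameters()), lr=1e-3
)
aft_weight = 1.0  # assumed hyperparameter balancing task loss and transfer term


def training_step(x, y):
    logits, feats = model(x)
    with torch.no_grad():
        # Pre-trained models are frozen and used only at training time,
        # so they add no inference overhead to the downstream model.
        pre_feats = torch.cat([pretrained_a(x), pretrained_b(x)], dim=-1)
    task_loss = F.cross_entropy(logits, y)
    # Penalize the part of the downstream features not predictable from the
    # pre-trained features (a simplification of the kernel-based objective).
    transfer_loss = F.mse_loss(feats, proj(pre_feats))
    loss = task_loss + aft_weight * transfer_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Example usage with dummy data.
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
print(training_step(x, y))
```

At test time only the small downstream model runs, which is consistent with the abstract's claim of minimal training overhead and no inference overhead; transferring from multiple pre-trained models reduces to concatenating their frozen features before the regression step.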
