

Poster

Towards Scalable and Versatile Hyper-Representation Learning

Konstantin Schürholt · Michael Mahoney · Damian Borth


Abstract:

Learning representations of well-trained neural network models promises to go beyond providing high-quality predictions to understanding other aspects of those models, including their robustness, safety, etc. Previous approaches faced limitations when processing larger networks or were specific to either discriminative or generative tasks. This paper introduces SANE, which overcomes these challenges by learning a task-agnostic representation of neural networks that not only scales to much larger model sizes but also shows capabilities beyond a single task. Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights, thus allowing one to embed a potentially large neural network as a set of tokens into the learned representation space. This technique reveals global model information across layer-wise components, and it can sequentially generate unseen neural network models, a capability unattainable with previous hyper-representation learning methods. We evaluate SANE on multiple downstream tasks across multiple model zoos, representing seven computer vision datasets. Our findings demonstrate that SANE not only matches but also exceeds state-of-the-art performance on several weight representation learning benchmarks, particularly in initialization and transfer learning tasks for larger models like ResNets.
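To make the core idea concrete, below is a minimal sketch of what tokenizing a network's weights and encoding them as a sequence might look like. This is not the authors' implementation: the function and class names (tokenize_weights, WeightSequenceEncoder), the fixed token size, the zero-padding, and the transformer hyperparameters are all illustrative assumptions. SANE additionally processes windows of tokens sequentially to scale to large models; this sketch encodes one short sequence in a single pass.

```python
import torch
import torch.nn as nn

def tokenize_weights(model: nn.Module, token_dim: int = 64) -> torch.Tensor:
    """Flatten all parameters, zero-pad to a multiple of token_dim,
    and reshape into a (num_tokens, token_dim) sequence of weight tokens."""
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    pad = (-flat.numel()) % token_dim  # hypothetical padding scheme
    flat = torch.cat([flat, flat.new_zeros(pad)])
    return flat.view(-1, token_dim)

class WeightSequenceEncoder(nn.Module):
    """Embeds weight tokens linearly, then encodes them with a
    transformer to obtain one representation vector per token."""
    def __init__(self, token_dim: int = 64, d_model: int = 128,
                 nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, token_dim) -> add a batch dimension
        x = self.embed(tokens).unsqueeze(0)
        return self.encoder(x).squeeze(0)  # (num_tokens, d_model)

# Usage: embed a small CNN's weights as a token sequence.
cnn = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))
tokens = tokenize_weights(cnn)
reps = WeightSequenceEncoder()(tokens)
print(tokens.shape, reps.shape)
```

Because every model, regardless of architecture or size, reduces to a variable-length token sequence, the same encoder can serve both discriminative downstream tasks (predicting properties from the per-token representations) and generative ones (sampling token sequences and decoding them back into weights), which is the task-agnostic aspect the abstract highlights.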
