

Poster

Exploring Intrinsic Dimension for Vision-Language Model Pruning

Hanzhang Wang · Jiawen Zhang · Qingyuan Ma


Abstract:

The intrinsic dimension (ID) is the minimum number of coordinates required to describe data lying on a lower-dimensional manifold within a high-dimensional space. Network pruning, meanwhile, attempts to reduce the complexity of high-dimensional networks while sacrificing minimal performance. This parallel motivates us to explore intrinsic dimension as a potential metric for effective pruning. Moreover, in vision-language models, we ask whether data from different modalities lie on separate manifolds, which would imply representations of varying complexity and prunability. Specifically, we empirically study the geometric properties of ID variation across large-scale vision-language pre-trained models and examine the contributions of the different modalities to model prunability. We propose a layer importance metric based on ID, which yields superior performance for vision-language model pruning. The experimental results demonstrate that visual representations are more sensitive and strongly influence model performance, whereas language representations are more robust and therefore offer greater prunability: up to 90% of the weights in the language modality can be pruned with only a 3.8-point drop in the CIDEr metric. These observations suggest an asymmetric pruning strategy between vision and language, with ID as the guiding metric.
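
The abstract does not specify how ID is estimated, so the sketch below is only an illustration of the general idea: it uses the TwoNN estimator (Facco et al., 2017) as an assumed stand-in for the paper's ID metric and scores layers by the estimated ID of their representations. All function and layer names here are hypothetical, not the authors' implementation.

# Minimal sketch (assumptions noted above): estimate per-layer intrinsic
# dimension with TwoNN and use it as a pruning-importance signal.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(x: np.ndarray) -> float:
    """Estimate the intrinsic dimension of representations x with shape
    (n_samples, n_features) via TwoNN: ID is inferred from the ratio of each
    point's second- to first-nearest-neighbor distance."""
    nn = NearestNeighbors(n_neighbors=3).fit(x)
    dist, _ = nn.kneighbors(x)              # dist[:, 0] is the point itself
    r1, r2 = dist[:, 1], dist[:, 2]
    mu = r2 / np.maximum(r1, 1e-12)         # guard against duplicate points
    mu = mu[mu > 1.0]
    # Maximum-likelihood estimate for the Pareto ratio distribution:
    # d = N / sum_i log(mu_i)
    return len(mu) / np.sum(np.log(mu))

def layer_importance(features_per_layer: dict[str, np.ndarray]) -> dict[str, float]:
    """Score each layer by the ID of its representations. Under the abstract's
    observation, lower-ID (language) layers would be pruned more aggressively."""
    return {name: twonn_id(f) for name, f in features_per_layer.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins: 'vision' features on a higher-dimensional manifold
    # than 'language' features, both embedded in 512 dimensions.
    feats = {
        "vision_block_0": rng.normal(size=(2000, 8)) @ rng.normal(size=(8, 512)),
        "language_block_0": rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 512)),
    }
    for name, score in layer_importance(feats).items():
        print(f"{name}: estimated ID = {score:.2f}")

On this toy data the language block reports a much lower ID than the vision block, mirroring the asymmetry the abstract describes; how the actual pruning ratios are derived from such scores is left to the paper.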
