Skip to yearly menu bar Skip to main content


Poster

Slicing Mutual Information Generalization Bounds for Neural Networks

Kimia Nadjahi · Kristjan Greenewald · Rickard Gabrielsson · Justin Solomon


Abstract: The ability of machine learning (ML) algorithms to generalize well to unseen data has been studied through the lens of information theory, by bounding the generalization error with the input-output mutual information (MI), i.e. the MI between the training data and the learned hypothesis. These bounds have limited empirical use for modern ML applications (e.g. deep learning) since the evaluation of MI is difficult in high-dimensional settings. Motivated by recent reports of significant low-loss compressibility of neural networks, we study the generalization capacity of algorithms which *slice* the parameter space, i.e. train on a random lower-dimensional subspace. We derive information-theoretic bounds on the generalization error in this regime, and discuss an intriguing connection to the $k$-Sliced Mutual Information, an alternative measure of statistical dependence which scales well with dimension. The computational and statistical benefits of our approach allow us to empirically estimate the input-output information of these neural networks and compute their information-theoretic generalization bounds, a task which was previously out of reach.

Live content is unavailable. Log in and register to view live content