

Poster

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks

Amit Peleg · Matthias Hein


Abstract:

Neural networks typically generalize well when fitting the data perfectly, even though they are heavily overparameterized. Many factors have been pointed out as the reason for this phenomenon, including an implicit bias of stochastic gradient descent (SGD) and a possible simplicity bias arising from the neural network architecture. The goal of this paper is to disentangle the influences on generalization stemming from optimization and architectural choices by studying random and SGD-optimized networks that achieve zero training error. We show experimentally, in the low-sample regime, that overparameterization in terms of increasing width is beneficial for generalization, and that this benefit is due to the bias of SGD rather than an architectural bias. In contrast, overparameterization in terms of increasing depth is detrimental for generalization, but random and SGD-optimized networks behave similarly, so this effect can be attributed to an architectural bias.
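To make the comparison concrete, below is a minimal, hypothetical sketch (not the authors' code) of the kind of experiment the abstract describes: on a tiny synthetic task in the low-sample regime, it compares the test accuracy of an SGD-trained network that reaches zero training error with that of a randomly sampled network that also happens to fit the training set, at several widths. The dataset, architecture, and guess-and-check sampling of random interpolating networks are illustrative assumptions only.

```python
# Hypothetical sketch: SGD-trained vs. randomly sampled interpolating networks.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n_train=8, n_test=500, dim=2):
    # Toy binary task in the low-sample regime: label = sign of first coordinate.
    X = torch.randn(n_train + n_test, dim)
    y = (X[:, 0] > 0).long()
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def mlp(width=512, depth=1, dim=2):
    layers, d_in = [], dim
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers.append(nn.Linear(d_in, 2))
    return nn.Sequential(*layers)

def error(net, X, y):
    with torch.no_grad():
        return (net(X).argmax(dim=1) != y).float().mean().item()

def sgd_interpolator(X, y, width, depth, lr=0.1, max_steps=20_000):
    # Train with plain SGD until the training set is fit perfectly.
    net = mlp(width, depth)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_steps):
        opt.zero_grad()
        loss_fn(net(X), y).backward()
        opt.step()
        if error(net, X, y) == 0.0:
            break
    return net

def random_interpolator(X, y, width, depth, max_tries=50_000):
    # Guess-and-check: resample random initializations until one happens to
    # reach zero training error (only feasible with very few training points).
    for _ in range(max_tries):
        net = mlp(width, depth)
        if error(net, X, y) == 0.0:
            return net
    raise RuntimeError("no random interpolating network found")

Xtr, ytr, Xte, yte = make_data()
for width in [64, 256, 1024]:
    sgd_net = sgd_interpolator(Xtr, ytr, width, depth=1)
    rnd_net = random_interpolator(Xtr, ytr, width, depth=1)
    print(f"width={width:5d}  "
          f"SGD test acc={1.0 - error(sgd_net, Xte, yte):.3f}  "
          f"random test acc={1.0 - error(rnd_net, Xte, yte):.3f}")
```

Under the paper's claim, the gap between the two columns should grow with width (an SGD bias), while both should degrade together as depth increases (an architectural bias); this toy setup only illustrates the protocol, not the paper's results.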
