Poster

Grokking Happens All the Time and Here is Why

Ahmed Imtiaz Humayun · Randall Balestriero · Richard Baraniuk


Abstract:

Grokking, or delayed generalization, is a phenomenon where generalization in a Deep Neural Network (DNN) occurs long after achieving near-zero training error. Previous studies have reported the occurrence of grokking in controlled settings, e.g., for transformers trained on algorithmic datasets (Power et al., 2022), or for DNNs initialized with large-norm parameters (Liu et al., 2022). We instead observe that in a large number of standard and practical settings, e.g., while training a CNN on CIFAR10 or a ResNet on Imagenette, DNNs grok adversarial examples, i.e., adversarial robustness emerges long after interpolation and/or generalization. We present a theoretically motivated explanation behind the emergence of delayed generalization and delayed robustness. We find that both phenomena are tied, originating from a phase transition in the DNN's input space partition geometry during training. We provide the first evidence that a migration of DNN 'linear regions' occurs, making the function progressively linear around training samples and non-linear around the decision boundary during the latest phase of training. This migration provably induces grokking, as the emergence of a robust partition widens the linear regions around the training samples.
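The 'linear regions' mentioned above arise because a ReLU network is piecewise affine: two inputs lie in the same region exactly when they switch the same set of ReLUs on and off. A minimal sketch of how one could probe this locally, using a tiny randomly initialized MLP as a stand-in for the trained networks studied in the paper (all names and the sampling procedure here are illustrative assumptions, not the authors' actual measurement code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer ReLU MLP with random weights (illustrative stand-in only).
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

def activation_pattern(x):
    """Binary on/off pattern of every ReLU. Two inputs belong to the
    same linear region iff they produce the same pattern."""
    h1 = W1 @ x + b1
    h2 = W2 @ np.maximum(h1, 0.0) + b2
    return tuple((h1 > 0).astype(int)) + tuple((h2 > 0).astype(int))

def local_region_count(x, radius, n=400):
    """Count distinct linear regions hit by random perturbations of x.
    A count of 1 means the network is exactly affine on the sampled
    neighbourhood; larger counts indicate a denser, more non-linear
    partition around x."""
    pts = x + radius * rng.normal(size=(n, x.size))
    return len({activation_pattern(p) for p in pts})

x = rng.normal(size=2)
print(local_region_count(x, radius=0.01))  # small ball: few regions, near-linear
print(local_region_count(x, radius=1.0))   # large ball: many regions, non-linear
```

Tracking a statistic like `local_region_count` at training samples versus points near the decision boundary, across training checkpoints, is one way to observe the region-migration effect the abstract describes: regions around training samples widen (counts drop toward 1) while non-linearity concentrates at the boundary.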
