

Poster

Error Feedback Can Accurately Compress Preconditioners

Ionut-Vlad Modoranu · Aleksei Kalinov · Eldar Kurtic · Elias Frantar · Dan Alistarh


Abstract:

Leveraging second-order information about the loss at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC), suffer from massive storage costs when applied even to small-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via a novel and efficient error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks show that this approach can compress full-matrix preconditioners to up to 99% sparsity without accuracy loss, effectively removing the memory overhead of full-matrix preconditioners such as GGT and M-FAC.
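To make the error-feedback idea concrete, here is a minimal sketch of the compression loop the abstract describes, assuming Top-K sparsification of the gradients before they enter the preconditioner's sliding window. All names (topk_compress, sliding_window, the synthetic gradients) are illustrative placeholders, not the paper's actual implementation; the GGT/M-FAC preconditioner update itself is abstracted away.

```python
# Minimal sketch of error-feedback gradient compression (illustrative only).
import torch

def topk_compress(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries of a flat tensor, zero the rest."""
    out = torch.zeros_like(x)
    _, idx = torch.topk(x.abs(), k)
    out[idx] = x[idx]
    return out

dim, k, num_steps = 1000, 10, 100   # keeping 10 of 1000 entries = 99% sparsity
error = torch.zeros(dim)            # accumulated compression residual
sliding_window = []                 # stand-in for the preconditioner's gradient buffer

for step in range(num_steps):
    grad = torch.randn(dim)                  # placeholder for the true gradient
    corrected = grad + error                 # fold previously discarded mass back in
    compressed = topk_compress(corrected, k)
    error = corrected - compressed           # residual carried to the next iteration
    sliding_window.append(compressed)        # only sparse gradients need to be stored
    # ... a full-matrix preconditioner (e.g. GGT or M-FAC) would now be built
    #     from `sliding_window` and applied to `grad` to produce the update ...
```

The key point of the design is that the compression error is never discarded: it is added back into the next gradient, so the preconditioner's sliding window stores only sparse (or low-rank) vectors while the accumulated gradient information is preserved over time.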
