
Poster

Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares

Liangzu Peng · Wotao Yin


Abstract:

Block coordinate descent is a powerful algorithmic template suitable for big data optimization. This template admits many variants, including block gradient descent (BGD), which performs gradient descent on a selected block of variables while keeping the other variables fixed. For a very long time, the stepsize for each block has tacitly been set to the reciprocal of the block-wise Lipschitz smoothness constant, imitating the vanilla stepsize rule for gradient descent (GD). However, this choice has not yet yielded a theoretical justification for BGD's empirical superiority over GD, as existing convergence rates for BGD have worse constants than GD in the deterministic setting. To seek such a justification, we set up a simple environment in which BGD is applied to least-squares with two blocks of variables. We find optimal stepsizes of BGD in closed form, which provably lead to asymptotic convergence rates twice as fast as GD with Polyak's momentum; this means one can accelerate BGD by tuning stepsizes alone, without adding any momentum. As a byproduct, we apply our stepsizes to generalized alternating projection between two subspaces, improving a prior convergence rate that was once claimed, slightly inaccurately, to be optimal. Our technical devices include assuming block-wise orthogonality and minimizing the spectral radius of a matrix that controls convergence rates.
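To make the setup concrete, below is a minimal sketch (not the authors' code) of BGD on a two-block least-squares problem, min over x = (x1, x2) of 0.5·||A1 x1 + A2 x2 - b||². It uses the classical stepsizes 1/L_i, where L_i = ||A_i||² is the block-wise Lipschitz constant that the abstract refers to; the paper's closed-form optimal stepsizes are not reproduced here, and all variable names and problem sizes are illustrative assumptions.

```python
# Minimal sketch of block gradient descent (BGD) on two-block least-squares,
# using the classical 1/L_i stepsizes (NOT the paper's optimal stepsizes).
import numpy as np

rng = np.random.default_rng(0)
m, n1, n2 = 50, 10, 10                      # illustrative problem sizes
A1 = rng.standard_normal((m, n1))
A2 = rng.standard_normal((m, n2))
b = rng.standard_normal(m)

# Block-wise Lipschitz constants L_i = ||A_i||_2^2 and vanilla stepsizes 1/L_i.
L1 = np.linalg.norm(A1, 2) ** 2
L2 = np.linalg.norm(A2, 2) ** 2
gamma1, gamma2 = 1.0 / L1, 1.0 / L2

x1, x2 = np.zeros(n1), np.zeros(n2)
for _ in range(500):
    # Gradient step on block 1, with block 2 held fixed.
    r = A1 @ x1 + A2 @ x2 - b
    x1 -= gamma1 * (A1.T @ r)
    # Gradient step on block 2, with the refreshed residual and block 1 fixed.
    r = A1 @ x1 + A2 @ x2 - b
    x2 -= gamma2 * (A2.T @ r)

print("final objective:", 0.5 * np.linalg.norm(A1 @ x1 + A2 @ x2 - b) ** 2)
```

The paper's contribution is to replace gamma1 and gamma2 above with closed-form optimal values, which it shows accelerate convergence without adding momentum.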
