

Poster

A Statistical Theory of Regularization-Based Continual Learning

Xuyang Zhao · Huiyuan Wang · Weiran Huang · Wei Lin


Abstract: We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect model performance. We first derive the convergence rate of the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularized algorithms indexed by matrix-valued hyperparameters, which includes the minimum norm estimator \citep{lin2023theory} and continual ridge regression \citep{li2023fixed} as special cases. As more tasks are introduced, we derive an iterative update formula for the estimation error of generalized $\ell_2$-regularized estimators, from which we determine the hyperparameters yielding the optimal algorithm. Interestingly, this choice of hyperparameters harmoniously balances the trade-off between backward and forward knowledge transfer and adjusts for distribution heterogeneity. Moreover, we derive the estimation error of the optimal algorithm explicitly; it is of the same order as that of the oracle estimator. By contrast, our minimax lower bound analysis for the minimum norm estimator shows that it is suboptimal. A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization (rather than conventional ridge regression) in continual learning, which may be of independent interest. Finally, we conduct experiments to complement our theory.
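To make the setting concrete, the following is a minimal sketch (not the authors' code) of one step of generalized $\ell_2$-regularized continual linear regression: at task $t$, the estimator minimizes $\|y_t - X_t w\|^2 + (w - \hat{w}_{t-1})^\top \Lambda_t (w - \hat{w}_{t-1})$, which has the closed form $\hat{w}_t = (X_t^\top X_t + \Lambda_t)^{-1}(X_t^\top y_t + \Lambda_t \hat{w}_{t-1})$. The function name, the identity choice of $\Lambda_t$, and the toy data below are illustrative assumptions; the paper characterizes the optimal matrix-valued $\Lambda_t$.

```python
import numpy as np

def continual_l2_update(X_t, y_t, w_prev, Lam_t):
    """One task of generalized ell_2-regularized continual linear regression.

    Solves  min_w ||y_t - X_t w||^2 + (w - w_prev)^T Lam_t (w - w_prev),
    whose closed-form solution is
        w_t = (X_t^T X_t + Lam_t)^{-1} (X_t^T y_t + Lam_t w_prev).
    Lam_t is the matrix-valued hyperparameter; Lam_t = lam * I corresponds to
    continual ridge regression, while letting Lam_t vanish in the
    overparameterized regime corresponds to the minimum norm estimator.
    """
    A = X_t.T @ X_t + Lam_t
    b = X_t.T @ y_t + Lam_t @ w_prev
    return np.linalg.solve(A, b)

# Toy usage: a sequence of linear regression tasks sharing one parameter vector.
rng = np.random.default_rng(0)
d, n = 5, 50
w_star = rng.normal(size=d)
w = np.zeros(d)                      # estimate before any task is observed
for t in range(3):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    Lam = 1.0 * np.eye(d)            # illustrative choice, not the optimal Lam_t from the paper
    w = continual_l2_update(X, y, w, Lam)
```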
