

Poster

Finite Time Regret Bounds for Self-Tuning Regulation

Rahul Singh · Akshay Mete · Avik Kar · P. R. Kumar


Abstract: Reducing the variance of the output of a system with respect to a desired set-point is an important problem in control. It has a large number of engineering applications in the process industries involved in the large-scale production of pharmaceuticals, foods, beverages, oil, gas, paper, chemicals, etc., where the quality of the output product as measured by its variance is critical. Reinforcement learning is necessary since there is often no a priori model of the dynamic stochastic system. We address the finite-time regret performance of the resulting learning system, called “self-tuning regulation” in this context, which differs from LQG control since it gives rise to a singular problem where there is no penalty on the control input. We obtain the first finite-time regret bounds that capture the initialization and transient performance, as well as the asymptotic behavior. A critical challenge is to prevent the large transients soon after initialization that are often experienced by learning schemes. To do so, we introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a $C \log T$ bound on the regret after $T$ time-steps for bounded noise, and a $C \log^3 T$ bound in the case of sub-Gaussian noise. Simulation results demonstrate the advantage of PIECE over previously proposed algorithms.
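To make the ingredients concrete, here is a minimal sketch of a certainty-equivalence self-tuning regulator with clipped inputs and diminishing probing noise, in the spirit of what the abstract describes. It assumes a scalar ARX plant $y_{t+1} = a y_t + b u_t + w_t$; the clipping bound, probing schedule, and constants are illustrative assumptions, not the authors' exact PIECE algorithm.

```python
import numpy as np

# Hedged sketch: certainty-equivalence regulation with input clipping and
# probing. The plant, clipping bound U_MAX, and probing rate are assumptions
# for illustration only; they are not the PIECE algorithm from the paper.

rng = np.random.default_rng(0)
a_true, b_true = 0.8, 1.0   # unknown plant parameters (assumed here)
T = 5000
U_MAX = 10.0                # clip control inputs to limit early transients

theta_hat = np.zeros(2)     # estimates of (a, b)
P = np.eye(2) * 100.0       # recursive least-squares covariance
y, regret = 0.0, 0.0

for t in range(T):
    a_hat, b_hat = theta_hat
    # Certainty-equivalence minimum-variance input: drive E[y_{t+1}] to 0
    # under the current estimates, guarding against a near-zero b estimate.
    u = -a_hat / b_hat * y if abs(b_hat) > 1e-3 else 0.0
    u = float(np.clip(u, -U_MAX, U_MAX))       # input clipping
    if rng.random() < 1.0 / (t + 2):           # diminishing probing input
        u += rng.normal()
    w = rng.normal()
    y_next = a_true * y + b_true * u + w
    # Excess output variance relative to the minimum-variance optimum,
    # for which y_{t+1} = w_t.
    regret += y_next**2 - w**2
    # Recursive least-squares update of the parameter estimates.
    phi = np.array([y, u])
    K = P @ phi / (1.0 + phi @ P @ phi)
    theta_hat += K * (y_next - phi @ theta_hat)
    P -= np.outer(K, phi @ P)
    y = y_next

print(f"cumulative regret after T={T}: {regret:.1f}")
```

Without the clipping step, a poor early estimate of $b$ can produce a very large input and the kind of initialization transient the paper aims to prevent; the probing term keeps the regressor informative enough for the least-squares estimates to converge.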
