Poster
Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces
Ziyi Chen · Heng Huang
Abstract:
The robust Markov decision process (robust MDP) is an important machine learning framework for learning a reliable policy that is robust to environmental perturbations. Despite the empirical success and popularity of policy gradient methods, existing policy gradient methods require iteration complexity of at least $\mathcal{O}(\epsilon^{-4})$ to achieve global convergence on $s$-rectangular robust MDPs, and are limited to the deterministic setting and small state spaces, which are impractical in many applications. In this work, we propose an accelerated policy gradient algorithm with iteration complexity $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$ in the deterministic setting, using entropy regularization. Furthermore, we extend this algorithm to the stochastic setting and large state spaces, achieving sample complexity $\mathcal{O}(\epsilon^{-7}\ln\epsilon^{-1})$. Our algorithms are also the first scalable policy gradient methods for entropy-regularized robust MDPs, which constitute an important but underexplored machine learning framework.
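For intuition only, the sketch below shows plain entropy-regularized policy gradient ascent on a toy tabular MDP. The temperature `tau`, the 2-state/2-action setup, and the finite-difference gradient are illustrative assumptions; the sketch deliberately omits the acceleration and the $s$-rectangular robustness that are the paper's actual contributions.

```python
import numpy as np

# Toy entropy-regularized policy gradient (NOT the paper's accelerated
# or robust algorithm). All sizes and constants here are hypothetical.
np.random.seed(0)
n_states, n_actions, gamma, tau = 2, 2, 0.9, 0.1

# Random transition kernel P[s, a] -> distribution over next states,
# and reward table r[s, a].
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = np.random.rand(n_states, n_actions)

theta = np.zeros((n_states, n_actions))  # softmax policy logits

def softmax_policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def regularized_value(theta):
    """Exact policy evaluation of the entropy-regularized value:
    V = (I - gamma * P_pi)^{-1} (r_pi + tau * H(pi))."""
    pi = softmax_policy(theta)
    r_pi = (pi * r).sum(axis=1)                 # expected reward per state
    H = -(pi * np.log(pi + 1e-12)).sum(axis=1)  # policy entropy per state
    P_pi = np.einsum('sa,sat->st', pi, P)       # state transitions under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi + tau * H)

# Vanilla (non-accelerated) gradient ascent on the regularized value,
# using finite differences for clarity instead of the analytic gradient.
lr, eps = 0.5, 1e-5
for _ in range(200):
    grad = np.zeros_like(theta)
    base = regularized_value(theta).mean()
    for idx in np.ndindex(*theta.shape):
        bump = theta.copy()
        bump[idx] += eps
        grad[idx] = (regularized_value(bump).mean() - base) / eps
    theta += lr * grad

print("regularized values per state:", regularized_value(theta))
```

The entropy bonus $\tau H(\pi(\cdot\mid s))$ smooths the objective, which is what makes accelerated rates like the paper's $\mathcal{O}(\epsilon^{-3}\ln\epsilon^{-1})$ attainable; the robust ($s$-rectangular) and stochastic extensions replace the exact evaluation above with a worst-case inner problem and sampled estimates, respectively.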