Poster

Langevin Policy for Safe Reinforcement Learning

Fenghao Lei · Long Yang · Shiting Wen · Zhixiong Huang · Zhiwang Zhang · Chaoyi Pang


Abstract:

Optimization-based and sampling-based algorithms are two branches of methods in machine learning. While existing safe reinforcement learning (RL) algorithms are mainly based on optimization, it is still unclear whether sampling-based methods can achieve desirable performance with a safe policy. This paper formulates the Langevin policy for safe RL and proposes Langevin Actor-Critic (LAC) to accelerate the process of policy inference. Concretely, instead of a parametric policy, the proposed Langevin policy provides a stochastic process that directly infers actions, serving as a numerical solver for the Langevin dynamics of actions in continuous time. Furthermore, to make the Langevin policy practical on RL tasks, the proposed LAC accumulates the transitions induced by the Langevin policy and reproduces them with a generator. Finally, extensive empirical results show the effectiveness and superiority of LAC on MuJoCo-based and Safety Gym tasks.
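
The abstract does not give the concrete update rule, but the standard way to discretize a Langevin dynamic over actions is the unadjusted Langevin algorithm. The sketch below is a minimal illustration under that assumption; `grad_log_target` (e.g., the gradient of a critic's Q(s, ·) treated as a log-density over actions) and all other names here are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def langevin_action_sampler(grad_log_target, a0, step_size=1e-2,
                            n_steps=100, rng=None):
    """Unadjusted Langevin algorithm for action inference (illustrative).

    Discretizes the Langevin SDE  da = grad_a log p(a|s) dt + sqrt(2) dW_t
    with Euler-Maruyama steps, so the iterates approximately sample from
    the target density p(a|s).

    grad_log_target: callable mapping an action to the gradient of the
        log target density -- a hypothetical stand-in, since the paper's
        exact target is not specified in the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(a.shape)
        # One Langevin step: gradient drift plus Gaussian diffusion.
        a = a + step_size * grad_log_target(a) + np.sqrt(2.0 * step_size) * noise
    return a

# Toy usage: sample a 2-D action from a Gaussian target centered at 0.5.
if __name__ == "__main__":
    grad_log = lambda a: -(a - 0.5)  # gradient of log N(0.5, I)
    action = langevin_action_sampler(grad_log, a0=np.zeros(2))
    print(action)
```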
