

Poster

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

Dake Zhang · Boxiang Lyu · Shuang Qiu · Mladen Kolar · Tong Zhang


Abstract: We study risk-sensitive reinforcement learning (RL), an essential field due to its ability to enhance decision-making in problems where it is crucial to manage uncertainty and minimize potential adverse outcomes. Our work focuses on the so-called entropic risk measure. While existing literature primarily investigates the online setting, there remains a gap in understanding how to efficiently derive a near-optimal policy based on this risk measure using only a previously collected dataset. We focus on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint. In response, we introduce two provably sample-efficient algorithms. We begin by presenting a risk-sensitive pessimistic value iteration algorithm, offering a tight analysis by leveraging the structure of the risk-sensitive performance measure. To further improve the bounds obtained, we propose another pessimistic algorithm that utilizes variance information and reference-advantage decomposition, effectively sharpening the dependence on the space dimension $d$ and improving the dependence on the risk-sensitivity factor. To the best of our knowledge, we obtain the first provably efficient risk-sensitive offline RL algorithms.
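For reference, the abstract does not spell out the entropic risk measure; under the usual convention, for a random return $X$ and risk parameter $\beta \neq 0$ it is defined as
$$\rho_\beta(X) \;=\; \frac{1}{\beta} \log \mathbb{E}\!\left[e^{\beta X}\right],$$
which recovers the ordinary expectation $\mathbb{E}[X]$ in the limit $\beta \to 0$ and induces risk-averse or risk-seeking behavior depending on the sign of $\beta$.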
