

Poster

Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Hao Hu · Yiqin Yang · Jianing Ye · Chengjie Wu · Ziqing Mai · Yujing Hu · Tangjie Lv · Changjie Fan · Qianchuan Zhao · Chongjie Zhang


Abstract:

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, policies learned offline are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it switches to optimism abruptly, it may suffer a sudden performance drop. We show that Bayesian design principles are crucial to resolving this dilemma. Instead of adopting an optimistic or pessimistic policy, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
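The abstract does not specify the algorithm, but the probability-matching idea can be contrasted with pessimistic and optimistic action selection in a minimal sketch. Assuming (hypothetically) that beliefs are represented by an ensemble of Q-estimates, a pessimistic agent acts on the worst-case estimate, an optimistic one on the best-case estimate, and a probability-matching agent samples one posterior member and acts greedily with respect to it, so each action is taken with probability equal to the belief that it is optimal. All names and the ensemble representation below are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: the paper's algorithm is not given in this abstract.
# We contrast pessimistic, optimistic, and probability-matching (Thompson-style)
# action selection for a single state, given an ensemble of Q-estimates.
import numpy as np

rng = np.random.default_rng(0)

def pessimistic_action(q_ensemble):
    # Greedy w.r.t. the worst-case (minimum over members) Q-estimate per action.
    return int(np.argmax(q_ensemble.min(axis=0)))

def optimistic_action(q_ensemble):
    # Greedy w.r.t. the best-case (maximum over members) Q-estimate per action.
    return int(np.argmax(q_ensemble.max(axis=0)))

def probability_matching_action(q_ensemble):
    # Sample one ensemble member (a posterior sample, in the Bayesian view)
    # and act greedily w.r.t. it; actions are thus chosen with probability
    # proportional to the belief that they are optimal.
    member = rng.integers(q_ensemble.shape[0])
    return int(np.argmax(q_ensemble[member]))

# q_ensemble[i, a] = Q-estimate of ensemble member i for action a in one state.
q_ensemble = rng.normal(loc=[1.0, 1.2, 0.8], scale=0.5, size=(5, 3))
print("pessimistic action:", pessimistic_action(q_ensemble))
print("optimistic action: ", optimistic_action(q_ensemble))
print("matching action:   ", probability_matching_action(q_ensemble))
```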
