

Poster

Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Yuwei Fu · Haichao Zhang · Di Wu · Wei Xu · Benoit Boulet


Abstract:

In this work, we investigate how to leverage pretrained visual-language models (VLMs) for online Reinforcement Learning (RL). In particular, we focus on sparse-reward tasks with a predefined textual task description. We first point out the problem of reward misalignment that arises when applying VLMs as rewards in RL tasks. As a remedy, we introduce a lightweight fine-tuning method, named Fuzzy VLM reward-aided RL (FuRL), based on reward alignment and relay RL. Experiments on benchmark tasks demonstrate the efficacy of the proposed method. Code will be released at: https://github.com/Anonymous/FuRL.
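To make the underlying "VLM as reward" idea concrete, the sketch below scores an image observation against the textual task description with a pretrained CLIP model and uses the cosine similarity as a proxy reward. This is a minimal illustration, not the authors' FuRL implementation: the checkpoint name, the `vlm_reward` helper, and the use of raw similarity as the reward are all assumptions. The reward misalignment mentioned in the abstract is exactly the failure mode of this naive scheme that FuRL's alignment fine-tuning is meant to address.

```python
# Minimal sketch: CLIP image-text similarity as a proxy reward for RL.
# Assumptions: the openai/clip-vit-base-patch32 checkpoint and the
# vlm_reward helper are illustrative, not part of the paper.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def vlm_reward(observation: Image.Image, task_description: str) -> float:
    """Cosine similarity between the observation and the task text."""
    inputs = processor(text=[task_description], images=observation,
                       return_tensors="pt", padding=True)
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    # Normalize embeddings so the dot product is a cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (img_emb @ txt_emb.T).item()

# Usage inside a sparse-reward RL loop (hypothetical):
# frame = Image.open("frame.png")  # current environment render
# r = vlm_reward(frame, "open the drawer")
```

Because this raw similarity need not correlate with actual task progress, an agent optimizing it directly can be misled, which motivates aligning the VLM reward with the task before relying on it.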
