Poster

Value-Evolutionary-Based Reinforcement Learning

Pengyi Li · Jianye Hao · Hongyao Tang · Yan Zheng · Fazl Barez


Abstract:

Combining Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for policy search has been shown to improve RL performance. However, previous works largely overlook value-based RL in favor of merging EAs with policy-based RL. This paper introduces Value-Evolutionary-Based Reinforcement Learning (VEB-RL), which focuses on integrating EAs with value-based RL. The framework maintains a population of value functions instead of policies and uses the negative Temporal Difference (TD) error as the fitness metric for evolution. This metric is more sample-efficient for population evaluation than cumulative reward and is closely tied to the accuracy of the value function approximation. In addition, VEB-RL lets elites of the population interact with the environment to provide high-quality samples for RL optimization, while the RL value function participates in the population's evolution at each generation. Experiments on MinAtar and Atari demonstrate the superiority of VEB-RL in significantly improving DQN, Rainbow, and SPR.
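To make the fitness metric concrete, below is a minimal, hypothetical sketch of scoring a population of Q-networks by negative TD error on a shared replay batch, as the abstract describes. The function names, batch layout, and use of PyTorch are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def td_error_fitness(q_net, target_net, batch, gamma=0.99):
    """Fitness of one population member: negative mean absolute TD error.

    Assumed batch layout: (states, actions, rewards, next_states, dones),
    all as tensors sampled from a replay buffer.
    """
    s, a, r, s_next, done = batch
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_error = (q_sa - target).abs().mean()
    # Higher fitness corresponds to a smaller TD error, i.e. a more accurate value function.
    return -td_error.item()

def rank_population(population, target_net, batch):
    """Rank candidate value networks by TD-error fitness on a single batch,
    avoiding full environment rollouts for cumulative-reward evaluation."""
    scored = [(td_error_fitness(q, target_net, batch), q) for q in population]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under this sketch, the top-ranked (elite) networks would be the ones selected to interact with the environment and supply samples for RL optimization, while the RL value function itself can be injected into the population each generation.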