

Poster

Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Federico Bianchi · Edoardo Zorzi · Alberto Castellini · Thiago Simão · Matthijs T. J. Spaan · Alessandro Farinelli


Abstract:

In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied. We build on recent results that use Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains by exploiting the factorization of the transition and value functions. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting actions with an extension of Max-Plus (or Variable Elimination) that guarantees the safety criteria. Empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains in which state-of-the-art algorithms cannot be applied.
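To illustrate why a factored value function makes joint action selection tractable, here is a minimal sketch of plain Variable Elimination over a coordination graph. It assumes the joint payoff decomposes into a sum of local factors, each defined over a small subset of agents. It eliminates agents one at a time and then recovers the maximizing joint action by back-substitution. The `allowed` per-agent action sets are a hypothetical stand-in for the safety criteria (for example, forcing an agent to keep its baseline action). This is not the authors' SPIBB-based extension, and the function name and factor format are illustrative assumptions.

```python
from itertools import product


def variable_elimination(factors, allowed):
    """Maximize a sum of local payoff factors over joint actions.

    factors: list of (scope, table); scope is a tuple of agent ids and table
             maps action tuples (ordered like scope) to payoffs.
    allowed: dict agent -> list of admissible actions. This is a hypothetical
             safety mask, not the paper's actual constraint.
    Returns (best total payoff, dict agent -> chosen action).
    """
    factors = [(tuple(s), dict(t)) for s, t in factors]
    order = sorted(allowed)  # assumed elimination order: ascending agent id
    best_response = []  # (agent, remaining scope, context -> best action)

    for agent in order:
        # Pull out every factor that mentions the agent being eliminated.
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # The new factor ranges over the agent's current neighbours.
        scope = tuple(sorted({v for s, _ in involved for v in s if v != agent}))
        table, choice = {}, {}
        for ctx in product(*(allowed[v] for v in scope)):
            assign = dict(zip(scope, ctx))
            best_val, best_act = float("-inf"), None
            for a in allowed[agent]:
                assign[agent] = a
                val = sum(t[tuple(assign[v] for v in s)] for s, t in involved)
                if val > best_val:
                    best_val, best_act = val, a
            table[ctx] = best_val  # max over the eliminated agent's action
            choice[ctx] = best_act  # its best response to this context
        factors.append((scope, table))
        best_response.append((agent, scope, choice))

    # Every agent is eliminated, so all remaining factors have an empty scope.
    total = sum(t[()] for _, t in factors)
    # Back-substitute in reverse order: each scope only holds later agents.
    joint = {}
    for agent, scope, choice in reversed(best_response):
        joint[agent] = choice[tuple(joint[v] for v in scope)]
    return total, joint


# Toy chain of three agents with binary actions (illustrative numbers).
toy_factors = [
    ((0, 1), {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}),
    ((1, 2), {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 2}),
    ((2,), {(0,): 0.0, (1,): 0.5}),
]
free = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
masked = {0: [0, 1], 1: [0, 1], 2: [0]}  # agent 2 restricted to action 0
print(variable_elimination(toy_factors, free))    # value 3.5, all agents pick 1
print(variable_elimination(toy_factors, masked))  # value 3.0, all agents pick 0
```

The cost of each elimination step is exponential only in the number of neighbours of the eliminated agent, not in the total number of agents. This is what lets factored methods scale where enumerating the full joint action space does not. Max-Plus trades this exactness for an iterative message-passing approximation.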
