ICML Poster R2E: Turning any Github Repository into Programming Agent Test Environment


Naman Jain · Manish Shetty Molahalli · Tianjun Zhang · Shangdian Han · Koushik Sen · Ion Stoica

Abstract:

While Large Language Models' coding capabilities have advanced rapidly, corresponding evaluation benchmarks on real-world programming setups have yet to catch up. Building a scalable and interactive testbed for evaluating general-purpose AI coding agents on real-world code has been challenging. In this paper, we present Repository to Environment (R2E), a framework that can turn any GitHub repository into a test environment to evaluate the performance of code-generating systems, both static and interactive. We instantiate our framework to build R2E-Eval, the first large-scale benchmark of realistic environments for AI coding assistants. Our results demonstrate that even when SOTA models cannot generate correct solutions with advanced prompting techniques, they can effectively use environment feedback, highlighting the need to move from static functional coding to an interactive programming paradigm.
