

Poster

Position Paper: A Safe Harbor for AI Evaluation and Red Teaming

Shayne Longpre · Sayash Kapoor · Kevin Klyman · Ashwin Ramaswami · Rishi Bommasani · Borhane Blili-Hamelin · Yangsibo Huang · Aviya Skowron · Zheng Xin Yong · Suhas Kotha · Yi Zeng · Weiyan Shi · Xianjun Yang · Reid Southen · Alex Robey · Patrick Chao · Diyi Yang · Ruoxi Jia · Daniel Kang · Alex Pentland · Arvind Narayanan · Percy Liang · Peter Henderson


Abstract:

Independent evaluation and red teaming are critical for identifying the growing risks posed by generative AI systems. However, the terms of service and enforcement strategies that prominent AI companies use to deter model misuse also disincentivize good faith safety research: researchers fear that conducting such research or releasing their findings will result in costly account suspensions or legal reprisal. Although some companies offer researcher access programs, these programs have limited community representation, receive inadequate funding, and lack independence from corporate incentives. We propose that major AI developers commit to providing a legal and technical safe harbor, indemnifying much-needed public interest research and protecting it from the threat of account suspension. These proposals emerged from our collective experience conducting safety, privacy, and security research on generative AI systems, where norms and incentives could be better aligned with the public interest without exacerbating model misuse. We believe these commitments are a fundamental and necessary step towards more inclusive and unimpeded community efforts to tackle the risks of generative AI.
