Poster

Position Paper: A Roadmap to Pluralistic Alignment

Taylor Sorensen · Jared Moore · Jillian Fisher · Mitchell Gordon · Niloofar Mireshghallah · Christopher Rytting · Andre Ye · Liwei Jiang · Ximing Lu · Nouha Dziri · Tim Althoff · Yejin Choi


Abstract:

With increased power and prevalence of AI systems, it is ever more critical that AI systems are designed to serve all, i.e., people with diverse values and perspectives. However, aligning models to serve pluralistic human values remains an open research question. In this position piece, we propose a roadmap to pluralistic alignment, specifically using language models as a test bed. We identify and formalize three possible ways to define and operationalize pluralism in AI systems: 1) Overton pluralistic models present a spectrum of reasonable responses; 2) Steerably pluralistic models can be steered to reflect certain perspectives; and 3) Distributionally pluralistic models are well-calibrated to a given population in distribution. We also propose and formalize three possible classes of pluralistic benchmarks: 1) Multi-objective benchmarks, 2) Trade-off steerable benchmarks, which incentivize models to steer to arbitrary trade-offs, and 3) Jury-pluralistic benchmarks, which explicitly model diverse human ratings. We use this framework to argue that current alignment techniques may be fundamentally limited for pluralistic AI; indeed, we highlight empirical evidence, both from our own experiments and from other work, that standard alignment procedures reduce distributional pluralism in models, motivating the need for further research on pluralistic alignment.
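The notion of distributional pluralism above admits a simple quantitative reading. Below is a minimal sketch in Python (not from the paper; all names, such as empirical_distribution and model_probs, are illustrative) that compares a model's distribution over a fixed set of candidate responses against a population's empirical response distribution, using total variation distance as one plausible calibration metric:

from collections import Counter

def empirical_distribution(human_choices, options):
    # Empirical probability of each option among sampled human responses.
    counts = Counter(human_choices)
    total = len(human_choices)
    return {o: counts[o] / total for o in options}

def total_variation(p, q):
    # Total variation distance between two distributions over the same options.
    return 0.5 * sum(abs(p[o] - q[o]) for o in p)

# Hypothetical survey data: 55% agree, 20% neutral, 25% disagree.
options = ["agree", "neutral", "disagree"]
human_choices = ["agree"] * 55 + ["neutral"] * 20 + ["disagree"] * 25
population = empirical_distribution(human_choices, options)

# Hypothetical post-alignment model that has largely collapsed onto one answer.
model_probs = {"agree": 0.90, "neutral": 0.05, "disagree": 0.05}

print(total_variation(population, model_probs))  # 0.35: a large gap, i.e., low distributional pluralism

Under this reading, a distributionally pluralistic model is one whose distance to the population distribution stays small; the evidence the abstract cites suggests standard alignment procedures tend to widen this gap.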
