

Poster

Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

Stephen Zhao · Rob Brekelmans · Alireza Makhzani · Roger Grosse


Abstract:

Numerous capability and safety techniques for Large Language Models (LLMs) --- such as RLHF, automated red-teaming, prompt engineering, and infilling --- can be cast as sampling from an unnormalized target distribution. In this work, we argue that these problems can be viewed from a probabilistic inference perspective, allowing us to leverage the rich toolkit of sequential Monte Carlo (SMC) to solve them. In particular, we propose to perform inference using twisted SMC --- a variant of SMC that uses a set of learned twist functions, which estimate the expected future reward, to focus resampling on 'promising' partial sequences. We propose a novel contrastive method for learning the twist functions, and further establish connections with the rich literature on soft reinforcement learning. Finally, we discuss a complementary application of twisted SMC to evaluating fine-tuning or decoding methods such as PPO, by deriving novel bidirectional SMC sandwich bounds on the log partition function, which allow estimation of KL divergences. We showcase the effectiveness of twisted SMC for generating undesirable outputs from a pretrained model and for evaluating inference in toxicity, sentiment, and infilling settings.
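
To make the resampling idea concrete, below is a minimal sketch of a single twisted-SMC step, assuming the proposal is the base language model and that a learned twist function returns log psi values for partial sequences. All names (`twisted_smc_step`, `base_model`, `twist_fn`) are illustrative placeholders, not the authors' implementation; when the proposal equals the base model, the incremental importance weight reduces to the ratio of successive twist values.

```python
import torch

def twisted_smc_step(particles, log_weights, base_model, twist_fn):
    """One illustrative twisted-SMC step over K partial token sequences.

    particles:   (K, t) tensor of K partial sequences of length t
    log_weights: (K,) running log importance weights
    base_model:  callable returning next-token log-probs of shape (K, V)
    twist_fn:    callable returning log psi(partial sequence), shape (K,)
    """
    # Proposal: extend each particle with a token sampled from the base model.
    next_logprobs = base_model(particles)                      # (K, V)
    next_tokens = torch.multinomial(next_logprobs.exp(), 1)    # (K, 1)
    new_particles = torch.cat([particles, next_tokens], dim=1)

    # Incremental weight: with the base model as proposal, its terms cancel
    # and only the ratio of twist values (log-difference) remains.
    log_weights = log_weights + twist_fn(new_particles) - twist_fn(particles)

    # Multinomial resampling: duplicate 'promising' partial sequences and
    # reset weights to uniform afterwards.
    probs = torch.softmax(log_weights, dim=0)
    idx = torch.multinomial(probs, num_samples=particles.shape[0], replacement=True)
    return new_particles[idx], torch.zeros_like(log_weights)
```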
