

Poster

Interpreting Natural Language Generation via Optimal Transport

Xuhong Li · Jiamin Chen · Yekun Chai · Haoyi Xiong


Abstract:

While large language models (LLMs) have surged with the rise of generative AI, algorithms to explain them are in high demand. Existing feature attribution methods, adequate for discriminative language models like BERT, often fail to deliver faithful explanations for LLMs, primarily due to two issues: (1) for every specific prediction, the LLM outputs a probability distribution over the vocabulary, a large number of tokens with unequal semantic distances; and (2) as an autoregressive language model, the LLM consumes input tokens while generating a sequence of probability distributions over tokens. To address these two challenges, this work proposes LLMExp, which leverages Optimal Transport (OT) to measure the distributional change of all possible generated sequences upon the absence of each input token, while taking the tokens' similarity into account, so as to faithfully estimate feature attribution for LLMs. We carried out extensive experiments on the Llama families and their fine-tuned derivatives across various scales to validate the effectiveness of LLMExp for estimating input attributions. The results show that LLMExp outperforms existing solutions on a number of faithfulness metrics under fair comparison settings.
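
To make the idea concrete, the sketch below illustrates OT-based input attribution in the spirit of the abstract. It is not the authors' LLMExp implementation: the leave-one-out ablation, the HuggingFace-style `model`, and the helper names `ot_distance` and `attribute` are illustrative assumptions, and the token cost matrix is presumed to be precomputed from embedding distances so that OT respects semantic similarity between tokens.

```python
# A minimal sketch of OT-based input attribution for an autoregressive LM.
# NOT the authors' LLMExp code: the leave-one-out ablation, the
# HuggingFace-style `model`, and the helper names are assumptions.
import numpy as np
import ot  # POT: Python Optimal Transport
import torch


def ot_distance(p: np.ndarray, q: np.ndarray, cost: np.ndarray) -> float:
    """Exact OT cost between two token distributions p and q (each summing
    to 1) under `cost`, a (V, V) matrix of pairwise token dissimilarities
    (e.g., embedding distances), so semantically close tokens are treated
    as close outputs rather than as unrelated vocabulary entries."""
    return ot.emd2(p, q, cost)


def attribute(model, input_ids: torch.Tensor, cost: np.ndarray, pad_id: int):
    """Score each input token by how much its ablation shifts the model's
    per-position output distributions, summed over all positions."""
    with torch.no_grad():
        base = torch.softmax(model(input_ids).logits, dim=-1)[0]  # (T, V)
        scores = []
        for i in range(input_ids.shape[1]):
            masked = input_ids.clone()
            masked[0, i] = pad_id  # ablate token i (one simple choice)
            pert = torch.softmax(model(masked).logits, dim=-1)[0]
            scores.append(sum(
                ot_distance(base[t].cpu().numpy().astype(np.float64),
                            pert[t].cpu().numpy().astype(np.float64),
                            cost)
                for t in range(base.shape[0])
            ))
    return scores  # higher = removing the token changes the output more
```

Over a full LLM vocabulary the exact transport problem is far too large, so in practice one would restrict each distribution to its renormalized top-k tokens (with the corresponding k-by-k cost sub-matrix) before calling `ot.emd2`, or use an entropic approximation such as `ot.sinkhorn2`.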
