

Poster

Understanding Finetuning for Factual Knowledge Extraction

Gaurav Ghosal · Tatsunori Hashimoto · Aditi Raghunathan


Abstract:

Language models have demonstrated promising abilities in absorbing factual information from large-scale unstructured data and applying it to downstream tasks. However, these factual abilities are often unreliable, and language models have been shown to generate false information even when they can otherwise be shown to contain the true knowledge. In this work, we investigate the impact of supervised fine-tuning data on the downstream factuality of the model. In simulation and controlled settings, we make the surprising observation that fine-tuning on more popular knowledge improves model factuality, while fine-tuning on less popular knowledge, even knowledge the model already contains, can worsen downstream factuality. We investigate this phenomenon theoretically and in controlled settings, finding that training on less popular knowledge can induce the model to learn shortcuts rather than utilize knowledge stored during pretraining. Finally, we verify that these trends hold on real language models (Llama-7B and Mistral-7B) and demonstrate that training on only the most popular knowledge performs comparably to or better than using additional, less popular data.
