

Poster

Can AI Assistants Know What They Don't Know?

Qinyuan Cheng · Tianxiang Sun · Xiangyang Liu · Wenwei Zhang · Zhangyue Yin · Shimin Li · Linyang Li · Zhengfu He · Kai Chen · Xipeng Qiu


Abstract:

Recently, AI assistants powered by Large Language Models (LLMs) have demonstrated impressive performance in various tasks, including dialogue, mathematical problem solving, coding, and tool utilization. Despite their extensive world knowledge, LLMs still commit factual errors in knowledge-intensive tasks such as open-domain question answering. These untruthful responses from AI assistants can pose significant risks in practical applications. We contend that an AI assistant's ability to recognize and admit its knowledge limitations by refusing to answer questions beyond its knowledge is a critical strategy for reducing factual errors and enhancing truthfulness. Therefore, in this paper, we ask the question: "Can AI assistants know what they don't know and express this through natural language?" To investigate this, we construct a model-specific "I don't know" (Idk) dataset for an AI assistant, comprising questions it knows and questions it does not know, derived from existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Our experimental results indicate that, after alignment with the Idk dataset, the assistant is more capable of declining to answer questions outside its knowledge scope. Furthermore, the accuracy of its responses to questions it does attempt to answer is notably higher than before alignment.
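The sketch below illustrates one plausible way such a model-specific Idk dataset could be constructed: query the assistant on each open-domain QA item, treat questions it answers correctly as "known" and the rest as "unknown", and attach refusal targets to the unknown ones. It is a minimal illustration, not the authors' released pipeline; `ask_assistant`, the exact-match check, and the refusal template are all assumptions introduced here for clarity.

```python
# Hedged sketch (not the paper's official code): building a model-specific
# "I don't know" (Idk) dataset from an open-domain QA set.
# Assumptions: `ask_assistant` is a placeholder for querying the target model,
# and exact-match against gold answers decides whether a question is "known".

from typing import Callable, Dict, List

IDK_RESPONSE = "I don't know."  # assumed refusal template


def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())


def is_known(prediction: str, gold_answers: List[str]) -> bool:
    # Treat a question as "known" if the model's answer contains any gold answer.
    pred = normalize(prediction)
    return any(normalize(g) in pred for g in gold_answers)


def build_idk_dataset(
    qa_pairs: List[Dict],                 # each item: {"question": str, "answers": [str, ...]}
    ask_assistant: Callable[[str], str],  # placeholder: queries the assistant being aligned
) -> List[Dict]:
    idk_data = []
    for item in qa_pairs:
        prediction = ask_assistant(item["question"])
        if is_known(prediction, item["answers"]):
            # Known question: keep the correct answer as the alignment target.
            target, label = prediction, "known"
        else:
            # Unknown question: the alignment target is a refusal.
            target, label = IDK_RESPONSE, "unknown"
        idk_data.append({"question": item["question"], "target": target, "label": label})
    return idk_data


if __name__ == "__main__":
    # Toy usage with a stub model that only "knows" one fact.
    def stub_assistant(q: str) -> str:
        return "Paris" if "France" in q else "Rome"

    data = build_idk_dataset(
        [{"question": "What is the capital of France?", "answers": ["Paris"]},
         {"question": "What is the capital of Australia?", "answers": ["Canberra"]}],
        stub_assistant,
    )
    for row in data:
        print(row)
```

The resulting question-target pairs could then serve as supervision for aligning the assistant (e.g., via fine-tuning or preference optimization) so that it answers its known questions and declines its unknown ones.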
