

Poster

Robust and Fine-tuning-free Instance Attribution for Interpretable NLP

Jingtan Wang · Xiaoqiang Lin · Rui Qiao · Chuan-Sheng Foo · Bryan Kian Hsiang Low


Abstract:

The ever-growing complexity of foundation models heightens the need for interpretability. Instance attribution, one interpretability approach, attributes a model's prediction to individual training examples via instance scores. However, the robustness of instance scores, specifically under dataset resampling, has been overlooked. To bridge this gap, we propose a notion of robustness on the sign of the instance score. We theoretically and empirically demonstrate that popular leave-one-out-based methods lack robustness, while the Shapley value behaves significantly better, albeit at a higher computational cost. Accordingly, we introduce an efficient fine-tuning-free approximation of the Shapley value (FreeShap) for instance attribution based on the neural tangent kernel. We empirically demonstrate that FreeShap outperforms other instance attribution methods in natural language processing.
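The two baseline scores the abstract contrasts can be illustrated with a minimal sketch. This is not the paper's FreeShap method (which uses the neural tangent kernel); it is a generic toy showing leave-one-out scores and a Monte Carlo estimate of the data Shapley value, with a hypothetical utility function defined as the validation accuracy of a 1-nearest-neighbour model on made-up 1-D data standing in for real NLP examples:

```python
import random

# Hypothetical toy data: (feature, label) pairs standing in for training examples.
train = [(0.0, 0), (0.2, 0), (0.9, 1), (1.0, 1), (0.45, 1)]
val = [(0.1, 0), (0.3, 0), (0.8, 1), (0.95, 1)]

def utility(subset):
    """Validation accuracy of a 1-nearest-neighbour model fit on `subset`."""
    if not subset:
        return 0.0
    correct = 0
    for x, y in val:
        _, pred = min(subset, key=lambda p: abs(p[0] - x))
        correct += (pred == y)
    return correct / len(val)

def loo_scores(data):
    """Leave-one-out score: drop in utility when example i is removed."""
    full = utility(data)
    return [full - utility(data[:i] + data[i + 1:]) for i in range(len(data))]

def shapley_scores(data, n_perm=2000, seed=0):
    """Monte Carlo data Shapley: average marginal contribution of each
    example over random permutations of the training set."""
    rng = random.Random(seed)
    n = len(data)
    scores = [0.0] * n
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)
        subset, prev = [], utility([])
        for i in order:
            subset.append(data[i])
            cur = utility(subset)
            scores[i] += cur - prev  # marginal contribution of example i
            prev = cur
    return [s / n_perm for s in scores]
```

By the efficiency property, the Shapley scores sum exactly to the full-dataset utility, whereas leave-one-out scores carry no such guarantee; the paper's point is that the sign of leave-one-out scores can flip under resampling, while Shapley-based scores are more stable.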
