

Poster

Understanding the Origins of Bias in Word Embeddings

Marc-Etienne Brunet · Colleen Alkalay-Houlihan · Ashton Anderson · Richard Zemel

Pacific Ballroom #146

Keywords: [ Natural Language Processing ] [ Interpretability ] [ Fairness ] [ Computational Social Sciences ]


Abstract:

Popular word embedding algorithms exhibit stereotypical biases, such as gender bias. The widespread use of these algorithms in machine learning systems can amplify stereotypes in important contexts. Although some methods have been developed to mitigate this problem, how word embedding biases arise during training is poorly understood. In this work we develop a technique to address this question. Given a word embedding, our method reveals how perturbing the training corpus would affect the resulting embedding bias. By tracing the origins of word embedding bias back to the original training documents, one can identify subsets of documents whose removal would most reduce bias. We demonstrate our methodology on Wikipedia and New York Times corpora, and find it to be very accurate.
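
To make the abstract's notion of "embedding bias" concrete, below is a minimal sketch of a WEAT-style effect size (Caliskan et al., 2017), the kind of bias metric this line of work typically measures on trained embeddings. Everything here is illustrative: the word lists, the 50-dimensional random vectors, and the function names are placeholders standing in for a real trained embedding such as GloVe or word2vec.

```python
import numpy as np

# Toy stand-in embeddings: in practice these come from a model trained on a
# corpus such as Wikipedia or the New York Times. Vectors here are random
# placeholders, so the printed effect size is meaningless except as a demo.
rng = np.random.default_rng(0)
vocab = ["he", "him", "she", "her", "career", "office", "home", "family"]
emb = {w: rng.normal(size=50) for w in vocab}

def cos(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B):
    """Mean cosine association of word w with attribute set A minus set B."""
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: standardized difference in association between
    target word sets X and Y with respect to attribute sets A and B."""
    s = {w: assoc(w, A, B) for w in X + Y}
    diff = np.mean([s[x] for x in X]) - np.mean([s[y] for y in Y])
    return diff / np.std(list(s.values()), ddof=1)

# Hypothetical example: gender association of career- vs. family-related words.
X, Y = ["career", "office"], ["home", "family"]
A, B = ["he", "him"], ["she", "her"]
print(weat_effect_size(X, Y, A, B))
```

In the setting the abstract describes, the quantity of interest is how a metric like this would change if a given subset of training documents were removed; the contribution is estimating that change directly from the trained embedding, rather than retraining the model for every candidate perturbation of the corpus.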
