

Poster

OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification

Shikhar Mohan · Deepak Saini · Anshul Mittal · Sayak Ray Chowdhury · Bhawna Paliwal · Jian Jiao · Manish Gupta · Manik Varma


Abstract:

The objective in eXtreme Multilabel Classification (XMC) is to find relevant labels for a document from an exceptionally large label space. Most XMC application scenarios have rich auxiliary data associated with the input documents, e.g., frequently clicked webpages for search queries in sponsored search. Unfortunately, most existing XMC methods do not use any auxiliary data. In this paper, we propose a novel framework, Online Auxiliary Knowledge (OAK), which harnesses auxiliary information linked to the document to improve XMC accuracy. OAK stores information learnt from the auxiliary data in a knowledge bank and, during a forward pass, retrieves relevant auxiliary knowledge embeddings for a given document. An enriched embedding is obtained by fusing these auxiliary knowledge embeddings with the document’s embedding, thereby enabling much more precise candidate label selection and final classification. OAK training involves three stages. Stage 1 trains a linker module to link documents to relevant auxiliary data points. Stage 2 learns an embedding for documents enriched using linked auxiliary information. Stage 3 uses the enriched document embeddings to learn the final classifier. In Precision@1, OAK outperforms current state-of-the-art XMC methods by up to ~5% on academic datasets and by ~3% on an auxiliary data-augmented variant of the LF-ORCAS-800K dataset. OAK also demonstrates statistically significant improvements in sponsored search metrics when deployed on a large-scale search engine.
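
The retrieve-then-fuse step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the linker scores auxiliary data or how embeddings are fused, so the sketch assumes dot-product retrieval over a small in-memory knowledge bank and a softmax-weighted sum added to the document embedding. All function names (`retrieve`, `fuse`, `enrich`) and the parameter `k` are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def retrieve(doc_emb, knowledge_bank, k):
    """Assumed linker: indices of the k bank entries with the highest dot-product score."""
    scores = [dot(doc_emb, entry) for entry in knowledge_bank]
    ranked = sorted(range(len(knowledge_bank)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def fuse(doc_emb, aux_embs):
    """Assumed fusion: add a softmax-weighted sum of auxiliary embeddings, then normalize."""
    if not aux_embs:
        return normalize(doc_emb)
    scores = [dot(doc_emb, a) for a in aux_embs]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    context = [
        sum(w * a[d] for w, a in zip(weights, aux_embs))
        for d in range(len(doc_emb))
    ]
    return normalize([x + c for x, c in zip(doc_emb, context)])

def enrich(doc_emb, knowledge_bank, k=2):
    """Retrieve the top-k auxiliary embeddings and fuse them into the document embedding."""
    indices = retrieve(doc_emb, knowledge_bank, k)
    return fuse(doc_emb, [knowledge_bank[i] for i in indices])

# Toy knowledge bank of three auxiliary embeddings and one document embedding.
bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc = [1.0, 0.5, 0.0]
enriched = enrich(doc, bank, k=2)
```

In this toy setup the enriched embedding would then stand in for the raw document embedding when selecting candidate labels and scoring them with the final classifier; the three training stages listed in the abstract are not modeled here.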
