

Poster

Balancing Feature Similarity and Label Variability for Optimal Size-Aware Subset Selection

Abhinab Acharya · Dayou Yu · Qi Yu · Xumin Liu


Abstract:

Subset (core-set) selection offers a data-efficient way to train deep learning models: it identifies important data samples so that a model trained on the selected subset performs similarly to one trained on the full set. One-shot subset selection poses additional challenges because selection is performed only once and the full dataset becomes unavailable afterward. However, most existing methods tend to choose either diverse or difficult data samples, which misaligns with the optimal selection goal of faithfully representing the joint data distribution over both features and labels. Selection is also performed independently of the subset size, which plays an essential role in determining which types of samples should be chosen. To address this critical gap, we propose Feature similarity and Label variability Balanced One-shot Subset Selection (BOSS), which aims to construct an optimal size-aware subset for data-efficient deep learning. We show that a novel balanced core-set loss bound theoretically justifies the need to consider diversity and difficulty simultaneously when forming an optimal subset. The analysis also reveals how the subset size influences the bound. Since directly minimizing the bound is infeasible, we connect the inaccessible bound to a practical surrogate target tailored to the subset size and to varying levels of overall difficulty. Building on this connection, we design a novel Beta-scoring importance function to finely control the balance between diversity and difficulty. A comprehensive experimental study on both synthetic and real datasets verifies the important theoretical properties and demonstrates the superior performance of BOSS compared with competitive baselines.
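
The idea of weighting samples by a Beta-shaped score while still covering the feature space can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not give the scoring function or the selection objective, so everything here is an assumption made for illustration, including the per-sample `difficulty` score in (0, 1), the shape parameters `a` and `b`, the cosine-similarity measure, and the greedy weighted facility-location objective. The names `beta_pdf` and `select_subset` are hypothetical.

```python
import math
import numpy as np

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed in log space for stability."""
    x = np.clip(x, 1e-6, 1 - 1e-6)  # keep the logarithms finite
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x))

def select_subset(features, difficulty, k, a, b):
    """Greedy weighted facility-location selection (illustrative assumption).

    features:   (n, d) array with non-zero rows
    difficulty: (n,) array of hypothetical difficulty scores in (0, 1)
    k:          subset size
    a, b:       hypothetical Beta shape parameters
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = np.clip(normed @ normed.T, 0.0, None)  # non-negative cosine similarity
    weight = beta_pdf(difficulty, a, b)          # importance of each candidate
    cover = sim * weight[None, :]                # cover[i, j]: how well j covers i
    best = np.zeros(len(features))               # best coverage of each sample so far
    chosen = []
    for _ in range(k):
        # Marginal gain of adding each candidate to the current subset.
        gain = np.maximum(cover - best[:, None], 0.0).sum(axis=0)
        if chosen:
            gain[chosen] = -np.inf               # never pick a sample twice
        j = int(np.argmax(gain))
        chosen.append(j)
        best = np.maximum(best, cover[:, j])
    return chosen
```

In this sketch, choosing `a > b` shifts the weight toward samples with higher difficulty scores and `a < b` toward lower ones, while the similarity term rewards coverage of the feature space. How the shape parameters should depend on the subset size is left open here, since the abstract does not specify it.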
