

Poster

What is Dataset Distillation Learning?

William Yang · Ye Zhu · Zhiwei Deng · Olga Russakovsky


Abstract:

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high-performing models, little is understood about how the information is stored. In this study, we posit and answer three questions: Is dataset distillation more analogous to the compression of model parameters or of data statistics? How does dataset distillation improve expressiveness compared to classical compression techniques? Do distilled data points individually carry meaningful information? We reveal that distilled data cannot be simply characterized as either model or data compression. Additionally, the distillation process works by compressing the early training dynamics of real models. Finally, we provide an interpretable framework for analyzing distilled data and find that individual distilled data points do contain meaningful semantic information. This investigation sheds light on the intricate nature of distilled data, providing a better understanding of how such data can be effectively utilized.
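To make the setup concrete, below is a minimal sketch of one common dataset distillation formulation (gradient matching), not necessarily the specific method analyzed in this work. The network class `Net`, the loader `real_loader`, the image shape, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of dataset distillation via gradient matching.
# Assumes a generic classifier `Net`, a real-data loader `real_loader`,
# 32x32 RGB images, and hyperparameters chosen for illustration only.
import torch
import torch.nn.functional as F

def distill(real_loader, Net, num_classes=10, ipc=1, steps=1000, device="cpu"):
    # Learnable synthetic images with fixed, balanced labels (ipc = images per class).
    syn_x = torch.randn(num_classes * ipc, 3, 32, 32, device=device, requires_grad=True)
    syn_y = torch.arange(num_classes, device=device).repeat_interleave(ipc)
    opt = torch.optim.SGD([syn_x], lr=0.1)

    for _ in range(steps):
        net = Net().to(device)              # freshly initialized model each step
        x, y = next(iter(real_loader))
        x, y = x.to(device), y.to(device)

        # Gradients of the classification loss on real vs. synthetic data.
        g_real = torch.autograd.grad(F.cross_entropy(net(x), y), net.parameters())
        g_syn = torch.autograd.grad(F.cross_entropy(net(syn_x), syn_y),
                                    net.parameters(), create_graph=True)

        # Update the synthetic images so their induced gradients match the real ones
        # (cosine distance, summed over parameter tensors).
        loss = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                   for a, b in zip(g_syn, g_real))
        opt.zero_grad()
        loss.backward()
        opt.step()

    return syn_x.detach(), syn_y
```

The returned synthetic set can then be used to train a fresh model in place of the original dataset, which is the behavior whose underlying mechanism the paper investigates.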
