ICML 2024


Workshop

Data-centric Machine Learning Research (DMLR): Datasets for Foundation Models

Adam Mahdi · Ludwig Schmidt · Alexandros Dimakis · Rotem Dror · Georgia Gkioxari · Sang Truong · Lilith Bat-Leah · Fatimah Alzamzami · Georgios Smyrnis · Thao Nguyen · Nezihe Merve Gürel · Paolo Climaco · Luis Oala · Hailey Schoelkopf · Andrew M. Bean · Berivan Isik · Vaishaal Shankar · Mayee Chen · Achal Dave

Straus 3
Sat 27 Jul, midnight PDT

This workshop addresses the growing importance of preparing high-quality datasets for developing large-scale foundation models. Recent advances have highlighted the key role that dataset size, quality, diversity, and provenance play in model performance, so the workshop examines strategies for improving data quality, including filtering, augmentation, and relabeling. Building on the growing interest in data-centric research, it seeks to advance both the understanding and the methodology of dataset composition and curation. The ultimate goal is to foster more robust models that can address diverse challenges across multiple domains and benefit the public.
