Poster

Position Paper: Data-driven Discovery with Large Generative Models

Bodhisattwa Prasad Majumder · Harshit Surana · Dhruv Agarwal · Sanchaita Hazra · Ashish Sabharwal · Peter Clark


Abstract:

With data accumulating at unprecedented rates, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs) to develop automated systems for end-to-end data-driven discovery, a paradigm encompassing the search for and verification of hypotheses purely from a set of provided datasets, without the need for additional data collection or physical experiments. We first outline several desiderata for an ideal discovery system. Through THOTH, a proof-of-concept utilizing GPT-4, we then demonstrate how LGMs fulfill several of these desiderata, a feat previously unattainable, while highlighting important limitations in the current system that open up opportunities for novel research within the ML community. We contend that achieving safe, reliable, and robust end-to-end discovery systems solely through the current capabilities of LGMs is challenging. We instead advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms, to foster data-driven scientific discoveries with efficiency and reproducibility.
