

Poster

Automated Statistical Model Discovery with Language Models

Michael Li · Emily Fox · Noah Goodman


Abstract:

Modeling is a core component of scientific discovery. However, model discovery is challenging because it involves searching over a vast space of models subject to domain-specific modeling constraints (e.g., the model should be physical). Efficiently searching over this space requires expertise in both modeling and the specific problem domain. Motivated by large language models’ (LMs) programming and reasoning capabilities, as well as their broad domain knowledge, we introduce a method for language-model-driven automated statistical model discovery. We focus on probabilistic modeling and cast our automated procedure within the principled framework of Box’s Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing these models, acting as a domain expert. By leveraging LMs, we avoid having to define a domain-specific language (DSL) of models or to specify a handcrafted search procedure, both key restrictions of previous systems. We evaluate our approach in three settings common in probabilistic modeling: searching within a restricted space of models, searching over an open-ended space of models, and improving expert models given some soft modeling constraints. Our method is effective in all three settings and can identify models that perform favorably against strong baselines, such as probabilistic programs written by human experts. In ablations, we also characterize the role of domain knowledge in guiding LM search. Our approach highlights the promise of LM-driven statistical model discovery.
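The abstract describes Box’s Loop only at a high level. The Python below is a minimal, hypothetical sketch of that propose/fit/critique cycle for the "restricted space of models" setting; it is not the authors' implementation. The LM is replaced by rule-based stubs: `propose` picks the next candidate from a small fixed library of Gaussian-noise regression models, and `critique` inspects residuals for leftover structure. All names (`propose`, `critique`, `boxs_loop`, `LIBRARY`), the use of BIC as the model score, the tolerances, and the synthetic data are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LM-written probabilistic program: fits a
# Gaussian-noise regression and returns fitted means plus the number of free
# parameters (including the noise scale).
Model = Callable[[List[float], List[float]], Tuple[List[float], int]]

def constant_mean(xs, ys):
    m = sum(ys) / len(ys)
    return [m] * len(ys), 2  # mean + noise scale

def line_through_origin(xs, ys):
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return [slope * x for x in xs], 2  # slope + noise scale

def line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [my - b * mx + b * x for x in xs], 3  # intercept + slope + noise

LIBRARY: Dict[str, Model] = {"constant": constant_mean, "origin_line": line_through_origin, "line": line}

def bic(ys, means, k):
    # BIC of a Gaussian model with the maximum-likelihood noise variance (lower is better).
    n = len(ys)
    sigma2 = max(sum((y - m) ** 2 for y, m in zip(ys, means)) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return k * math.log(n) - 2 * loglik

def critique(xs, ys, means, tol_mean=0.05, tol_corr=0.1):
    # Hypothetical critic ("domain expert"): flags a nonzero mean residual or a
    # residual trend in x. An LM critic would return free-text feedback instead.
    n = len(xs)
    res = [y - m for y, m in zip(ys, means)]
    mx, mr = sum(xs) / n, sum(res) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sr = math.sqrt(sum((r - mr) ** 2 for r in res) / n)
    cov = sum((x - mx) * (r - mr) for x, r in zip(xs, res)) / n
    corr = cov / (sx * sr) if sx > 0 and sr > 0 else 0.0
    issues = []
    if abs(mr) > tol_mean:
        issues.append("offset")
    if abs(corr) > tol_corr:
        issues.append("trend")
    return issues

def propose(tried, feedback):
    # Hypothetical stand-in for the LM "modeler": a real system would prompt the
    # LM with the feedback and have it write a new probabilistic program.
    untried = [name for name in LIBRARY if name not in tried]
    if not untried:
        return None
    if "offset" in feedback and "line" in untried:
        return "line"  # an intercept addresses a nonzero mean residual
    return untried[0]

def boxs_loop(xs, ys, max_rounds=5):
    tried, history, feedback = [], [], []
    for _ in range(max_rounds):
        name = propose(tried, feedback)
        if name is None:
            break
        means, k = LIBRARY[name](xs, ys)
        feedback = critique(xs, ys, means)
        tried.append(name)
        history.append((name, bic(ys, means, k), feedback))
        if not feedback:  # the critic found nothing left to fix
            break
    best = min(history, key=lambda h: h[1])  # lowest BIC among tried models
    return best[0], history

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small fixed noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
best, history = boxs_loop(xs, ys)
print(best, [h[0] for h in history])
```

On this data the loop tries the constant model (flagged for a trend), then the line through the origin (flagged for an offset and a trend), then the full line, which the critic accepts and which also has the lowest BIC. In the paper's setting, the hand-written library and rules are what the LM replaces: it writes the candidate programs and the critiques itself, so no DSL or handcrafted search procedure is needed.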
