

Poster

Active Statistical Inference

Tijana Zrnic · Emmanuel J Candes


Abstract:

Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus making effective use of the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. Moreover, it achieves far smaller errors than existing baselines that rely on i.i.d. data, enabling smaller confidence intervals and more powerful hypothesis tests. We evaluate active inference on datasets from survey research and proteomics.
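
The intuition in the abstract can be illustrated with a minimal Python sketch for estimating a population mean. The abstract does not spell out an estimator, so everything here is an assumption made for illustration: the function name `active_mean_estimate`, the `label_fn` callback, the rule that sampling probabilities are proportional to an uncertainty score, the clipping floor, and the inverse-probability-weighted correction with a normal-approximation interval. It is one plausible instance of the idea, not necessarily the authors' implementation.

```python
import numpy as np

def active_mean_estimate(preds, uncertainty, label_fn, budget, rng, z=1.96):
    """Sketch: estimate E[Y] while labeling roughly `budget` points (hypothetical helper).

    preds       -- model predictions f(X_i) for all n unlabeled points
    uncertainty -- nonnegative uncertainty score per point (assumed given)
    label_fn    -- callback returning true labels for an array of indices
    """
    preds = np.asarray(preds, dtype=float)
    u = np.asarray(uncertainty, dtype=float)
    n = len(preds)

    # Sampling probabilities proportional to uncertainty, scaled so that about
    # `budget` labels are collected in expectation. The clipping keeps every
    # probability in (0, 1], so the expected count is only approximately `budget`.
    pi = np.clip(budget * u / u.sum(), 1e-3, 1.0)

    # Decide independently for each point whether to collect its label.
    sampled = rng.random(n) < pi
    idx = np.flatnonzero(sampled)
    labels = np.zeros(n)
    labels[idx] = label_fn(idx)

    # Use the prediction everywhere; where a label was collected, add the
    # prediction error reweighted by 1/pi so each term is unbiased for Y_i.
    terms = preds + (labels - preds) * sampled / pi

    estimate = terms.mean()
    half_width = z * terms.std(ddof=1) / np.sqrt(n)
    return estimate, (estimate - half_width, estimate + half_width), int(sampled.sum())
```

Points with high uncertainty are labeled more often, and the 1/pi weight corrects for the uneven sampling, so the estimate stays unbiased regardless of how accurate the model is. A call might look like `active_mean_estimate(preds, unc, lambda i: y[i], budget=1000, rng=np.random.default_rng(0))`, where `y` stands in for a labeling oracle.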
