

Poster

On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning

Ari Karchmer


Abstract:

Recently, multimodal machine learning has enjoyed huge empirical success (e.g. GPT-4). Motivated by the goal of developing theoretical justification for this empirical success, a recent pair of works by Lu (NeurIPS '23, ALT '24) introduces a theory of multimodal learning and considers possible separations between theoretical models of multimodal and unimodal learning. In particular, Lu (ALT '24) shows a computational separation that applies to worst-case instances of the learning task. In this paper, we give a stronger, average-case separation, where for "typical" instances of the learning task, unimodal learning is computationally hard but multimodal learning is easy. We then question how "natural" the average-case separation is. Would it be encountered in practice? To this end, we prove that under natural conditions, any computational separation between average-case unimodal and multimodal learning tasks implies a generic cryptographic key agreement protocol. We suggest interpreting this as evidence that computational advantages of multimodal learning may arise infrequently in practice, since they exist only for the "pathological" case of inherently cryptographic distributions. However, this does not apply to statistical advantages.
