ICML 2024
Skip to yearly menu bar Skip to main content


Workshop

Multi-modal Foundation Model meets Embodied AI (MFM-EAI)

Zhenfei (Jeremy) Yin · Mahi Shafiullah · Zhenhua Xu · Quan Vuong · Jing Shao · Lu Sheng · Takayuki Osa · Hengshuang Zhao · Mohamed Elhoseiny · Xihui Liu · Tatsuya Harada · Cewu Lu · Wanli Ouyang · Pete Florence · Yu Qiao · Dacheng Tao · Phil Torr

Lehar 4
[ Abstract ]
Fri 26 Jul, midnight PDT

Multi-modal Foundation Model meets Embodied AI (MFM-EAI)In recent years, Multi-modal Foundation Models (MFM) such as CLIP, ImageBind, DALLĀ·E 3, GPT-4V, and Gemini have emerged as one of the most captivating and rapidly advancing areas in AI, drawing significant attention and progressing swiftly. The open-source community for MFM has also seen vigorous growth, with the emergence of models and algorithms like LLaVA, LAMM, Stable Diffusion, and OpenFlamingo. These MFMs are now actively exploring ultimate application scenarios beyond traditional computer vision tasks.Recent studies have unveiled the immense potential these models hold in empowering embodied AI agents, marking the intersection of these fields with a multitude of open questions and unexplored territories. This workshop, MFM-EAI, is dedicated to exploring these critical challenges:- How can we train and evaluate MFM in open-ended environments?- What constitutes an effective system architecture for MFM-based Embodied AI Agents?- And importantly, how can MFM augment the perceptual and decision-making capabilities of these agents, balancing their high-level decision-making prowess with the nuanced requirements of low-level control in embodied systems?Topics include but are not limited to:- Training and evaluation of MFM in open-ended scenarios- Data collection for training Embodied AI Agents and corresponding MFM- Framework design for MFM-powered embodied agents- Decision-making in Embodied Agents empowered by MFM- Low-level control in Embodied Agents empowered by MFM- Evaluation and simulation of Embodied Agents- Limitations of MFM in empowering Embodied AI

Live content is unavailable. Log in and register to view live content