ICML 2024


Workshop

Trustworthy Multi-modal Foundation Models and AI Agents (TiFA)

Zhenfei (Jeremy) Yin · Yawen Duan · Lijun Li · Jianfeng Chi · Yichi Zhang · Pavel Izmailov · Bo Li · Andy Zou · Yaodong Yang · Hang Su · Jing Shao · Yu Qiao · Jun Zhu · Xuanjing Huang · Wanli Ouyang · Dacheng Tao · Phil Torr

Straus 1
Sat 27 Jul, midnight PDT

Advanced Multi-modal Foundation Models (MFMs) and AI Agents, equipped with diverse modalities [1, 2, 3, 4, 15] and a growing set of affordances [5, 6] (e.g., tool use, code interpreters, API access), have the potential to accelerate and amplify their predecessors’ impact on society [7].

MFMs include multi-modal large language models (MLLMs) and multi-modal generative models (MMGMs). MLLMs are LLM-based models that can receive, reason over, and output information in multiple modalities, including but not limited to text, images, audio, and video; examples include LLaVA [1], Reka [8], Qwen-VL [9], and LAMM [36]. MMGMs are a class of MFMs that can generate new content across multiple modalities, such as generating images from text descriptions or creating videos from audio and text inputs; examples include Stable Diffusion [2], Sora [10], and Latte [11]. AI agents, or systems with a higher degree of agenticness, are systems that can achieve complex goals in complex environments with limited direct supervision [12]. Understanding and preempting the vulnerabilities of these systems [13, 35], and the harms they may induce [14], has become unprecedentedly crucial.

Building trustworthy MFMs and AI Agents goes beyond the adversarial robustness of the models themselves; it also requires proactive risk assessment, mitigation, safeguards, and comprehensive safety mechanisms throughout the development and deployment lifecycle of these systems [16, 17]. This demands a blend of technical and socio-technical strategies that incorporates AI governance and regulatory insights.

Topics include but are not limited to:

- Adversarial attack and defense, poisoning, hijacking and security [18, 13, 19, 20, 21]
- Robustness to spurious correlations and uncertainty estimation
- Technical approaches to privacy, fairness, accountability and regulation [12, 22, 28]
- Truthfulness, factuality, honesty and sycophancy [23, 24]
- Transparency, interpretability and monitoring [25, 26]
- Identifiers of AI-generated material, such as watermarking [27]
- Technical alignment/control, such as scalable oversight [29], representation control [26] and machine unlearning [30]
- Model auditing, red-teaming and safety evaluation benchmarks [31, 32, 33, 16]
- Measures against malicious model fine-tuning [34]
- Novel safety challenges with the introduction of new modalities
