

Workshop

Multi-Modal Reasoning for Agentic Intelligence

Zhenfei Yin, Naji Khosravan, Tao Ji, Yin Wang, Roozbeh Mottaghi, Iro Armeni, Zhuqiang Lu, Annie S. Chen, Yufang Liu, Zixian Ma, Mahtab Bigverdi, Amita Kamath, Chen Feng, Lei Bai, Gordon Wetzstein, Philip Torr

Mon 20 Oct, 11 a.m. PDT

AI agents powered by Large Language Models (LLMs) have shown strong reasoning abilities across tasks such as coding and research. With the rise of Multimodal Foundation Models (MFMs), agents can now integrate visual, textual, and auditory inputs for richer perception and decision-making. This workshop explores the development of multimodal AI agents across four categories: Digital, Virtual, Wearable, and Physical. We will discuss their applications in science, robotics, and human-computer interaction, as well as key challenges in cross-modal integration, real-time responsiveness, and interpretability. The goal is to advance robust, context-aware agents for complex, real-world environments.
