

Workshop

Multi-Modal Reasoning for Agentic Intelligence

Zhenfei Yin, Naji Khosravan, Tao Ji, Yin Wang, Roozbeh Mottaghi, Iro Armeni, Zhuqiang Lu, Annie S. Chen, Yufang Liu, Zixian Ma, Mahtab Bigverdi, Amita Kamath, Chen Feng, Lei Bai, Gordon Wetzstein, Philip Torr

Mon 20 Oct, 11 a.m. PDT

AI agents powered by Large Language Models (LLMs) have shown strong reasoning abilities across tasks such as coding and research. With the rise of Multimodal Foundation Models (MFMs), agents can now integrate visual, textual, and auditory inputs for richer perception and decision-making. This workshop explores the development of multimodal AI agents across four categories: Digital, Virtual, Wearable, and Physical. We will discuss their applications in science, robotics, and human-computer interaction, as well as key challenges in cross-modal integration, real-time responsiveness, and interpretability. The goal is to advance robust, context-aware agents for complex, real-world environments.
