Workshop
Multi-Modal Reasoning for Agentic Intelligence
Zhenfei Yin, Naji Khosravan, Tao Ji, Yin Wang, Roozbeh Mottaghi, Iro Armeni, Zhuqiang Lu, Annie S. Chen, Yufang Liu, Zixian Ma, Mahtab Bigverdi, Amita Kamath, Chen Feng, Lei Bai, Gordon Wetzstein, Philip Torr
Mon 20 Oct, 11 a.m. PDT
AI agents powered by Large Language Models (LLMs) have shown strong reasoning abilities across tasks such as coding and research. With the rise of Multimodal Foundation Models (MFMs), agents can now integrate visual, textual, and auditory inputs for richer perception and decision-making. This workshop explores the development of Multimodal AI Agents across four categories: Digital, Virtual, Wearable, and Physical. We will discuss their applications in science, robotics, and human-computer interaction, as well as key challenges in cross-modal integration, real-time responsiveness, and interpretability. The goal is to advance robust, context-aware agents for complex, real-world environments.