Workshop Ballroom B

Multimodal reasoning and slow thinking in large model era: towards system2 and beyond

Chen Cheng, Chen Change Loy, David Clifton, Luc Van Gool, Shengwu Xiong, Peng Xu, Jiajun Zhang


Abstract

This workshop aims to bridge the gap between computer vision and large language/reasoning models, focusing on complex tasks that require advanced reasoning capabilities. We will explore how models can comprehend complex relationships through slow-thinking approaches such as neuro-symbolic reasoning, chain-of-thought, and multi-step reasoning, pushing beyond traditional fixed tasks to understand object interactions within complex scenes. The goal is to bring together perspectives from computer vision, multimodal learning, and large language models to address outstanding challenges in multimodal reasoning and slow thinking with large reasoning models, fostering more flexible and robust understanding in AI systems.

