MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao · Jiaxing Huang · Yawen Qiu · Michael K. Chen · Wenzheng Liu · Wei Zhang · Wenjie Zeng · Xikun Zhang · Jingyi Zhang · Yuxin Song · Wenhao Wu · Dacheng Tao
Reasoning plays a crucial role in advancing Multimodal Large Language Models (MLLMs) toward Artificial General Intelligence. However, existing MLLM benchmarks often fall short of precisely and comprehensively evaluating long-chain reasoning abilities in three key respects: (1) a lack of difficulty and diversity, (2) susceptibility to guessing and memorization, and (3) inadequate assessment of intermediate reasoning steps. To fill this gap, we introduce MMReason, a new benchmark designed to precisely and comprehensively evaluate MLLM long-chain reasoning capabilities with diverse, open-ended, challenging questions. First, we curate challenging questions requiring multi-step reasoning from various fields (i.e., 6 disciplines) and multiple difficulty levels (i.e., from pre-university to university, and from foundational to competition tiers). Second, we reformulate these questions into an open-ended format and filter them with a multi-model voting technique to eliminate shortcut cases solvable by guessing or memorization, ensuring robust reasoning evaluations. Third, we annotate the questions with detailed step-by-step solutions and design a reference-based ternary scoring mechanism to reliably assess intermediate reasoning steps. With MMReason, we benchmark popular leading MLLMs and provide an in-depth analysis of their reasoning capabilities. We hope MMReason will serve as a valuable resource for advancing MLLM reasoning research.
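To illustrate the filtering idea, the sketch below shows one way a multi-model voting filter could flag shortcut-prone items: if several models answer an open-ended question correctly without seeing the image, the item is likely solvable by guessing or memorization and is dropped. The `Model` callables, the `text_only_prompt` and `answer` fields, the `is_correct` comparator, and the `max_votes` threshold are all assumed interfaces for this sketch, not part of the MMReason release.

```python
# Hypothetical sketch of a multi-model voting filter for shortcut-prone items.
# `models` maps a question string to an answer string; `is_correct` compares a
# model answer against the reference. Both are assumed interfaces.

from typing import Callable, Iterable, List

Model = Callable[[str], str]

def vote_filter(
    questions: Iterable[dict],
    models: List[Model],
    is_correct: Callable[[str, str], bool],
    max_votes: int = 1,
) -> List[dict]:
    """Keep only questions that at most `max_votes` models answer
    correctly *without* the image, i.e., items that cannot be solved
    by guessing or memorization alone."""
    kept = []
    for q in questions:
        # Count models that get the answer right from the text alone.
        votes = sum(
            is_correct(m(q["text_only_prompt"]), q["answer"]) for m in models
        )
        if votes <= max_votes:
            kept.append(q)
    return kept
```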
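Similarly, here is a minimal sketch of reference-based ternary scoring, assuming a judge callable that labels each generated step as correct, partially correct, or incorrect against the corresponding annotated reference step. The label-to-score mapping and the one-to-one step alignment are illustrative assumptions; the abstract does not specify MMReason's exact scheme.

```python
# Hypothetical sketch of reference-based ternary step scoring. The `judge`
# callable is assumed to return one of the three labels below when shown a
# model step and the matching reference step; the score mapping is illustrative.

from typing import Callable, List

SCORES = {"correct": 1.0, "partially_correct": 0.5, "incorrect": 0.0}

def score_reasoning(
    model_steps: List[str],
    reference_steps: List[str],
    judge: Callable[[str, str], str],  # returns a key of SCORES
) -> float:
    """Average ternary score over aligned intermediate reasoning steps."""
    n = min(len(model_steps), len(reference_steps))
    if n == 0:
        return 0.0
    total = sum(
        SCORES[judge(model_steps[i], reference_steps[i])] for i in range(n)
    )
    return total / n
```

Scoring each intermediate step, rather than only the final answer, is what lets the benchmark distinguish a sound derivation from a lucky guess.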