Poster
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin · Hanjia Lyu · Xian Xu · Jiebo Luo
Large Vision-Language Models (LVLMs) have demonstrated outstanding performance in a variety of general multimodal applications and have shown promising potential in specialized domains. However, their potential in the insurance domain, which is characterized by rich application scenarios and abundant multimodal data, has not been effectively explored. To date, there is neither a systematic review of multimodal tasks in insurance nor a benchmark specifically designed to evaluate the capabilities of LVLMs in this domain. This gap hinders the development of LVLMs for insurance applications. In this paper, we systematically review and distill multimodal tasks for four representative types of insurance: auto insurance, property insurance, health insurance, and agricultural insurance. We propose INS-MMBench, the first hierarchical LVLM benchmark tailored for the insurance domain. INS-MMBench comprises 22 fundamental tasks, 12 meta-tasks, and 5 scenario tasks, enabling a comprehensive and progressive assessment that ranges from basic tasks to real-world insurance scenarios. Furthermore, we evaluate multiple representative LVLMs, including closed-source models such as GPT-4o and open-source models such as LLaVA. Our evaluation not only validates the effectiveness of our benchmark but also provides an in-depth performance analysis of current LVLMs across various multimodal insurance tasks. We hope that INS-MMBench will facilitate the further application of LVLMs in the insurance domain and inspire interdisciplinary development. A sample dataset and evaluation code are available at https://anonymous.4open.science/r/INS-MMBench-Anonymize. The full dataset will be released after the review process.