Poster

Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning

Xinyu Sun · Zhikun Zhao · Congyan Lang · Bing Li · Juan Wang


Abstract:

The image signal processing (ISP) pipeline converts RAW images collected from the sensor into high-quality RGB images. It consists of a series of image processing modules, each with associated ISP hyperparameters. Recent learning-based approaches aim to automate ISP hyperparameter optimization using image data alone. However, their unimodal nature limits their ability to capture richer contextual information, reducing robustness and adaptability across diverse application scenarios. To address this limitation, we propose a Multimodal Large Language Model (MLLM)-guided ISP hyperparameter optimization framework, which integrates textual insights generated by MLLMs into the optimization process. By incorporating both high-level semantic cues and low-level image quality descriptors, our method enhances contextual understanding and task adaptability. Additionally, we introduce a Dynamic Pair Generation (DPG) refinement strategy based on Direct Preference Optimization (DPO), which enables efficient preference alignment without requiring extensive human-labeled data. The framework improves the directional consistency of optimization and significantly reduces computational and data preparation overhead. We validate the proposed methods on both high-level and low-level vision tasks, demonstrating superior performance compared to existing methods.
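
For readers unfamiliar with Direct Preference Optimization, the following is a minimal sketch of the standard per-pair DPO objective that the DPG strategy is said to build on. It is not the authors' implementation: the function name `dpo_loss`, its parameter names, and the default `beta` are assumptions made for illustration, and the abstract does not specify how preference pairs are generated or scored.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    All inputs are log-probabilities of a full output under either the
    trainable policy or the frozen reference model. The names and the
    default beta are illustrative assumptions, not taken from the paper.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # output over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree on both outputs, the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen output.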
