Poster
Multimodal Large Language Model-Guided ISP Hyperparameter Optimization with Dynamic Preference Learning
Xinyu Sun · Zhikun Zhao · Congyan Lang · Bing Li · Juan Wang
The image signal processing (ISP) pipeline converts the RAW images collected from the sensor into high-quality RGB images. It comprises a series of image processing modules, each with associated ISP hyperparameters. Recent learning-based approaches aim to automate ISP hyperparameter optimization from image data alone. However, their unimodal nature limits their ability to capture richer contextual information, which reduces robustness and adaptability across diverse application scenarios. To address this limitation, we propose a Multimodal Large Language Model (MLLM)-guided ISP hyperparameter optimization framework that integrates textual insights generated by MLLMs into the optimization process. By incorporating both high-level semantic cues and low-level image quality descriptors, our method enhances contextual understanding and task adaptability. Additionally, we introduce a Dynamic Pair Generation (DPG) refinement strategy based on Direct Preference Optimization (DPO), which enables efficient preference alignment without extensive human-labeled data. The framework not only improves the directional consistency of optimization but also significantly reduces computational and data preparation overhead. We validate the proposed method on both high-level and low-level vision tasks, demonstrating superior performance compared to existing methods.
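For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective, not the authors' implementation. It assumes scalar log-probabilities for a preferred ("chosen") and a dispreferred ("rejected") output under both the trained policy and a frozen reference model. The function name `dpo_loss`, its parameter names, and the default `beta` are hypothetical choices for illustration. How DPG constructs the preference pairs is not described in the abstract and is not modeled here.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    All arguments except beta are log-probabilities (floats). beta scales
    how strongly the policy is pushed away from the reference model.
    """
    # Log-ratio of policy to reference for each output in the pair.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected

    # Scaled preference margin: positive when the policy favors the
    # chosen output more than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written as softplus(-margin) for numerical
    # stability with large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: zero margin.
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy prefers the chosen output more than the reference does.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen output. In practice the log-probabilities would come from a model and the loss would be averaged over a batch of pairs.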