Poster Exhibit Hall I #331

The Source Image is the Best Attention for Infrared and Visible Image Fusion

Song Wang ⋅ Xie Han ⋅ Liqun Kuang ⋅ Boying Wang ⋅ Zhongyu Chen ⋅ Zherui Qiao ⋅ Fan Yang ⋅ Xiaoxia Liu ⋅ Bingyu Zhang ⋅ Zhixun Wang

2025 Poster

Project Page [ Poster]

Abstract

Infrared and visible image fusion (IVF) aims to generate informative fused images by combining the merits of different modalities. In this paper, we uncover the inherent "attention properties" of infrared images, which directly arise from their physical characteristics and can be linked to attention mechanisms naturally, as observed in the gradient-weighted class activation mapping (Grad-CAM) visualization results of image classification models. To incorporate this property into IVF for better fusion, we propose the source infrared cross attention (I-SCA). Furthermore, we extend this discovery to visible images and introduce the source visible cross attention (V-SCA). The joint use of I-SCA and V-SCA addresses longstanding issues in image fusion, such as insufficient and incomplete multimodal feature interaction and fusion. Moreover, to solve the problem of mismatched channel numbers between the source images and intermediate features, which makes it impossible to apply the attention equation directly, and to minimize the domain gap between their respective feature spaces, an adaptive channel boosting and intelligent space mapping module (CBSM) is introduced. Specifically, we treat the CBSM-processed raw image as the query, while the intermediate features of another modality are treated as keys and values in I-SCA and V-SCA. Unlike attention mechanisms that divide images into patches or limit computations to local windows, we achieve smoother and more robust IVF through true global modeling across the entire image space in the source image attention, with linear complexity. Comparison with current SOTA methods on three popular public datasets confirms the superiority of our method.

Chat is not available.