Poster
Unbiased Missing-modality Multimodal Learning
Raiting Dai · Chenxi Li · Yandong Yan · Lisi Mo · Ke Qin · Tao He
Previous multimodal learning models for missing modalities predominantly employ diffusion models to recover the absent data conditioned on the available modalities. However, these approaches often overlook a critical issue: modality generation bias. In other words, while some modalities may be generated with high quality, others, such as video, may prove challenging to synthesize effectively. We argue that this limitation stems primarily from the inherent modality gap, which ultimately results in imbalanced training. To overcome this challenge, we introduce a novel Multi-stage Duplex Diffusion Network (MD^2N) designed to achieve unbiased missing-modality recovery. The key idea of our approach is a modality transfer module within the recovery process that facilitates smooth cross-modality generation. This module is trained with duplex diffusion models, enabling the available and missing modalities to generate each other in an intersecting manner through three distinct stages: global structure generation, modality transfer, and local cross-modal refinement. During training, the generation of the available and missing data mutually influences each other, ultimately reaching a balanced generation state. Experimental results demonstrate that our proposed method significantly outperforms current state-of-the-art techniques, achieving up to a 4% improvement over IMDer on the CMU-MOSEI dataset.
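To make the duplex idea concrete, the PyTorch sketch below pairs two recovery networks that reconstruct each other's modality through three successive stages, and trains both directions jointly so neither dominates. This is a minimal illustration under our own assumptions, not the paper's implementation: all class and variable names (StageBlock, DuplexRecovery, a2b, b2a) are hypothetical, and the actual MD^2N stages are diffusion models rather than the single-shot blocks shown here.

```python
import torch
import torch.nn as nn

# Illustrative sketch only; module names and shapes are hypothetical,
# not taken from the paper.

class StageBlock(nn.Module):
    """A single stage: denoise/refine a feature conditioned on the other modality."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, cond):
        # Predict a refined estimate of x given the conditioning modality.
        return self.net(torch.cat([x, cond], dim=-1))

class DuplexRecovery(nn.Module):
    """One direction of the duplex: recovers a missing modality from an available one
    via three stages (global structure -> modality transfer -> local refinement)."""
    def __init__(self, dim):
        super().__init__()
        self.global_stage = StageBlock(dim)    # coarse, modality-agnostic structure
        self.transfer_stage = StageBlock(dim)  # bridges the modality gap
        self.refine_stage = StageBlock(dim)    # local cross-modal details

    def forward(self, noise, available):
        x = self.global_stage(noise, available)
        x = self.transfer_stage(x, available)
        return self.refine_stage(x, available)

# Duplex training: two recovery networks generate each other's modality, so the
# two directions co-adapt toward the balanced generation described in the abstract.
dim = 128
a2b, b2a = DuplexRecovery(dim), DuplexRecovery(dim)
feat_a, feat_b = torch.randn(4, dim), torch.randn(4, dim)  # toy features for two modalities

rec_b = a2b(torch.randn_like(feat_b), feat_a)  # recover modality B from A
rec_a = b2a(torch.randn_like(feat_a), feat_b)  # recover modality A from B
loss = nn.functional.mse_loss(rec_b, feat_b) + nn.functional.mse_loss(rec_a, feat_a)
loss.backward()
```

In this toy setup the two reconstruction losses are simply summed so that gradients from both generation directions influence each other during training, which is the balance property the abstract emphasizes.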