Poster
InfoBridge: Balanced Multimodal Integration through Conditional Dependency Modeling
Chenxin Li · Yifan Liu · Panwang Pan · Hengyu Liu · Xinyu Liu · Wuyang Li · Cheng Wang · Weihao Yu · Yiyang LIN · Yixuan Yuan
Developing systems that can interpret diverse real-world signals remains a fundamental challenge in multimodal learning. Current approaches to multimodal fusion face significant obstacles stemming from inherent modal heterogeneity. While existing methods attempt to enhance fusion through cross-modal alignment or interaction mechanisms, they often struggle to balance effective integration with preserving modality-specific information, and frequently neglect contextual nuances unique to each modality. We introduce InfoBridge, a novel framework grounded in conditional information maximization principles that addresses these limitations. Our approach reframes multimodal fusion through two key innovations: (1) we formulate fusion as a conditional mutual information optimization problem with an integrated protective margin that encourages cross-modal information sharing while safeguarding against over-fusion that would erase unique modal characteristics; and (2) we enable fine-grained contextual fusion by leveraging modality-specific conditions (such as audio event detection signals) to guide the integration process. Comprehensive evaluations across multiple benchmarks demonstrate that InfoBridge consistently outperforms state-of-the-art multimodal architectures, establishing a more principled and effective approach to multimodal learning that better captures complementary information across diverse input signals.
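To make the objective described in (1) and (2) concrete, below is a minimal, hypothetical sketch of a margin-bounded, condition-aware fusion loss. It is not the authors' implementation: it assumes the conditional mutual information term is approximated with an InfoNCE-style lower bound, and names such as `conditional_infonce`, `fusion_loss`, the additive conditioning, and the temperature value are illustrative choices, not details from the paper.

```python
# Illustrative sketch only (not the InfoBridge implementation).
# An InfoNCE-style estimator stands in for the conditional mutual information
# term; the clamp at `margin` illustrates one way a "protective margin" could
# stop the shared term from growing without bound and erasing modality-specific cues.
import torch
import torch.nn.functional as F


def conditional_infonce(z_a: torch.Tensor, z_b: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style lower bound on I(z_a; z_b | cond) for a batch of embeddings.

    z_a, z_b: (B, D) embeddings from two modalities.
    cond:     (B, D) embedding of a modality-specific condition
              (e.g. an audio event-detection signal), fused additively here
              purely for illustration.
    """
    # Condition both views before scoring, so similarity is measured
    # "given" the contextual signal rather than marginally.
    q = F.normalize(z_a + cond, dim=-1)
    k = F.normalize(z_b + cond, dim=-1)
    logits = q @ k.t() / 0.07                      # (B, B) similarity matrix
    targets = torch.arange(q.size(0), device=q.device)
    # Cross-entropy against the diagonal gives the standard InfoNCE bound (up to a constant).
    return -F.cross_entropy(logits, targets)       # higher = more shared information


def fusion_loss(z_a, z_b, cond, margin: float = 2.0) -> torch.Tensor:
    """Maximize conditional shared information, but only up to `margin`.

    Clamping the MI estimate at `margin` is one way to realize a protective
    margin: once the two modalities share `margin` nats of information given
    the condition, the gradient vanishes and the encoders are free to keep
    modality-specific structure.
    """
    mi_lb = conditional_infonce(z_a, z_b, cond)
    return -torch.clamp(mi_lb, max=margin)         # minimizing this maximizes the clamped MI


if __name__ == "__main__":
    B, D = 32, 128
    z_audio, z_video = torch.randn(B, D), torch.randn(B, D)
    cond = torch.randn(B, D)                        # e.g. an audio event embedding
    print(fusion_loss(z_audio, z_video, cond))
```

The clamp is the simplest hinge-style realization of a protective margin; the paper's actual formulation of the margin and the conditioning mechanism may differ.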