
Poster

OmniDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment

Tongfan Guan · Jiaxin Guo · Chen Wang · Yun-Hui Liu


Abstract: Monocular and stereo depth estimation offer complementary strengths: monocular methods capture rich contextual priors but lack geometric precision, while stereo approaches leverage epipolar geometry but struggle with ambiguities such as reflective or textureless surfaces. Despite their synergies, these paradigms remain largely disjoint in practice. We introduce OmniDepth, a unified framework that bridges both through iterative bidirectional alignment of their latent representations. At its core, a novel cross-attentive alignment mechanism dynamically synchronizes monocular contextual cues with disparity hypothesis representations during stereo reasoning. This mutual alignment resolves stereo ambiguities (e.g., specular surfaces) by injecting monocular structure priors while refining monocular depth with stereo geometry. Extensive experiments demonstrate state-of-the-art results: OmniDepth reduces zero-shot generalization error by >40% on Middlebury and ETH3D compared to leading stereo methods, while addressing longstanding failure cases on transparent and reflective surfaces. By harmonizing multi-view geometry with monocular context, OmniDepth advances robust 3D perception that transcends modality-specific limitations. Code and models will be released.
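The core mechanism described above, iterative bidirectional cross-attention between monocular context features and disparity hypothesis representations, can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's released implementation: the module structure, feature shapes, residual update rule, and iteration count are all assumptions, using standard multi-head attention as a stand-in for the paper's alignment mechanism.

```python
# Hypothetical sketch of one bidirectional cross-attentive alignment step;
# all names, shapes, and the update rule are assumptions, not OmniDepth's code.
import torch
import torch.nn as nn

class CrossAttentiveAlignment(nn.Module):
    """One iteration of mutual alignment between monocular context
    features and disparity-hypothesis features (hypothetical)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Disparity-hypothesis tokens query the monocular context...
        self.mono_to_stereo = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # ...and monocular tokens query the stereo geometry in return.
        self.stereo_to_mono = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_m = nn.LayerNorm(dim)

    def forward(self, stereo_feat: torch.Tensor, mono_feat: torch.Tensor):
        # stereo_feat, mono_feat: (B, N, C) token sequences over pixels.
        # Inject monocular structure priors into the disparity hypotheses.
        s_upd, _ = self.mono_to_stereo(stereo_feat, mono_feat, mono_feat)
        stereo_feat = self.norm_s(stereo_feat + s_upd)
        # Refine monocular features with stereo geometric evidence.
        m_upd, _ = self.stereo_to_mono(mono_feat, stereo_feat, stereo_feat)
        mono_feat = self.norm_m(mono_feat + m_upd)
        return stereo_feat, mono_feat

# Iterative application, mirroring the abstract's "iterative bidirectional
# alignment"; feature sizes and the iteration count here are placeholders.
align = CrossAttentiveAlignment(dim=128)
stereo = torch.randn(2, 1024, 128)  # e.g., flattened H*W disparity features
mono = torch.randn(2, 1024, 128)    # matching monocular context features
for _ in range(3):
    stereo, mono = align(stereo, mono)
```

Under this reading, each iteration lets the two latent streams condition on one another, so stereo ambiguities (e.g., specular regions with unreliable matching costs) can be resolved by monocular priors while monocular depth inherits metric structure from the stereo branch.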
