

Poster

StableDepth: Scene-Consistent and Scale-Invariant Monocular Depth

Zheng Zhang · Lihe Yang · Tianyu Yang · Chaohui Yu · Xiaoyang Guo · Yixing Lao · Hengshuang Zhao


Abstract: Recent advances in monocular depth estimation have significantly improved its robustness and accuracy. Despite these improvements, relative depth models, which generalize well, do not provide real-world depth measurements. Notably, these models exhibit severe flickering and 3D inconsistency when applied to video, limiting their use in 3D reconstruction. To address these challenges, we introduce StableDepth, a scene-consistent and scale-invariant depth estimation method that produces stable predictions with scene-level 3D consistency. We propose a dual-decoder structure that learns smooth depth under supervision from large-scale unlabeled video data. Our approach not only improves generalization but also reduces flickering during video depth estimation. By leveraging vast amounts of unlabeled video, our method gains stability and scales up easily at low cost. Unlike previous methods that require full video sequences, StableDepth supports online inference at 13× the speed, achieves significant accuracy improvements (6.4%–86.8%) across multiple benchmarks, and delivers temporal consistency comparable to video-diffusion-based depth estimators. We highly encourage viewing the supplementary video materials to better appreciate the effectiveness of our approach.
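To make the dual-decoder idea concrete, here is a minimal PyTorch sketch of a shared encoder with two decoder heads: one regressing per-frame (scale-invariant) depth from labeled data and one trained for temporal smoothness on unlabeled video. The module names, head designs, and the naive frame-to-frame consistency loss are illustrative assumptions made for this sketch, not the authors' released StableDepth implementation.

```python
# Hypothetical sketch of a dual-decoder depth network (not the official
# StableDepth code). A shared encoder feeds two heads: `depth_head` for
# per-frame depth supervised on labeled data, `smooth_head` for temporally
# smooth depth supervised on unlabeled video.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualDecoderDepth(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared convolutional encoder (stand-in for a pretrained backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder A: per-frame depth, supervised with labeled depth data.
        self.depth_head = self._make_head(feat_dim)
        # Decoder B: temporally smooth depth, supervised with unlabeled video.
        self.smooth_head = self._make_head(feat_dim)

    @staticmethod
    def _make_head(feat_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim // 2, 1, 1),
        )

    def forward(self, x: torch.Tensor):
        feat = self.encoder(x)
        depth = F.interpolate(self.depth_head(feat), size=x.shape[-2:],
                              mode="bilinear", align_corners=False)
        smooth = F.interpolate(self.smooth_head(feat), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        return depth, smooth


def temporal_consistency_loss(d_t: torch.Tensor, d_t1: torch.Tensor) -> torch.Tensor:
    # Naive frame-to-frame smoothness term for unlabeled video (ignores motion);
    # a real system would first align frames, e.g. via optical flow or pose.
    return F.l1_loss(d_t, d_t1)


if __name__ == "__main__":
    model = DualDecoderDepth()
    frames = torch.randn(2, 3, 128, 160)   # two consecutive video frames
    depth, smooth = model(frames)
    loss = temporal_consistency_loss(smooth[0], smooth[1])
    print(depth.shape, smooth.shape, loss.item())
```

Because the smoothness head only needs consecutive unlabeled frames, this kind of design can be trained on large video corpora at low labeling cost and still run online, one frame at a time, at inference.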
