

Poster

Hyper-Depth: Hypergraph-based Multi-Scale Representation Fusion for Monocular Depth Estimation

Lin Bie · Siqi Li · Yifan Feng · Yue Gao


Abstract:

Monocular depth estimation (MDE) is a fundamental problem in computer vision with wide-ranging applications in various downstream tasks. While multi-scale features are perceptually critical for MDE, existing transformer-based methods have yet to leverage them explicitly. To address this limitation, we propose a hypergraph-based multi-scale representation fusion framework, Hyper-Depth. The proposed Hyper-Depth incorporates two key components: a Semantic Consistency Enhancement (SCE) module and a Geometric Consistency Constraint (GCC) module. The SCE module, designed based on hypergraph convolution, aggregates global information and enhances the representation of multi-scale patch features. Meanwhile, the GCC module provides geometric guidance to reduce over-fitting errors caused by excessive reliance on local features. In addition, we introduce a Correlation-based Conditional Random Fields (C-CRFs) module as the decoder to filter correlated patches and compute attention weights more effectively. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches across all evaluation metrics on the KITTI and NYU-Depth-v2 datasets, achieving improvements of 6.21% and 3.32% on the main metric RMSE, respectively. Furthermore, zero-shot evaluations on the nuScenes and SUN-RGBD datasets validate the generalizability of our approach.
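To make the SCE idea concrete, the sketch below shows a minimal hypergraph convolution that aggregates patch features over shared hyperedges. This is an illustration only, not the authors' implementation: the class names, the k-NN hyperedge construction, and the feature dimensions are assumptions chosen for the example.

```python
# Illustrative sketch only (not the paper's code): a minimal hypergraph
# convolution over patch features, with hyperedges built by k-NN in
# feature space. All names and sizes here are assumptions.
import torch
import torch.nn as nn


class HypergraphConv(nn.Module):
    """Spectral-style hypergraph convolution: X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X W."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, incidence: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) patch features; incidence: (N, E) binary matrix,
        # incidence[i, e] = 1 if patch i belongs to hyperedge e.
        dv = incidence.sum(dim=1).clamp(min=1)      # vertex degrees, shape (N,)
        de = incidence.sum(dim=0).clamp(min=1)      # hyperedge degrees, shape (E,)
        dv_inv_sqrt = dv.pow(-0.5).diag()
        de_inv = de.pow(-1.0).diag()
        # Aggregate features of patches that share hyperedges (global context).
        agg = dv_inv_sqrt @ incidence @ de_inv @ incidence.t() @ dv_inv_sqrt @ x
        return self.proj(agg)


def knn_hyperedges(x: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Build one hyperedge per patch from its k nearest neighbours in feature space."""
    dist = torch.cdist(x, x)                        # (N, N) pairwise distances
    knn = dist.topk(k, largest=False).indices       # each row: k nearest patch indices
    incidence = torch.zeros(x.size(0), x.size(0))
    incidence.scatter_(0, knn.t(), 1.0)             # column e = neighbourhood of patch e
    return incidence


if __name__ == "__main__":
    patches = torch.randn(64, 256)                  # e.g. 64 patch tokens, dim 256
    conv = HypergraphConv(256, 256)
    fused = conv(patches, knn_hyperedges(patches))
    print(fused.shape)                              # torch.Size([64, 256])
```

In this toy setting the hyperedge weights are implicitly the identity; a full model would typically learn them and build hyperedges across multiple feature scales rather than from a single set of patches.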
