

Poster

S$^2$M$^2$: Scalable Stereo Matching Model for Reliable Depth Estimation

Junhong Min · Youngpil Jeon · Jimin Kim · Minyong Choi


Abstract: Accurate and scalable stereo matching remains a critical challenge, particularly for high-resolution images requiring both fine-grained disparity estimation and computational efficiency. While recent methods have made progress, achieving global and local consistency alongside computational efficiency remains difficult. Transformer-based models effectively capture long-range dependencies but suffer from high computational overhead, while cost volume-based iterative methods rely on local correlations, limiting global consistency and scalability to high resolutions and large disparities. To address these issues, we introduce S$^2$M$^2$, a Scalable Stereo Matching Model that achieves high accuracy, efficiency, and generalization without compromise. Our approach integrates a multi-resolution transformer framework, enabling effective information aggregation across different scales. Additionally, we propose a new loss function that enhances disparity estimation by concentrating probability on feasible matches. Beyond disparity prediction, S$^2$M$^2$ jointly estimates occlusion and confidence maps, leading to more robust and interpretable depth estimation. Unlike prior methods that rely on dataset-specific tuning, S$^2$M$^2$ is trained from scratch without dataset-specific adjustments, demonstrating strong generalization across diverse benchmarks. Extensive evaluations on Middlebury v3, ETH3D, and our high-fidelity synthetic dataset establish new state-of-the-art results.
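The abstract does not spell out the proposed loss, only that it "concentrates probability on feasible matches." One common way to realize that idea is a cross-entropy between the predicted disparity distribution (a softmax over candidate disparities) and a narrow Gaussian target centered on the ground-truth disparity. The sketch below illustrates that interpretation for a single pixel; the function name `concentration_loss`, the Gaussian target, and the `sigma` parameter are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def concentration_loss(logits, gt_disp, sigma=1.0):
    """Cross-entropy between the predicted disparity distribution and a
    narrow Gaussian centered on the ground-truth disparity.

    NOTE: this is an illustrative stand-in for the loss described in the
    abstract, not the authors' published formulation.

    logits : (D,) matching scores over D candidate disparities (one pixel).
    gt_disp: scalar ground-truth disparity (may be fractional).
    sigma  : width of the target; a small sigma concentrates probability
             mass on disparities near the true match.
    """
    d = np.arange(logits.shape[0], dtype=np.float64)
    # Target distribution: Gaussian around gt_disp, renormalized over bins.
    target = np.exp(-0.5 * ((d - gt_disp) / sigma) ** 2)
    target /= target.sum()
    # Predicted log-probabilities via a numerically stable log-softmax.
    log_p = logits - logits.max()
    log_p = log_p - np.log(np.exp(log_p).sum())
    return float(-(target * log_p).sum())

# A prediction peaked at the correct disparity incurs a lower loss
# than a flat (uninformative) distribution over the 64 candidates.
d = np.arange(64, dtype=np.float64)
peaked = -0.5 * (d - 20.0) ** 2   # log-scores peaked at disparity 20
flat = np.zeros(64)
assert concentration_loss(peaked, 20.0) < concentration_loss(flat, 20.0)
```

Unlike a plain L1 regression on the expected disparity, penalizing the full distribution this way discourages multi-modal predictions that average to the right value while placing mass on infeasible matches.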
