Poster

HAMSt3R: Human-Aware Multi-view Stereo 3D Reconstruction

Sara Rojas Martinez · Matthieu Armando · Bernard Ghanem · Philippe Weinzaepfel · Vincent Leroy · Grégory Rogez


Abstract:

Recovering the 3D geometry of a scene from a sparse set of uncalibrated images is a long-standing problem in computer vision. While recent learning-based approaches such as DUSt3R and MASt3R have demonstrated impressive results by directly predicting dense scene geometry, they are primarily trained on static, outdoor scenes and struggle to handle human-centric scenarios. In this work, we introduce HAMSt3R, an extension of MASt3R for joint human and scene 3D reconstruction from sparse, uncalibrated multi-view images. First, we build a strong image encoder by distilling the encoders of MASt3R and of Multi-HMR, a state-of-the-art Human Mesh Recovery (HMR) model, for a better understanding of both scene geometry and human bodies. Our method then incorporates additional network heads to segment humans, estimate dense correspondences via DensePose, and predict depth in human-centric environments, enabling a more holistic 3D reconstruction. By leveraging the outputs of these heads, HAMSt3R produces a dense point map enriched with human semantic information in 3D. Unlike existing methods that rely on complex optimization pipelines, our approach is fully feed-forward and efficient, making it suitable for real-world applications. We evaluate our model on EgoHumans and EgoExo4D, two challenging benchmarks containing diverse human-centric scenarios, and additionally validate its generalization to traditional multi-view stereo tasks as well as multi-view pose regression. Our results demonstrate that our method reconstructs humans effectively while preserving strong performance on general 3D reconstruction tasks, bridging the gap between human and scene understanding in 3D vision.
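To make the architecture described above concrete, here is a minimal PyTorch sketch of the multi-head, feed-forward design: a shared image encoder (in the paper, distilled from MASt3R and Multi-HMR teachers) feeding dense per-pixel heads for point maps, human segmentation, DensePose part labels, and depth. The class name, layer shapes, the simple patchify encoder, and the dual-teacher `distill_loss` are illustrative assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch of the multi-head, feed-forward design described in the
# abstract. All module names, sizes, and the distillation loss below are
# illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn


class HAMSt3RSketch(nn.Module):
    def __init__(self, embed_dim=256, num_densepose_parts=24):
        super().__init__()
        # Shared image encoder (stand-in for the ViT encoder distilled
        # from the MASt3R and Multi-HMR teachers in the paper).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=16, stride=16),  # patchify
            nn.GELU(),
        )
        # Dense prediction heads, here all at patch resolution.
        self.pointmap_head = nn.Conv2d(embed_dim, 3, 1)   # per-pixel 3D points
        self.human_seg_head = nn.Conv2d(embed_dim, 1, 1)  # human mask logits
        self.densepose_head = nn.Conv2d(embed_dim, num_densepose_parts, 1)
        self.depth_head = nn.Conv2d(embed_dim, 1, 1)      # metric/relative depth

    def forward(self, images):
        feats = self.encoder(images)
        return {
            "pointmap": self.pointmap_head(feats),
            "human_mask": self.human_seg_head(feats),
            "densepose": self.densepose_head(feats),
            "depth": self.depth_head(feats),
        }


def distill_loss(student_feats, mast3r_feats, hmr_feats):
    # Hypothetical dual-teacher distillation: match the student encoder's
    # features to both frozen teachers (MASt3R and Multi-HMR).
    mse = nn.functional.mse_loss
    return mse(student_feats, mast3r_feats) + mse(student_feats, hmr_feats)


if __name__ == "__main__":
    model = HAMSt3RSketch()
    out = model(torch.randn(2, 3, 224, 224))  # two uncalibrated views
    print({k: tuple(v.shape) for k, v in out.items()})
```

In a faithful implementation the encoder would be a ViT and the heads DPT-style decoders as in MASt3R, with predictions upsampled to full image resolution; the single feed-forward pass, with no per-scene optimization, is the design point the abstract emphasizes.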
