Poster
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
Shibo Wang · Haonan He · Maria Parelli · Christoph Gebhardt · Zicong Fan · Jie Song
We interact with objects every day, making holistic 3D reconstruction of hands and objects from videos essential for applications like robotic in-hand manipulation. While most RGB-based methods rely on object templates, existing template-free approaches depend heavily on image observations and assume the object is fully visible in the video. This assumption often fails in real-world scenarios, where cameras are fixed and objects are held in a static grip: parts of the object remain unobserved, and the resulting reconstructions are unrealistic. To this end, we introduce MagicHOI, a method for reconstructing hands and objects from short monocular interaction videos, even under limited views. Our key insight is that, although paired 3D hand-object data is extremely scarce, large-scale diffusion models, such as image-to-3D models, offer abundant object supervision. This supervision can act as a prior that regularizes object regions left unseen during hand interactions. Leveraging this insight, MagicHOI incorporates an existing image-to-3D diffusion model into a hand-object reconstruction framework, and then refines hand poses using hand-object interaction constraints. Our results show that MagicHOI significantly outperforms existing state-of-the-art template-free hand-object reconstruction methods. We also show that image-to-3D diffusion priors effectively regularize unseen object regions, improving 3D hand-object reconstruction, and that the improved object geometries in turn yield more accurate hand poses. Our code will be made available for research purposes.
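The abstract does not give the optimization objective, but the described recipe (fit observed views, regularize unseen object regions with a diffusion prior, and constrain hand-object contact) can be sketched as a weighted sum of three terms. Everything below is a hypothetical illustration: the function name, weights, and inputs are assumptions, not the authors' implementation; the diffusion term stands in for an SDS-style gradient magnitude supplied by an image-to-3D model.

```python
import numpy as np

def hoi_loss(render_obs, image_obs, sds_term, hand_obj_dist,
             w_rec=1.0, w_sds=0.1, w_contact=0.5):
    """Hypothetical combined objective for hand-object reconstruction.

    render_obs / image_obs : rendered vs. observed views (arrays)
    sds_term               : scalar diffusion-prior penalty on unseen views
                             (e.g., an SDS-style gradient magnitude)
    hand_obj_dist          : signed hand-to-object surface distances;
                             positive values mean the hand floats off the object
    """
    # Reconstruction: match the renders to the observed frames.
    rec = np.mean((render_obs - image_obs) ** 2)
    # Diffusion prior: regularize object regions never seen in the video.
    prior = w_sds * sds_term
    # Contact: penalize gaps between the hand and the object surface.
    contact = w_contact * np.mean(np.maximum(hand_obj_dist, 0.0) ** 2)
    return w_rec * rec + prior + contact
```

With perfect renders, a zero prior term, and the hand in contact everywhere, the loss is zero; each weight trades one source of supervision against the others.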