Poster
SpatialTrackerV2: Advancing 3D Point Tracking with Explicit Camera Motion
Yuxi Xiao · Jianyuan Wang · Nan Xue · Nikita Karaev · Iurii Makarov · Bingyi Kang · Xing Zhu · Hujun Bao · Yujun Shen · Xiaowei Zhou
3D point tracking from monocular videos has recently shown promising results, attracting increasing attention from the community. However, existing methods typically struggle with two key challenges: (a) significant background motion caused by camera movement, and (b) frequent occlusions that necessitate re-identifying previously observed objects. Monocular egocentric videos are a prime example in which both challenges arise prominently. In this work, we introduce SpatialTrackerV2, a novel 3D point tracking approach that computes accurate 3D trajectories for arbitrary 2D pixels, excelling not only in common video scenarios but also in challenging ones with substantial camera motion and frequent occlusions. Our method separates camera motion from object motion, explicitly modeling camera movement and its interplay with depth maps to substantially improve 3D point tracking. Additionally, we propose a refinement module that jointly improves depth estimation, camera motion, and 3D tracking accuracy in a unified manner. Benefiting from large-scale training on a mixture of synthetic and real-world data, SpatialTrackerV2 demonstrates strong robustness and generalization. Extensive experiments across multiple benchmarks validate its effectiveness and show substantial performance improvements over existing approaches.
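As a rough illustration of the camera/object motion separation described above (a minimal sketch, not the authors' implementation), the snippet below shows how an explicit camera pose and a per-pixel depth can predict the camera-induced motion of a pixel; subtracting that prediction from the observed 2D motion leaves the object's own motion. All function names and the toy values are illustrative assumptions.

```python
# Minimal sketch of separating camera-induced pixel motion from object
# motion using a depth value and an explicit relative camera pose.
# Not the authors' code; names and numbers below are illustrative.
import numpy as np

def camera_induced_flow(uv, depth, K, R, t):
    """Predict where a pixel would move if the scene were static.

    uv    : (2,) pixel location in frame 1
    depth : scalar depth of that pixel in frame 1
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) mapping frame-1
            camera coordinates to frame-2 camera coordinates
    """
    # Back-project the pixel to a 3D point in frame-1 camera coordinates.
    uv1 = np.array([uv[0], uv[1], 1.0])
    X = depth * (np.linalg.inv(K) @ uv1)
    # Rigidly transform into frame-2 coordinates (camera motion only).
    X2 = R @ X + t
    # Re-project into frame 2 and return the camera-induced 2D motion.
    uv2 = K @ X2
    uv2 = uv2[:2] / uv2[2]
    return uv2 - np.asarray(uv, dtype=float)

# Toy example: camera slides 10 cm to the right (so frame-1 points shift
# left in frame-2 coordinates); the residual after removing the
# camera-induced motion is the object's own motion.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-0.1, 0.0, 0.0])
observed_flow = np.array([-18.0, 0.0])          # measured pixel motion
cam_flow = camera_induced_flow((320, 240), 2.0, K, R, t)
object_flow = observed_flow - cam_flow          # what the tracker must model
print(cam_flow, object_flow)                    # [-25. 0.] [7. 0.]
```

In practice, SpatialTrackerV2 estimates depth and camera motion jointly with the 3D tracks rather than taking them as given; the sketch only conveys why modeling camera motion explicitly leaves a simpler residual object motion for the tracker.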