Poster Tue, Oct 21, 2025 • 6:15 PM – 8:15 PM PDT Exhibit Hall I #385

RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation

Junwen Huang · Shishir Reddy Vutukur · Peter Yu · Nassir Navab · Slobodan Ilic · Benjamin Busam

Abstract

Typical template-based object pose pipelines first find the closest template and then align it to the current observation. Failing to find the closest template results in a wrong pose estimate. Instead, we reformulate object pose estimation with template images as a ray alignment problem, where viewing directions from multiple posed template views need to mutually align with a non-posed object query. Inspired by recent advancements in denoising diffusion frameworks for camera pose estimation, we integrate this formulation into a diffusion transformer architecture capable of aligning a single query image of an object to a set of template views. Our method reparametrizes object rotation by introducing object-centered camera rays and object translation by extending Scale-Invariant Translation Estimation (SITE) to dense translation offsets. It leverages view priors from template images to enhance the model's ability to accurately infer query object poses. Using a coarse-to-fine training strategy with narrowed template sampling, our approach improves performance without modifying the network architecture, increasing robustness in 6D object pose estimation. Extensive evaluations on various benchmark datasets demonstrate the superiority of our method over state-of-the-art approaches in unseen object pose estimation. Our code will be made publicly available.
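
To make the idea of expressing a view as rays in the object frame more concrete, the following is a minimal sketch using generic pinhole-camera geometry. It is not the paper's implementation: the function names `object_frame_ray` and `ray_bundle`, the intrinsics parameters, and the convention that `R_obj_to_cam` maps object coordinates to camera coordinates are all assumptions made for illustration, and the abstract does not specify the exact ray parametrization used by the authors.

```python
import math

def object_frame_ray(u, v, fx, fy, cx, cy, R_obj_to_cam):
    """Unit viewing direction of pixel (u, v), expressed in the object frame.

    Assumes a pinhole camera and a 3x3 rotation R_obj_to_cam (nested lists)
    that maps object coordinates to camera coordinates. Hypothetical helper.
    """
    # Back-project the pixel through the intrinsics into a camera-frame direction.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the object frame: d_obj = R^T d_cam.
    d_obj = [sum(R_obj_to_cam[k][i] * d_cam[k] for k in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_obj))
    return [c / norm for c in d_obj]

def ray_bundle(height, width, fx, fy, cx, cy, R_obj_to_cam):
    """Grid of object-frame ray directions, one per pixel center (illustrative)."""
    return [
        [object_frame_ray(col + 0.5, row + 0.5, fx, fy, cx, cy, R_obj_to_cam)
         for col in range(width)]
        for row in range(height)
    ]

# Example: a camera rotated 90 degrees about the y-axis relative to the object.
R_y90 = [[0.0, 0.0, 1.0],
         [0.0, 1.0, 0.0],
         [-1.0, 0.0, 0.0]]
center_ray = object_frame_ray(2.0, 2.0, 100.0, 100.0, 2.0, 2.0, R_y90)
bundle = ray_bundle(4, 4, 100.0, 100.0, 2.0, 2.0, R_y90)
```

In this sketch, posed template views would each yield a known ray bundle, while the query view's bundle is the unknown; under the formulation described in the abstract, it is this unknown that the diffusion model is trained to recover.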
