Poster
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai · He Zhang · Kai Zhang · Yixun Liang · Mengwei Ren · Fujun Luan · Qing Liu · Soo Ye Kim · Jianming Zhang · Zhifei Zhang · Yuqian Zhou · Yulun Zhang · Xiaokang Yang · Zhe Lin · Alan Yuille
Abstract:
Existing feedforward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes and mainly handle object-centric cases. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object generation and scene reconstruction from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency, allowing the model to generate robustly from prompt views of any direction, beyond object-centric inputs. In addition, to improve the capability and generality of DiffusionGS, we scale up the 3D training data by developing a scene-object mixed training strategy. Experiments show that DiffusionGS improves PSNR/FID by 2.20 dB/23.25 on objects and 1.34 dB/19.16 on scenes over state-of-the-art methods, without using a 2D diffusion prior or a depth estimator. Moreover, our method is over 5$\times$ faster ($\sim$6 s on an A100 GPU). Code will be released.
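To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a diffusion denoiser that outputs 3D Gaussian parameters at each timestep instead of a denoised 2D image. The backbone, module names, tensor shapes, and the fixed per-patch Gaussian count are all illustrative assumptions; the actual DiffusionGS architecture, conditioning, and rasterization are described in the paper, not here.

```python
# Illustrative sketch only: a denoiser that maps noisy view tokens (plus a
# timestep embedding) to 3D Gaussian parameters, which a differentiable
# splatting renderer (omitted) could turn back into images for supervision.
import torch
import torch.nn as nn


class GaussianDenoiser(nn.Module):
    """Predicts per-token 3D Gaussian parameters:
    position (3), scale (3), rotation quaternion (4), opacity (1), color (3)."""

    def __init__(self, feat_dim: int = 256, patch_dim: int = 3 * 16 * 16):
        super().__init__()
        # Hypothetical backbone: a small transformer over flattened image patches.
        self.embed = nn.Linear(patch_dim, feat_dim)
        self.time_embed = nn.Sequential(
            nn.Linear(1, feat_dim), nn.SiLU(), nn.Linear(feat_dim, feat_dim)
        )
        encoder_layer = nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(encoder_layer, num_layers=4)
        # 3 (xyz) + 3 (scale) + 4 (quaternion) + 1 (opacity) + 3 (RGB) = 14
        self.to_gaussians = nn.Linear(feat_dim, 14)

    def forward(self, noisy_patches: torch.Tensor, t: torch.Tensor) -> dict:
        # noisy_patches: (B, N, patch_dim) flattened patches of the noisy view(s)
        # t: (B,) integer diffusion timesteps
        tokens = self.embed(noisy_patches) + self.time_embed(t[:, None].float())[:, None]
        feats = self.backbone(tokens)
        params = self.to_gaussians(feats)  # (B, N, 14)
        xyz, scale, quat, opacity, rgb = params.split([3, 3, 4, 1, 3], dim=-1)
        return {
            "xyz": xyz,                                        # Gaussian centers
            "scale": torch.exp(scale),                         # positive scales
            "rotation": nn.functional.normalize(quat, dim=-1), # unit quaternions
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }


# Usage: at every denoising step the predicted Gaussians would be splatted and
# compared against target views, which is how per-timestep 3D outputs can
# enforce multi-view consistency.
model = GaussianDenoiser()
noisy = torch.randn(2, 64, 3 * 16 * 16)   # batch of 2, 64 patch tokens each
t = torch.randint(0, 1000, (2,))
gaussians = model(noisy, t)
print(gaussians["xyz"].shape)             # torch.Size([2, 64, 3])
```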