Poster
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan · Xinchen Yan · Vincent Casser · Abhijit Kundu
We introduce Orchid, a unified latent diffusion model that learns a joint appearance-geometry prior to generate color, depth, and surface normal images in a single diffusion process. This unified approach is more efficient and coherent than current pipelines that use separate models for appearance and geometry. Orchid is versatile: it directly generates color, depth, and normal images from text; supports joint monocular depth and normal estimation via color-conditioned finetuning; and seamlessly inpaints large 3D regions by sampling from the joint distribution. It leverages a novel Variational Autoencoder (VAE) that jointly encodes RGB, relative depth, and surface normals into a shared latent space, combined with a latent diffusion model that denoises these latents. Our extensive experiments demonstrate that Orchid delivers competitive performance against state-of-the-art task-specific geometry prediction methods, even surpassing them in normal-prediction accuracy and depth-normal consistency. It also inpaints color-depth-normal images jointly, with greater qualitative realism than existing multi-step methods.
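To make the architecture concrete, here is a minimal sketch of the joint appearance-geometry idea in PyTorch: a VAE encodes concatenated RGB, depth, and normal channels into one shared latent, and a denoiser operates on that latent. The module names, channel counts, layer sizes, and the toy single-step denoiser are illustrative assumptions for exposition, not Orchid's actual architecture.

```python
import torch
import torch.nn as nn

class JointVAE(nn.Module):
    """Encodes concatenated RGB (3), relative depth (1), and surface
    normals (3) -- 7 channels total -- into one shared latent space.
    Hypothetical lightweight encoder/decoder; real models are far deeper."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 2 * latent_ch, 3, stride=2, padding=1),  # mean + logvar
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, 7, 4, stride=2, padding=1),
        )

    def encode(self, rgb, depth, normals):
        # One encoder sees all modalities at once, yielding a joint latent.
        stats = self.encoder(torch.cat([rgb, depth, normals], dim=1))
        mean, logvar = stats.chunk(2, dim=1)
        return mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterize

    def decode(self, z):
        out = self.decoder(z)
        return out[:, :3], out[:, 3:4], out[:, 4:7]  # rgb, depth, normals

# Toy stand-in for the latent diffusion denoiser (predicts noise in latent space).
denoiser = nn.Conv2d(4, 4, 3, padding=1)

vae = JointVAE()
rgb = torch.rand(1, 3, 64, 64)
depth = torch.rand(1, 1, 64, 64)
normals = torch.rand(1, 3, 64, 64)

z = vae.encode(rgb, depth, normals)             # one shared latent for all modalities
z_noisy = z + 0.1 * torch.randn_like(z)         # forward diffusion (single toy step)
z_denoised = z_noisy - 0.1 * denoiser(z_noisy)  # reverse step with predicted noise
rgb_hat, depth_hat, normals_hat = vae.decode(z_denoised)  # all three decoded jointly
```

Because a single latent carries all three modalities, one denoising pass produces mutually consistent color, depth, and normals, which is what removes the need for separate per-modality models.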