

Poster

Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation

Akshay Krishnan · Xinchen Yan · Vincent Casser · Abhijit Kundu


Abstract:

We introduce Orchid, a unified latent diffusion model that learns a joint appearance-geometry prior to generate color, depth, and surface normal images in a single diffusion process. This unified approach is more efficient and coherent than current pipelines that use separate models for appearance and geometry. Orchid is versatile: it directly generates color, depth, and normal images from text; supports joint monocular depth and normal estimation via color-conditioned finetuning; and seamlessly inpaints large 3D regions by sampling from the joint distribution. It leverages a novel Variational Autoencoder (VAE) that jointly encodes RGB, relative depth, and surface normals into a shared latent space, combined with a latent diffusion model that denoises these latents. Our extensive experiments demonstrate that Orchid delivers competitive performance against SOTA task-specific geometry prediction methods, even surpassing them in normal-prediction accuracy and depth-normal consistency. It also inpaints color, depth, and normal images jointly, with greater qualitative realism than existing multi-step methods.
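To make the shared-latent idea concrete, here is a minimal PyTorch sketch of a VAE that encodes concatenated RGB (3 channels), relative depth (1 channel), and surface normals (3 channels) into one latent and decodes all seven channels jointly. The layer widths, latent size, and 7-channel concatenation are illustrative assumptions, not Orchid's actual architecture; the abstract specifies only that the three modalities share a latent space.

import torch
import torch.nn as nn

class JointVAE(nn.Module):
    """Toy joint VAE: one encoder/decoder over RGB + depth + normals.
    Channel counts and layer sizes are guesses for illustration."""

    def __init__(self, in_channels: int = 7, latent_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.SiLU(),
            # Predict both the mean and log-variance of the latent.
            nn.Conv2d(128, 2 * latent_channels, 3, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(latent_channels, 128, 3, padding=1),
            nn.SiLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.SiLU(),
            nn.ConvTranspose2d(64, in_channels, 4, stride=2, padding=1),
        )

    def forward(self, rgb, depth, normals):
        # Stack all modalities channel-wise: (B, 7, H, W).
        x = torch.cat([rgb, depth, normals], dim=1)
        mu, logvar = self.encoder(x).chunk(2, dim=1)
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.decoder(z)
        # Split the joint reconstruction back into the three modalities.
        return recon.split([3, 1, 3], dim=1), mu, logvar

model = JointVAE()
rgb = torch.rand(1, 3, 64, 64)
depth = torch.rand(1, 1, 64, 64)
normals = torch.rand(1, 3, 64, 64)
(rgb_hat, depth_hat, normals_hat), mu, logvar = model(rgb, depth, normals)

In a latent diffusion setup such as the one the abstract describes, a denoising network would then be trained on the latents z rather than on pixels, so one sampling process yields all three modalities coherently.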
