

Poster

Zero-Shot Depth Aware Image Editing with Diffusion Models

Rishubh Parihar · Sachidanand VS · Venkatesh Babu Radhakrishnan


Abstract:

Diffusion models have transformed image editing but struggle with precise depth-aware control, such as placing objects at a specified depth. Layered representations offer fine-grained control by decomposing an image into separate editable layers. However, existing methods simplistically represent a scene as a set of background and transparent foreground layers while ignoring scene geometry, limiting their effectiveness for depth-aware editing. We propose Depth-Guided Layer Decomposition, a layering method that decomposes an image into foreground and background layers based on a user-specified depth value, enabling precise depth-aware edits. We further propose Feature-Guided Layer Compositing, a zero-shot approach for realistic layer compositing that leverages generative priors from pretrained diffusion models. Specifically, we guide the internal U-Net features to progressively fuse the individual layers into a composite latent at each denoising step. This preserves the structure of individual layers while generating realistic outputs with appropriate color and lighting adjustments, without the need for post-hoc harmonization models. We demonstrate our method on two key depth-aware editing tasks: 1) scene compositing, which blends the foreground of one scene with the background of another at a specified depth; and 2) object insertion at a user-defined depth. Our zero-shot approach achieves precise depth ordering and high-quality edits, surpassing specialized scene compositing and object placement baselines, as validated across benchmarks and user studies.
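To make the depth-guided decomposition step concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes a precomputed per-pixel depth map (e.g. from an off-the-shelf monocular depth estimator) and simply thresholds it at the user-specified depth value to split the image into foreground and background layers. The function name and arguments are hypothetical.

```python
# Hypothetical sketch of depth-based layer splitting; not the paper's code.
import numpy as np

def split_layers_by_depth(image: np.ndarray,
                          depth_map: np.ndarray,
                          depth_value: float):
    """Split an image into foreground/background layers at a user-specified depth.

    image:       HxWx3 array in [0, 1]
    depth_map:   HxW array of per-pixel depth (assumed to come from a
                 monocular depth estimator; smaller = closer to the camera)
    depth_value: user-specified threshold; pixels nearer than this value
                 are treated as foreground.
    """
    fg_mask = (depth_map < depth_value).astype(np.float32)[..., None]  # HxWx1
    foreground = image * fg_mask           # foreground layer, transparent elsewhere
    background = image * (1.0 - fg_mask)   # remaining scene content
    return foreground, background, fg_mask
```

Conceptually, the paper's zero-shot compositing then fuses these layers inside the diffusion process rather than in pixel space: at each denoising step the per-layer latents are merged into a single composite latent under guidance from the U-Net's internal features, so color and lighting are harmonized by the generative prior instead of a separate post-hoc model.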
