Poster
MiDSummer: Multi-Guidance Diffusion for Controllable Zero-Shot Immersive Gaussian Splatting Scene Generation
Anjun Hu · Richard Tomsett · Valentin Gourmet · Massimo Camplani · Jas Kandola · Hanting Xie
We present MiDSummer, a two-stage framework for generating immersive Gaussian Splatting scenes that leverages multiple diffusion guidance signals to enable structured layout control, enhanced physical realism, and improved visual quality.

While 3D scene generation has seen significant recent advances, current approaches still face three open challenges: (1) achieving precise, reliable layout control while preserving open-world generalization and physical plausibility; (2) balancing high-level semantic reasoning with low-level, directly controllable geometric constraints; and (3) effectively exploiting layout knowledge during visual refinement. Our work addresses these challenges through a structured two-stage planning-assembly framework.

For planning, we introduce a dual layout diffusion guidance approach that bridges semantic reasoning and geometric controllability. Our approach integrates the open-vocabulary reasoning of LLMs with the geometric precision of Graph Diffusion Models (GDMs) by incorporating multi-level self-consistency scores over scene graph structures and layout bounding-box parameters. This fusion enables fine-grained control over scene composition while ensuring physical plausibility and faithful prompt interpretation.

For assembly, we propose a layout-guided optimization technique for scene refinement. Layout priors obtained during the planning stage are incorporated into a Stable Diffusion (SD)-based refinement process that jointly optimizes camera trajectories and scene splats. This layout-aware joint optimization, constrained by multi-view consistency, produces visually compelling immersive scenes that are structurally coherent and controllable.
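To make the planning-stage guidance concrete, the sketch below shows one way a multi-level self-consistency score over scene-graph relations and layout bounding boxes could be used to rank candidate layouts sampled from a graph diffusion model. All function names, relation rules, box conventions, and weights are illustrative assumptions for this sketch; the abstract does not specify the actual scoring implementation.

    # Minimal sketch (hypothetical names and rules): fusing LLM scene-graph reasoning
    # with graph-diffusion layout samples via multi-level self-consistency scores.
    # A candidate layout maps object name -> axis-aligned box (cx, cy, cz, w, h, d);
    # the LLM scene graph is a set of (subject, relation, object) triples.

    from itertools import combinations

    def interval_overlap(a_lo, a_hi, b_lo, b_hi):
        """Overlap length of two 1-D intervals."""
        return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))

    def box_overlap_volume(box_a, box_b):
        """3-D overlap volume of two center-size boxes."""
        vol = 1.0
        for i in range(3):
            lo_a, hi_a = box_a[i] - box_a[i + 3] / 2, box_a[i] + box_a[i + 3] / 2
            lo_b, hi_b = box_b[i] - box_b[i + 3] / 2, box_b[i] + box_b[i + 3] / 2
            vol *= interval_overlap(lo_a, hi_a, lo_b, hi_b)
        return vol

    def relation_holds(relation, box_subj, box_obj):
        """Check a coarse spatial relation between two boxes (toy rule set)."""
        if relation == "left_of":
            return box_subj[0] < box_obj[0]
        if relation == "on_top_of":
            return box_subj[1] > box_obj[1]
        return True  # unknown relations are not penalized in this sketch

    def graph_consistency(scene_graph, layout):
        """Fraction of LLM scene-graph relations satisfied by the candidate layout."""
        checks = [relation_holds(r, layout[s], layout[o])
                  for s, r, o in scene_graph if s in layout and o in layout]
        return sum(checks) / max(len(checks), 1)

    def physical_plausibility(layout):
        """Penalize inter-object penetration (less overlap -> higher score)."""
        total_overlap = sum(box_overlap_volume(a, b)
                            for a, b in combinations(layout.values(), 2))
        return 1.0 / (1.0 + total_overlap)

    def self_consistency_score(scene_graph, layout, w_graph=0.6, w_phys=0.4):
        """Multi-level score combining graph-level and box-level consistency."""
        return (w_graph * graph_consistency(scene_graph, layout)
                + w_phys * physical_plausibility(layout))

    # Usage: rank candidate layouts and keep the most self-consistent one.
    scene_graph = [("lamp", "on_top_of", "desk"), ("chair", "left_of", "desk")]
    candidates = [
        {"desk": (0, 0, 0, 2, 1, 1), "lamp": (0, 1.0, 0, 0.3, 0.4, 0.3),
         "chair": (-1.5, 0, 0, 0.6, 1, 0.6)},
        {"desk": (0, 0, 0, 2, 1, 1), "lamp": (0, -0.5, 0, 0.3, 0.4, 0.3),
         "chair": (1.5, 0, 0, 0.6, 1, 0.6)},
    ]
    best = max(candidates, key=lambda l: self_consistency_score(scene_graph, l))

A second sketch illustrates the assembly-stage idea of layout-aware joint optimization: a hinge penalty that keeps Gaussian splat centers inside their assigned layout boxes, optimized jointly with camera-trajectory parameters. The Stable-Diffusion-guided, multi-view-consistent refinement objective is abstracted to a placeholder term, since the abstract does not give the exact loss; all shapes and hyperparameters here are assumptions.

    # Minimal sketch (hypothetical structure): layout-prior term for joint
    # refinement of splat centers and camera-trajectory parameters.
    import torch

    def layout_prior_loss(means, box_centers, box_sizes):
        """Hinge penalty on splat centers that leave their assigned layout boxes.
        means:       (N, 3) Gaussian centers
        box_centers: (N, 3) center of the layout box each Gaussian is assigned to
        box_sizes:   (N, 3) full extents of those boxes
        """
        outside = (means - box_centers).abs() - box_sizes / 2
        return torch.clamp(outside, min=0.0).pow(2).sum(dim=-1).mean()

    # Toy joint optimization of splat centers and per-frame camera offsets.
    N, T = 1024, 8
    means = torch.randn(N, 3, requires_grad=True)        # splat centers
    cam_offsets = torch.zeros(T, 3, requires_grad=True)  # camera-trajectory perturbations
    box_centers = torch.zeros(N, 3)
    box_sizes = torch.full((N, 3), 2.0)

    optimizer = torch.optim.Adam([means, cam_offsets], lr=1e-2)
    for step in range(200):
        optimizer.zero_grad()
        # Placeholder for the SD-guided refinement objective, which in the full
        # method would depend on renders along the optimized camera trajectory.
        sd_refine_loss = cam_offsets.pow(2).mean()
        loss = sd_refine_loss + 0.1 * layout_prior_loss(means, box_centers, box_sizes)
        loss.backward()
        optimizer.step()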