Poster

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Tianqi Liu · Zihao Huang · Zhaoxi Chen · Guangcong Wang · Shoukang Hu · Liao Shen · Huiqiang Sun · Zhiguo Cao · Wei Li · Ziwei Liu


Abstract:

We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video datasets for expensive training, with limited generalization ability due to the scarcity of 4D scene data. In contrast, our key insight is to distill pre-trained foundation models into a consistent 4D scene representation, which offers promising advantages such as efficiency and generalizability. 1) To achieve this, we first animate the input image using image-to-video diffusion models, followed by 4D geometric structure initialization. 2) To lift this coarse structure into spatially and temporally consistent multi-view videos, we design an adaptive guidance mechanism with a point-guided denoising strategy for spatial consistency and a novel latent replacement strategy for temporal coherence. 3) To turn these generated observations into a consistent 4D representation, we propose a modulation-based refinement that mitigates inconsistencies while fully leveraging the generated information. The resulting 4D representation enables real-time, controllable spatial-temporal rendering, marking a significant advancement in single-image-based 4D scene generation. Code will be released.
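The adaptive guidance in step 2) operates inside the diffusion sampling loop. Below is a minimal, self-contained sketch of that idea under stated assumptions: a toy stand-in denoiser, a simplified linear noise schedule, and illustrative names (`point_render`, `anchor_latents`, `coverage_mask`) that are not from the paper. It illustrates the two mechanisms named in the abstract: blending noised renders of the coarse point structure into covered regions during early denoising steps (point-guided spatial consistency), and re-injecting a noised copy of the already-generated anchor video's latents at every step (latent replacement for temporal coherence).

```python
# Minimal sketch of the adaptive guidance idea from the abstract (assumptions:
# toy denoiser, simplified linear noise schedule, illustrative names; this is
# not the authors' implementation).
import torch

T, C, H, W = 8, 4, 32, 32        # frames, latent channels, latent height/width
steps = 50                       # number of denoising steps
guidance_cutoff = 0.6            # fraction of (early) steps with point guidance

def toy_denoiser(x, t):
    """Stand-in for a frozen pre-trained video diffusion denoiser."""
    return x - 0.02 * torch.randn_like(x)

def add_noise(x, t, steps):
    """Forward-diffuse a clean latent to step t (simplified schedule)."""
    alpha = 1.0 - t / steps
    return alpha * x + (1.0 - alpha) * torch.randn_like(x)

# Illustrative inputs: latents rendered from the coarse 4D point structure,
# latents of the already-generated anchor video, and a mask marking regions
# covered by the point render.
point_render = torch.randn(T, C, H, W)
anchor_latents = torch.randn(T, C, H, W)
coverage_mask = (torch.rand(T, 1, H, W) > 0.5).float()

x = torch.randn(T, C, H, W)      # start the target view from pure noise
for t in reversed(range(steps)):
    x = toy_denoiser(x, t)
    if t > steps * (1.0 - guidance_cutoff):
        # Point-guided denoising (spatial consistency): in early, high-noise
        # steps, pull covered regions toward a noised render of the shared
        # point structure so every view agrees on geometry.
        target = add_noise(point_render, t, steps)
        x = coverage_mask * target + (1.0 - coverage_mask) * x
    # Latent replacement (temporal coherence): pin the first frame to a
    # noised copy of the anchor video's corresponding frame at every step.
    x[0] = add_noise(anchor_latents[0], t, steps)

print("generated multi-view video latents:", tuple(x.shape))
```

Injecting guidance targets at the matching noise level of each step is a standard way to steer a frozen diffusion model without any fine-tuning, which is consistent with the abstract's "tuning-free" claim; the specific blending weights, masks, and which latents get replaced are assumptions of this sketch.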
