Poster
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
Jiacheng Chen · Ziyu Jiang · Mingfu Liang · Bingbing Zhuang · Jong-Chyi Su · Sparsh Garg · Ying Wu · Manmohan Chandraker
Video generation for driving scenes has gained increasing attention due to its broad range of applications, including autonomous driving, robotics, and mixed reality. However, generating high-quality, long-horizon, and 3D-consistent videos remains a challenge.

We propose AutoScape, a framework designed for long-horizon driving scene generation. The framework comprises two stages: 1) Keyframe Generation, which anchors global scene appearance and geometry by autoregressively generating 3D-consistent keyframes with a joint RGB-D diffusion model, and 2) Interpolation, which employs a video diffusion model to generate dense frames conditioned on consecutive keyframes, ensuring temporal continuity and geometric consistency.

With three design choices that guarantee 3D consistency—RGB-D Diffusion, 3D Information Conditioning, and Warp-Consistent Guidance—AutoScape generates realistic and geometrically consistent driving videos of up to 20 seconds at 12 FPS. It improves the FID and FVD scores over the prior state-of-the-art by 48.6% and 43.0%, respectively, setting a new benchmark for long-horizon video generation in driving scenes.
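The abstract does not include an implementation, but the depth-based warping that underlies warp-consistent guidance can be illustrated with a minimal NumPy sketch: source-view pixels are back-projected to 3D using per-pixel depth, transformed by the relative camera pose, and re-projected into the target view. All function names and parameters below are hypothetical, not taken from the paper:

```python
import numpy as np

def warp_pixels(depth, K, T_rel):
    """Warp source-view pixel coordinates into a target view using
    per-pixel depth and a relative camera pose (hypothetical sketch
    of the geometric warp behind warp-consistent guidance).

    depth: (H, W) depth map of the source view
    K:     (3, 3) camera intrinsics
    T_rel: (4, 4) source-to-target rigid transform
    Returns an (H, W, 2) array of target-view pixel coordinates.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates (x, y, 1) for every source pixel.
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # Back-project into 3D camera coordinates of the source view.
    cam = (np.linalg.inv(K) @ pix[..., None]).squeeze(-1) * depth[..., None]
    cam_h = np.concatenate([cam, np.ones((H, W, 1))], axis=-1)
    # Rigidly transform into the target camera frame, then project.
    tgt = (T_rel @ cam_h[..., None]).squeeze(-1)[..., :3]
    proj = (K @ tgt[..., None]).squeeze(-1)
    return proj[..., :2] / proj[..., 2:3]
```

A guidance term would then penalize the difference between the current keyframe and the previous keyframe warped through this mapping, encouraging the diffusion model toward geometrically consistent outputs.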