Poster
Large Scene Generation with Cube-Absorb Discrete Diffusion
Qianjiang Hu · Wei Hu
Generating realistic 3D outdoor scenes is essential for applications in autonomous driving, virtual reality, environmental science, and urban development. Traditional 3D generation approaches based on single-layer diffusion can produce detailed scenes at the scale of individual objects but struggle with high-resolution, large-scale outdoor environments due to scalability limitations. Recent hierarchical diffusion models address this by progressively scaling up low-resolution scenes; however, they often sample fine details from pure noise rather than from the coarse scene, which limits efficiency. We propose a novel cube-absorb discrete diffusion (CADD) model that uses the low-resolution scene as the base state of the diffusion process when generating fine details, eliminating the need to sample entirely from noise. Moreover, we introduce the Sparse Cube Diffusion Transformer (SCDT), a transformer-based model with a sparse cube attention operator, optimized for generating large-scale sparse voxel scenes. Our method achieves state-of-the-art performance on the CarlaSC and KITTI-360 datasets, supported by qualitative visualizations and extensive ablation studies that highlight the contributions of the CADD process and the sparse cube attention operator to high-resolution 3D scene generation.
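To make the core idea concrete, below is a minimal sketch of how an absorbing-style discrete diffusion could use an upsampled low-resolution scene as its base state instead of pure noise. It is an illustration under assumed interfaces, not the authors' implementation: the names (upsample_coarse, absorb_step, sample, denoiser), the nearest-neighbour upsampling, and the linear per-voxel absorption schedule are all hypothetical.

```python
# Hypothetical sketch of a "coarse scene as base state" absorbing diffusion.
# None of these names or design choices are taken from the paper.
import torch


def upsample_coarse(coarse: torch.Tensor, factor: int) -> torch.Tensor:
    """Nearest-neighbour upsample of a coarse semantic voxel grid (B, D, H, W)
    of integer class labels to the fine resolution; acts as the base state."""
    return (
        coarse.float()
        .repeat_interleave(factor, dim=-1)
        .repeat_interleave(factor, dim=-2)
        .repeat_interleave(factor, dim=-3)
        .long()
    )


def absorb_step(x_fine: torch.Tensor, base: torch.Tensor, t: int, T: int) -> torch.Tensor:
    """Forward (corruption) step: with probability t/T each fine voxel is
    absorbed into the corresponding voxel of the coarse base state."""
    p = t / T
    mask = torch.rand_like(x_fine, dtype=torch.float) < p
    return torch.where(mask, base, x_fine)


@torch.no_grad()
def sample(denoiser, coarse: torch.Tensor, factor: int, T: int) -> torch.Tensor:
    """Reverse process: start from the upsampled coarse scene (the base state)
    and iteratively let the denoiser replace absorbed voxels with fine detail."""
    base = upsample_coarse(coarse, factor)
    x = base.clone()                      # at t = T every voxel is the base state
    for t in reversed(range(1, T + 1)):
        logits = denoiser(x, base, t)     # assumed to return (B, C, D, H, W) class logits
        x0_pred = logits.argmax(dim=1)    # predicted clean fine scene
        # Re-absorb toward the base at the remaining noise level t-1, so fewer
        # voxels stay coarse as t decreases; at t-1 = 0 nothing is absorbed.
        x = absorb_step(x0_pred, base, t - 1, T)
    return x
```

The point of the sketch is the sampling loop: reverse diffusion starts from the coarse scene and only has to recover the detail that the forward process absorbed, which is where the efficiency gain over sampling entirely from noise would come from.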