

Poster

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training

Zhenghong Zhou · Jie An · Jiebo Luo


Abstract:

Precise camera pose control is crucial for video generation with diffusion models. Existing methods require fine-tuning on additional datasets containing paired videos and camera pose annotations, a process that is both data-intensive and computationally costly, and that may disrupt the distribution the model learned from its training data. We introduce Latent-Reframe, which enables camera control in a pre-trained video diffusion model without fine-tuning. Unlike existing methods, Latent-Reframe operates during the sampling stage, maintaining efficiency while preserving the distribution learned during pretraining. Our approach reframes the latent code of video frames to align with the input camera trajectory through time-aware point clouds. Latent code inpainting and harmonization then refine the reframed latents, ensuring high-quality video generation. Latent-Reframe can be applied to both DiT- and UNet-based video diffusion models. Experimental results demonstrate that Latent-Reframe achieves camera control precision and video quality comparable or superior to training-based methods, without fine-tuning on additional datasets. Please open video_results.html in the supplementary material to view the generated videos.
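To make the pipeline concrete, here is a minimal sketch of a training-free sampling loop of the shape the abstract describes: warp per-frame latents along the target camera trajectory, inpaint disoccluded regions with the frozen model, and harmonize the result. Everything here is an illustrative assumption based only on the abstract, not the authors' released code; `reframe_with_point_cloud`, `latent_reframe_sampling`, the stub warp, and the blending weight are all hypothetical.

```python
# Hypothetical sketch of a Latent-Reframe-style sampling loop.
# Assumes a generic frozen denoiser interface: denoiser(latents, t) -> latents.
import torch

def reframe_with_point_cloud(latents, poses):
    """Stub: re-project per-frame latents to follow the target camera
    trajectory via a time-aware point cloud. A real implementation would
    unproject latents with depth, transform by the per-frame pose, and
    splat back. Returns warped latents and a mask of uncovered regions."""
    warped = latents.clone()                      # placeholder warp
    hole_mask = torch.zeros_like(latents[:, :1])  # placeholder: no holes
    return warped, hole_mask

def latent_reframe_sampling(denoiser, latents, poses, num_steps=50):
    """Training-free camera control at sampling time:
    reframe -> inpaint -> harmonize at each denoising step."""
    for t in reversed(range(num_steps)):
        warped, holes = reframe_with_point_cloud(latents, poses)
        denoised = denoiser(warped, t)
        # Inpainting: let the frozen model fill regions the warp left empty.
        latents = torch.where(holes.bool(), denoised, warped)
        # Harmonization: blend toward the denoised estimate so warped
        # latents stay close to the model's learned distribution.
        # (The 0.5 weight is an arbitrary illustrative choice.)
        latents = 0.5 * latents + 0.5 * denoised
    return latents

if __name__ == "__main__":
    dummy = lambda z, t: z                  # identity "denoiser" for the demo
    z = torch.randn(16, 4, 32, 32)          # (frames, channels, H, W) latents
    poses = torch.eye(4).repeat(16, 1, 1)   # per-frame camera-to-world poses
    print(latent_reframe_sampling(dummy, z, poses).shape)
```

Since the loop only intervenes on the latents between denoiser calls, it leaves the pre-trained model untouched, which is what lets this style of method avoid fine-tuning entirely.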
