Poster
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma · Yexin Liu · Yaofu Liu · Xianfeng Wu · Mingzhe Zheng · Zihao Wang · Ser-Nam Lim · Harry Yang
Video generation using diffusion models has shown remarkable progress, yet it remains computationally expensive due to the repeated processing of redundant features across blocks and steps. To address this, we propose a novel adaptive feature reuse mechanism that dynamically identifies and caches the most informative features, prioritizing computation for the foreground while caching more aggressively for the background, significantly reducing computational overhead with minimal sacrifice in video quality. By leveraging both step-level and block-level caching, our method achieves up to 1.8× speedup on HunyuanVideo while maintaining competitive performance on VBench, PSNR, SSIM, FID, and LPIPS. Extensive experiments demonstrate that our approach not only improves efficiency but also enhances the quality of generated videos. The proposed method is generalizable and can be integrated into existing diffusion transformer frameworks.
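The core idea of step-level feature reuse can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `FeatureCache`, the mean-absolute-change criterion, and the `threshold` value are all assumptions chosen for clarity. The sketch reuses a transformer block's cached output whenever its input has changed little since the last real compute, which is the situation the abstract describes for slowly varying background features across denoising steps.

```python
# Hedged sketch of step-level feature caching for one transformer block.
# All names and the change metric are illustrative, not from the paper.

class FeatureCache:
    """Reuse a block's output when its input barely changed since last compute."""

    def __init__(self, threshold=0.05):
        self.threshold = threshold   # max mean absolute change to allow reuse
        self.last_input = None       # input seen at the last real compute
        self.last_output = None      # output cached from the last real compute
        self.hits = 0
        self.misses = 0

    def __call__(self, block_fn, x):
        if self.last_input is not None:
            # Mean absolute change relative to the input we last computed on.
            delta = sum(abs(a - b) for a, b in zip(x, self.last_input)) / len(x)
            if delta < self.threshold:
                self.hits += 1
                return self.last_output  # skip the block, reuse cached features
        self.misses += 1
        y = block_fn(x)
        self.last_input = list(x)
        self.last_output = y
        return y


# Toy stand-in for an expensive transformer block.
def block(x):
    return [v * 2.0 for v in x]


cache = FeatureCache(threshold=0.05)
out = None
# A slowly drifting "background" feature vector across denoising steps:
for step in range(5):
    x = [1.0 + 0.001 * step, 2.0 + 0.001 * step]
    out = cache(block, x)

print(cache.hits, cache.misses)  # → 4 1 (only the first step computes)
```

In a real diffusion transformer, the same gating would be applied per block and per step, with foreground tokens (identified by profiling) exempted from reuse so that quality-critical regions are always recomputed.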