

Poster

Accelerating Diffusion Sampling via Exploiting Local Transition Coherence

Shangwen Zhu · Han Zhang · Zhantao Yang · Qianyu Peng · Zhao Pu · Huangji Wang · Fan Cheng


Abstract: Text-based diffusion models have made significant breakthroughs in generating high-quality images and videos from textual descriptions. However, the lengthy sampling time of the denoising process remains a major bottleneck in practical applications. Previous methods either ignore the statistical relationships between adjacent steps or rely on attention or feature similarity between them, which often works only with specific network structures. To address this issue, we identify a new statistical relationship in the transition operator between adjacent steps, focusing on the relationship between the network's outputs. This relationship imposes no requirements on the network structure. Based on this observation, we propose a novel $\textbf{training-free}$ acceleration method called LTC-Accel, which uses the identified relationship to estimate the current transition operator from adjacent steps. Because it makes no specific assumptions about the network structure, LTC-Accel is applicable to almost all diffusion-based methods and orthogonal to almost all existing acceleration techniques, making it easy to combine with them. Experimental results demonstrate that LTC-Accel significantly speeds up sampling in text-to-image and text-to-video synthesis while maintaining competitive sample quality. Specifically, LTC-Accel achieves a speedup of $\mathbf{1.67\times}$ on Stable Diffusion v2 and a speedup of $\mathbf{1.55\times}$ on video generation models. When combined with distillation models, LTC-Accel achieves a remarkable $\mathbf{10\times}$ speedup in video generation, enabling $\textbf{real-time}$ generation at more than $\mathbf{16\ \text{FPS}}$.
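To make the general idea concrete, here is a minimal toy sketch of the class of technique the abstract describes: skipping some network evaluations during sampling and estimating the current output from the outputs of adjacent steps. Everything below is a hypothetical illustration, assuming a stand-in `toy_denoiser` and a simple Euler-style update; it is not the authors' actual LTC-Accel estimation rule.

```python
import numpy as np

def toy_denoiser(x, t):
    # Stand-in for a diffusion network's noise prediction (hypothetical).
    return 0.1 * x + 0.01 * t

def sample(steps=50, reuse_every=2):
    """Toy sampling loop: on 'skipped' steps, estimate the current network
    output by linearly extrapolating the two most recent outputs instead of
    calling the network, exploiting assumed coherence between adjacent steps.
    Illustrative only; not the paper's actual transition-operator estimate."""
    x = np.random.default_rng(0).standard_normal(4)
    prev, prev2 = None, None
    calls = 0  # count of actual network evaluations
    for i, t in enumerate(np.linspace(1.0, 0.0, steps)):
        if i % reuse_every == 0 or prev is None:
            eps = toy_denoiser(x, t)  # real network call
            calls += 1
        elif prev2 is not None:
            eps = 2 * prev - prev2    # extrapolate from adjacent steps
        else:
            eps = prev                # fall back to reusing the last output
        x = x - (1.0 / steps) * eps   # simple Euler-style denoising update
        prev2, prev = prev, eps
    return x, calls
```

With `reuse_every=2`, half the network calls are skipped, which is the source of the speedup: the cost of the extrapolation is negligible compared with a network forward pass.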
