Poster
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang · Yuzhang Shang · Zhihang Yuan · Junyi Wu · Junchi Yan · Yan Yan
The practical deployment of diffusion models is still hindered by their high memory and computational overhead. Although quantization paves the way for model compression and acceleration, existing methods face challenges in achieving low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose to adjust these distributions through weight finetuning to make them more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while enhancing quantization efficiency. Our method demonstrates its efficacy across three high-resolution image generation tasks, obtaining state-of-the-art performance across multiple bit-width settings.
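To make the selective-finetuning idea concrete, below is a minimal, hypothetical sketch in PyTorch of finetuning only a chosen subset of quantized layers against a full-precision reference. The function name, the way sensitive layers are selected by name, and the data format (noisy inputs paired with timesteps) are assumptions for illustration, not the paper's implementation; the local per-layer supervision is only indicated in a comment.

```python
import torch
import torch.nn as nn

# Hypothetical illustration of selective finetuning under global supervision,
# with a placeholder for local (per-layer) supervision.
# `quant_model` is a quantized copy of a diffusion denoiser; `fp_model` is the
# full-precision reference. How "sensitive" layers are chosen is an assumption here.
def selective_finetune(quant_model, fp_model, dataloader, sensitive_names,
                       steps=1000, lr=1e-5):
    # Freeze all weights except the selected (e.g., temporal or bit-sensitive) layers.
    for name, param in quant_model.named_parameters():
        param.requires_grad = any(s in name for s in sensitive_names)

    trainable = [p for p in quant_model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=lr)
    mse = nn.MSELoss()

    for step, (x_t, t) in zip(range(steps), dataloader):
        with torch.no_grad():
            ref_out = fp_model(x_t, t)   # full-precision target (global supervision)
        q_out = quant_model(x_t, t)

        # Global supervision: match the full-precision model's output.
        loss = mse(q_out, ref_out)

        # Local supervision (assumed): per-layer activation matching could be added
        # here via forward hooks on the selected layers, e.g.
        # loss = loss + sum(mse(a_q, a_fp) for a_q, a_fp in paired_activations)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return quant_model
```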