Poster
Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
Peng Du · Hui Li · Han Xu · Paul Jeon · Dongwook Lee · Daehyun Ji · Ran Yang · Feng Zhu
The Discrete Wavelet Transform (DWT) has been widely explored to enhance the performance of image super-resolution (SR). Although some DWT-based methods improve SR by capturing fine-grained frequency signals, most existing approaches neglect the interrelations among multi-scale frequency sub-bands, resulting in inconsistencies and unnatural artifacts in the reconstructed images. To address this challenge, we propose a Diffusion Transformer model based on image Wavelet spectra for SR (DTWSR). DTWSR combines the strengths of diffusion models and transformers to capture the interrelations among multi-scale frequency sub-bands, leading to more consistent and realistic SR images. Specifically, we use a Multi-level Discrete Wavelet Transform (MDWT) to decompose images into wavelet spectra. We propose a pyramid tokenization method that embeds the spectra into a sequence of tokens for the transformer model, making it easier to capture features from both the spatial and frequency domains. A dual decoder is carefully designed to handle the distinct variances in the low-frequency (LF) and high-frequency (HF) sub-bands without neglecting their alignment during image generation. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method, with high performance in both perceptual quality and fidelity.
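To make the MDWT step in the abstract concrete, the following is a minimal sketch of a multi-level 2D wavelet decomposition and its inverse. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which wavelet DTWSR uses, so the orthonormal Haar wavelet is assumed here, and the function names (`haar_dwt2`, `haar_idwt2`, `mdwt`, `imdwt`) are hypothetical. The sub-band labels follow one common convention; others swap LH and HL.

```python
import numpy as np

def haar_dwt2(x):
    """One level of a 2D orthonormal Haar DWT (assumes even height and width)."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # differences across columns
    hl = (a + b - c - d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0  # diagonal differences
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    lh, hl, hh = highs
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def mdwt(image, levels):
    """Multi-level DWT: repeatedly decompose the LF band.

    Returns the coarsest LF band and a list of HF triples,
    ordered from the finest level to the coarsest.
    """
    ll = np.asarray(image, dtype=np.float64)
    details = []
    for _ in range(levels):
        ll, highs = haar_dwt2(ll)
        details.append(highs)
    return ll, details

def imdwt(ll, details):
    """Inverse multi-level DWT, starting from the coarsest level."""
    for highs in reversed(details):
        ll = haar_idwt2(ll, highs)
    return ll

# Usage: a 3-level decomposition of a random 64x64 "image".
image = np.random.default_rng(0).random((64, 64))
lf, hf = mdwt(image, levels=3)
recon = imdwt(lf, hf)
```

With a 64x64 input and three levels, the LF band is 8x8 and the HF sub-bands are 32x32, 16x16, and 8x8 from finest to coarsest. This pyramid of differently sized sub-bands is the kind of multi-scale spectrum that a tokenization scheme would need to embed into a single sequence.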