Poster
Frequency-Guided Diffusion for Training-Free Text-Driven Image Translation
Zheng Gao · Jifei Song · Zhensong Zhang · Jiankang Deng · Ioannis Patras
Current training-free text-driven image translation primarily uses the diffusion features (convolution and attention) of a pre-trained model as guidance to preserve the style/structure of the source image in the translated image. However, this coarse feature-level guidance struggles to align style (e.g., visual patterns) and structure (e.g., edges) with the source. Based on the observation that the low-/high-frequency components of an image retain its style/structure information, in this work we propose training-free Frequency-Guided Diffusion (FGD), which tailors low-/high-frequency guidance for style- and structure-guided translation, respectively. For low-frequency guidance (style-guided), we substitute the low-frequency components of the diffusion latents in the sampling process with those from the inversion of the source image, and normalize the obtained latent with the composited spectrum to enforce color alignment. For high-frequency guidance (structure-guided), we propose high-frequency alignment and high-frequency injection, which complement each other. High-frequency alignment preserves edges and contours by adjusting the predicted noise with a guidance function that aligns high-frequency image regions between the sampling process and the source image. High-frequency injection facilitates layout preservation by injecting the high-frequency components of diffusion convolution features (from inversion) into the sampling process. Qualitative and quantitative results verify the superiority of our method on style- and structure-guided translation tasks. We make the code publicly available at: withheld during review.
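To make the low-frequency guidance step concrete, below is a minimal PyTorch sketch of substituting the low-frequency band of a sampling latent with that of the inverted source latent via a 2D FFT. The function name, the radial cutoff `radius`, and the omission of the spectrum normalization step are all assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.fft as fft

def low_freq_substitute(z_sample: torch.Tensor, z_inv: torch.Tensor, radius: float = 0.1) -> torch.Tensor:
    """Replace the low-frequency band of the sampling latent `z_sample`
    with that of the inverted source latent `z_inv` (a simplified sketch;
    the cutoff `radius` is a hypothetical hyperparameter and the composited-
    spectrum normalization described in the abstract is not shown)."""
    # 2D FFT over the spatial dims, shifted so low frequencies sit at the center
    F_sample = fft.fftshift(fft.fft2(z_sample), dim=(-2, -1))
    F_inv = fft.fftshift(fft.fft2(z_inv), dim=(-2, -1))

    # Radial low-pass mask in normalized frequency coordinates
    h, w = z_sample.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    low_mask = ((xx ** 2 + yy ** 2).sqrt() <= radius).to(z_sample.device)

    # Composite spectrum: low band from the source inversion, high band from sampling
    F_mixed = torch.where(low_mask, F_inv, F_sample)
    return fft.ifft2(fft.ifftshift(F_mixed, dim=(-2, -1))).real
```

In this sketch the same frequency-splitting idea would apply symmetrically to the high-frequency components used for structure guidance, with the mask inverted.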