Poster
FICGen: Frequency-Inspired Contextual Disentanglement for Layout-driven Degraded Image Generation
Wenzhuang Wang · Yifan Zhao · Mingcan Ma · Ming Liu · Zhonglin Jiang · Yong Chen · Jia Li
Layout-to-image (L2I) generation has exhibited promising results on natural images, but existing methods struggle and even fail when applied to degraded scenarios (e.g., low-light, underwater). This failure is primarily attributed to the "contextual illusion dilemma" in degraded contexts, where foreground instances are overwhelmed by context-dominant frequency distributions. Motivated by this, our paper proposes a new Frequency-Inspired Contextual Disentanglement Generative (FICGen) paradigm, which transfers frequency-aware knowledge (e.g., edges, textures) into the latent diffusion space, thereby better rendering degraded instances under frequency-aware guidance. Specifically, FICGen consists of two major steps. First, we introduce a learnable dual-query mechanism, each query paired with an individual frequency resampler, to perceive contextual frequency prototypes disentangled from degraded images; a visual-frequency enhanced attention then injects the frequency knowledge within these prototypes into the degraded instance generation process. Second, to alleviate attribute leakage and compensate for the sample loss of dense and small objects, we propose an instance coherence map that regulates instance isolation, coupled with an adaptive spatial-frequency aggregation module that merges instances in a spatial-frequency mixed manner. Extensive quantitative and qualitative experiments against existing L2I methods on four benchmarks demonstrate the superior generation quality and trainability of FICGen across diverse degradation circumstances.
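The frequency disentanglement underlying FICGen can be pictured with a classical FFT-based low/high split. The sketch below is a hypothetical illustration of that decomposition only: FICGen's resamplers and frequency prototypes are learned modules operating in latent diffusion space, whereas here a fixed circular low-pass mask separates the smooth context band from the edge/texture band of a single image.

```python
import numpy as np

def frequency_disentangle(image, cutoff=0.1):
    """Split a grayscale image into low- and high-frequency components via the FFT.

    Illustrative only: a fixed radial cutoff stands in for the learned
    frequency resamplers described in the paper.
    """
    f = np.fft.fftshift(np.fft.fft2(image))  # centered 2-D spectrum
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # circular low-pass mask around the spectrum center
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * min(h, w)
    low = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))    # smooth context
    high = np.real(np.fft.ifft2(np.fft.ifftshift(f * ~mask)))  # edges, textures
    return low, high

# The two bands recompose the original image (up to numerical error),
# so no information is lost by the disentanglement.
img = np.random.rand(64, 64)
low, high = frequency_disentangle(img)
assert np.allclose(low + high, img, atol=1e-8)
```

In the degraded setting, the low-frequency band tends to dominate the spectrum, which is one way to see why foreground instances (carried largely by the high-frequency band) get overwhelmed without explicit frequency-aware guidance.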