

Poster

A Simple yet Mighty Hartley Diffusion Versatilist for Generalizable Dense Vision Tasks

Qi Bi · Jingjun Yi · Huimin Huang · Hao Zheng · Haolan Zhan · Wei Ji · Yawen Huang · Yuexiang Li · Yefeng Zheng


Abstract:

Diffusion models have demonstrated powerful capabilities as versatilists for dense vision tasks, yet their ability to generalize to unseen domains remains rarely explored. In light of this issue, we investigate generalizable paradigms for diffusion-based dense prediction and propose an efficient frequency learning scheme, dubbed HarDiff, which alleviates the domain gap across diverse scenes. Interestingly, the low-frequency features, obtained via the Discrete Hartley Transform, activate the broader content of an image, while the high-frequency features retain sufficient detail for dense pixels. Hence, HarDiff is driven by two compelling designs: (1) a Low-Frequency Training Process, which extracts structural priors from the source domain to enhance understanding of task-related content; and (2) a High-Frequency Sampling Process, which utilizes detail-oriented guidance from the unseen target domain to infer precise dense predictions with target-related details. Extensive empirical evidence shows that HarDiff can be easily plugged into various dense vision tasks, e.g., semantic segmentation, depth estimation, and haze removal, yielding improvements over state-of-the-art methods on twelve public benchmarks. We will release our code.
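To illustrate the kind of frequency decomposition the abstract describes, the following is a minimal NumPy sketch of a 2D Discrete Hartley Transform and a low-/high-frequency split. The function names (dht2, idht2, frequency_split) and the circular cutoff radius are illustrative assumptions for exposition, not the authors' released implementation.

import numpy as np

def dht2(x):
    # 2D Discrete Hartley Transform via the FFT: H = Re(F) - Im(F),
    # i.e. the cas-kernel transform for a real-valued input.
    F = np.fft.fft2(x)
    return F.real - F.imag

def idht2(h):
    # For real input the 2D DHT is an involution up to a 1/(H*W) factor.
    H, W = h.shape[-2:]
    return dht2(h) / (H * W)

def frequency_split(x, radius=0.25):
    # Split a real feature map into low- and high-frequency parts using a
    # circular low-pass mask in the Hartley domain. The radius (a fraction
    # of the smaller spatial dimension) is an assumed hyperparameter.
    H, W = x.shape[-2:]
    h = dht2(x)
    yy, xx = np.mgrid[0:H, 0:W]
    dist = np.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    mask = (dist <= radius * min(H, W)).astype(np.float64)
    mask = np.fft.ifftshift(mask)  # align centered mask with the unshifted spectrum
    low = idht2(h * mask)
    high = idht2(h * (1.0 - mask))
    return low, high

if __name__ == "__main__":
    feat = np.random.rand(64, 64).astype(np.float32)
    low, high = frequency_split(feat, radius=0.25)
    print(np.allclose(low + high, feat, atol=1e-4))  # exact reconstruction up to float error

In this sketch, the low-frequency component would feed the training-time structural prior and the high-frequency component the sampling-time detail guidance, matching the two processes outlined above.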
