

Poster

Deeply Supervised Flow-Based Generative Models

Inkyu Shin · Chenglin Yang · Liang-Chieh (Jay) Chen


Abstract:

Flow-based generative models have charted an impressive path across multiple visual generation tasks by adhering to a simple principle: learning velocity representations of a linear interpolant. However, we observe that training velocity solely from the final layer's output under-utilizes the rich inter-layer representations, potentially impeding model convergence. To address this limitation, we introduce DeepFlow, a novel framework that enhances velocity representation through inter-layer communication. DeepFlow partitions transformer layers into balanced branches with deep supervision and inserts a lightweight Velocity Refiner with Acceleration (VeRA) block between adjacent branches, which aligns the intermediate velocity features within transformer blocks. Powered by the improved deep supervision via internal velocity alignment, DeepFlow converges 8x faster on ImageNet-256x256 with equivalent performance, and further reduces FID by 2.6 while halving training time compared to previous flow-based models without classifier-free guidance. DeepFlow also outperforms baselines in text-to-image generation tasks, as evidenced by evaluations on MS-COCO and zero-shot GenEval. The code will be made publicly available.
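The core objective described above can be illustrated with a minimal NumPy sketch. This is only a shape-level illustration under stated assumptions, not the paper's implementation: the `Branch` class is a hypothetical stand-in for a balanced group of transformer layers, the identity "velocity head" is a placeholder for the model's actual decoding, and the VeRA block is omitted. What it shows is the deep-supervision idea: every branch's intermediate features are regressed against the same linear-interpolant velocity target, so the loss is a sum of per-branch flow-matching losses rather than a final-layer-only loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear interpolant used by flow matching: x_t = (1 - t) * x0 + t * x1.
# Its time derivative, the regression target, is the constant velocity
# v = x1 - x0 (the same target for every supervised branch).
def interpolant(x0, x1, t):
    return (1.0 - t) * x0 + t * x1

# Hypothetical stand-in for one transformer branch: a single linear map.
# The real model uses balanced groups of transformer layers with a VeRA
# block between adjacent branches; this sketch drops both details.
class Branch:
    def __init__(self, dim):
        self.W = rng.normal(scale=dim ** -0.5, size=(dim, dim))

    def __call__(self, h):
        return h @ self.W

dim, n_branches = 8, 3
x0 = rng.normal(size=(4, dim))   # noise samples
x1 = rng.normal(size=(4, dim))   # data samples
t = 0.3
target_v = x1 - x0               # velocity of the linear interpolant

h = interpolant(x0, x1, t)       # network input at time t
branches = [Branch(dim) for _ in range(n_branches)]

# Deep supervision: decode a velocity estimate from every branch's
# intermediate features and accumulate a per-branch regression loss,
# instead of supervising only the final layer's output.
losses = []
for branch in branches:
    h = branch(h)                # intermediate features of this branch
    v_hat = h                    # placeholder identity velocity head
    losses.append(np.mean((v_hat - target_v) ** 2))

total_loss = sum(losses)         # summed over all supervised branches
```

In this sketch the per-branch losses would each receive gradients during training, which is the mechanism the abstract credits for faster convergence; the relative weighting of branch losses is a design choice the abstract does not specify.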
