

Poster

Reusing Computation in Text-to-Image Diffusion for Efficient Generation of Image Sets

Dale Decatur · Thibault Groueix · Wang Yifan · Rana Hanocka · Vladimir Kim · Matheus Gadelha


Abstract:

Text-to-image diffusion models enable high-quality image generation but are computationally expensive, especially when producing large image collections. While prior work optimizes per-inference efficiency, we explore an orthogonal approach: reducing redundancy across multiple correlated prompts. Our key insight leverages the coarse-to-fine nature of diffusion models, where early denoising steps capture shared structures among similar prompts. We propose a training-free method that clusters prompts based on semantic similarity and shares computation in early diffusion steps. Experiments show that for models conditioned on image embeddings, our approach significantly reduces compute cost while improving image quality. By leveraging unCLIP’s text-to-image prior, we enhance diffusion step allocation for greater efficiency. Our method seamlessly integrates with existing pipelines, scales with prompt sets, and reduces the environmental and financial burden of large-scale text-to-image generation.
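The following is a minimal sketch of the idea described in the abstract: cluster prompts by embedding similarity, run the early (coarse) denoising steps once per cluster, then branch into per-prompt denoising for the remaining steps. The helpers `text_embed` and `denoise_step`, the centroid conditioning, and the split point `shared_steps` are assumptions for illustration, not the paper's actual pipeline; the paper additionally relies on image-embedding-conditioned models and an unCLIP-style prior, which are omitted here for brevity.

```python
import torch
from sklearn.cluster import KMeans

# Hypothetical helpers standing in for a real diffusion pipeline (assumptions):
#   text_embed(prompt)      -> 1D tensor embedding of a prompt (e.g., a text encoder)
#   denoise_step(x, emb, t) -> one reverse-diffusion step on latent x at timestep t

def generate_with_shared_prefix(prompts, num_clusters, total_steps, shared_steps,
                                text_embed, denoise_step, latent_shape=(4, 64, 64)):
    """Cluster prompts, share the first `shared_steps` denoising steps per cluster,
    then finish each prompt individually from the shared latent."""
    embs = torch.stack([text_embed(p) for p in prompts])                     # (N, D)
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit(embs.numpy()).labels_

    images = [None] * len(prompts)
    for c in range(num_clusters):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        centroid = embs[members].mean(dim=0)                                 # shared condition

        # Shared coarse stage: one trajectory per cluster.
        x = torch.randn(1, *latent_shape)
        for t in range(total_steps, total_steps - shared_steps, -1):
            x = denoise_step(x, centroid, t)

        # Per-prompt fine stage: branch from the shared latent.
        for i in members:
            xi = x.clone()
            for t in range(total_steps - shared_steps, 0, -1):
                xi = denoise_step(xi, embs[i], t)
            images[i] = xi
    return images
```

In this sketch the compute saving comes from amortizing the shared coarse stage: for a cluster of k prompts, the first `shared_steps` steps run once instead of k times, while the later steps remain per-prompt so fine details still follow each individual prompt.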
