

Poster

Generating Multi-Image Synthetic Data for Text-to-Image Customization

Nupur Kumari · Xi Yin · Jun-Yan Zhu · Ishan Misra · Samaneh Azadi


Abstract:

Customization of text-to-image models enables users to insert custom concepts or objects and generate them in unseen settings. Existing methods either rely on comparatively expensive test-time optimization or train encoders on single-image datasets without multi-image supervision, which can limit image quality. We propose a simple approach to address these challenges. We first leverage existing text-to-image models and 3D datasets to create a high-quality Synthetic Customization Dataset (SynCD), consisting of multiple images of the same object under different lighting, backgrounds, and poses. Using this dataset, we train an encoder-based model that conditions on the reference images via a shared attention mechanism, better incorporating their fine-grained visual details. Finally, we propose a new inference technique that normalizes text and image guidance vectors to mitigate overexposure. Through extensive experiments, we show that our encoder-based model, trained on the synthetic dataset with the proposed inference algorithm, improves upon existing encoder-based methods on standard customization benchmarks.
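
The abstract mentions normalizing text and image guidance vectors at inference time; the sketch below illustrates one plausible way such dual-condition guidance with norm rescaling could look. The function name, guidance weights, and the specific normalization (rescaling the guided prediction back to the unconditional norm) are assumptions for illustration and are not taken from the paper.

```python
import torch

def normalized_dual_guidance(eps_uncond, eps_text, eps_image,
                             w_text=7.5, w_image=3.0):
    """Hypothetical sketch: classifier-free guidance with two conditions
    (text prompt and reference image), rescaled to curb overexposure.
    The paper's exact formulation may differ."""
    # Guidance directions relative to the unconditional prediction.
    g_text = eps_text - eps_uncond
    g_image = eps_image - eps_uncond

    # Standard weighted combination of both guidance signals.
    guided = eps_uncond + w_text * g_text + w_image * g_image

    # Assumed normalization: rescale the guided prediction so its norm
    # matches the unconditional prediction, avoiding over-saturated samples.
    norm_ratio = eps_uncond.norm(dim=(1, 2, 3), keepdim=True) / (
        guided.norm(dim=(1, 2, 3), keepdim=True) + 1e-8
    )
    return guided * norm_ratio


# Usage sketch: eps_* are noise predictions of shape (batch, C, H, W)
# from a diffusion model queried with no condition, text only, and
# text plus reference image, respectively.
if __name__ == "__main__":
    shape = (2, 4, 64, 64)
    eps_u, eps_t, eps_i = (torch.randn(shape) for _ in range(3))
    out = normalized_dual_guidance(eps_u, eps_t, eps_i)
    print(out.shape)
```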
