

Poster

Discovering Divergent Representations between Text-to-Image Models

Lisa Dunlap · Trevor Darrell · Joseph Gonzalez · Fabian Caba Heilbron · Josef Sivic · Bryan Russell


Abstract: In this paper, we investigate when and how visual representations learned by two different generative models diverge from each other. Specifically, given two text-to-image models, our goal is to discover visual attributes that appear in images generated by one model but not the other, along with the types of prompts that trigger these attribute differences. For example, "flames" might appear in one model’s outputs when given prompts expressing strong emotions, while the other model does not produce this attribute given the same prompts. We introduce CompCon (Comparing Concepts), an evolutionary search algorithm that discovers visual attributes more prevalent in one model's output than the other and uncovers the prompt concepts linked to these visual differences. To evaluate our method's ability to find diverging representations, we create an automated data generation pipeline to produce ID², a dataset of 60 input-dependent differences, and compare our approach to several LLM- and VLM-powered baselines. Finally, we apply CompCon to compare two popular text-to-image models, PixArt and SD-Lightning. We find diverging representations, such as prompts mentioning loneliness resulting in depictions of "wet streets" in PixArt, as well as biases, such as PixArt generating older men for prompts mentioning traditional professions.
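The abstract does not detail how CompCon's evolutionary search operates; purely as a rough illustration, the sketch below shows a generic propose-score-select loop over (prompt concept, visual attribute) hypotheses, assuming candidates are scored by how much more prevalent the attribute is in one model's generations than the other's. All function names here (propose_hypotheses, mutate, prevalence_gap) are hypothetical placeholders, not the paper's implementation; in practice the proposal/mutation step would likely be LLM-driven and the scoring VLM-based.

```python
import random

# Hypothetical stand-ins for the components the abstract implies:
# a proposer/mutator for candidate hypotheses and a scorer that
# measures how much more often a visual attribute appears in model A's
# images than in model B's for prompts matching a given concept.
def propose_hypotheses(n):
    # Placeholder: in the paper this would plausibly come from an LLM.
    return [{"concept": f"concept_{i}", "attribute": f"attribute_{i}"} for i in range(n)]

def mutate(hypothesis):
    # Placeholder mutation: rephrase or refine the attribute description.
    h = dict(hypothesis)
    h["attribute"] += "_variant"
    return h

def prevalence_gap(hypothesis):
    # Placeholder score in [0, 1]: fraction of images from model A showing
    # the attribute minus the fraction from model B, for matching prompts.
    return random.random()

def evolutionary_search(generations=5, population_size=8, keep=4):
    """Generic propose-score-select loop over candidate hypotheses."""
    population = propose_hypotheses(population_size)
    for _ in range(generations):
        scored = sorted(population, key=prevalence_gap, reverse=True)
        survivors = scored[:keep]
        # Refill the population by mutating the best hypotheses so far.
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(population_size - keep)
        ]
    return max(population, key=prevalence_gap)

if __name__ == "__main__":
    print(evolutionary_search())
```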
