Poster
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang · Binzhu Xie · Zhonghao Yan · Yuli Zhang · Donghao Zhou · Xiaofei Chen · Shi Qiu · Jiaqi Liu · Guoyang Xie · Zhichao Lu
Model performance in text-to-image (T2I) and image-to-image (I2I) generation often depends on multiple aspects, including quality, alignment, diversity, and robustness. However, the complex trade-offs that models make among these dimensions have rarely been explored. There are two reasons: (1) no datasets allow fine-grained quantification of these trade-offs, and (2) a single metric is typically used to assess multiple dimensions. To address this gap, we introduce TRIG-Bench (Trade-offs in Image Generation). It spans 10 dimensions (Realism, Originality, Aesthetics, Content, Relation, Style, Knowledge, Ambiguity, Toxicity, and Bias), contains over 40,200 samples, and covers 132 Pairwise Dimensional Subsets. Furthermore, we develop TRIGScore, a VLM-as-judge metric that automatically adapts to different dimensions. Using TRIGScore, we evaluate 14 cutting-edge models across T2I and I2I tasks. In addition, we propose the Relation Recognition System and use it to generate the Dimension Trade-off Map (DTM), which visualizes model-specific capability trade-offs. Our experiments demonstrate that the DTM consistently provides a comprehensive understanding of the trade-offs between dimensions for each type of generation model. Notably, after fine-tuning based on the DTM, the model's dimension-specific impact is mitigated and its overall performance improves.