

Poster

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation

Sherry Chen · Yi Wei · Luowei Zhou · Suren Kumar


Abstract:

Recent advances in instruction-guided image editing underscore the need for effective automated evaluation. While Vision-Language Models (VLMs) have been explored as judges, open-source models struggle with alignment, and proprietary models lack transparency and cost efficiency. Additionally, no public training datasets exist for fine-tuning open-source VLMs for this task; only small benchmarks with diverse evaluation schemes are available. To address this, we introduce ADIEE, an automated dataset creation approach and scorer for instruction-guided image editing evaluation. We generate a large-scale dataset of over 100K samples and use it to fine-tune a LLaVA-NeXT-8B model. The resulting scorer outperforms all open-source VLMs and Gemini-Pro 1.5 across all benchmarks, achieving a 0.0706 (+17.48%) gain in score correlation with human ratings on AURORA-Bench and improving pairwise comparison accuracy by 3.48% (+6.22%) on GenAI-Bench and by 1.57% (+3.09%) on AURORA-Bench relative to the state of the art. The scorer can also serve as a reward model for image editing, boosting the average ImagenHub evaluation score of edited outputs from 6.15 to 6.67 (+8.46%). Our code and dataset will be released upon acceptance.
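
For context, the sketch below shows roughly how a LLaVA-NeXT-8B-based edit scorer of this kind could be queried at inference time. It is only an illustration under stated assumptions: the checkpoint "llava-hf/llama3-llava-next-8b-hf" is the public base model standing in for the fine-tuned ADIEE scorer (not yet released), and the side-by-side image layout, prompt wording, and 0-10 rating scale are placeholders rather than the paper's actual scoring protocol.

import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

# Base LLaVA-NeXT-8B weights as a stand-in; ADIEE fine-tunes a LLaVA-NeXT-8B model.
MODEL_ID = "llava-hf/llama3-llava-next-8b-hf"

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def score_edit(source_path: str, edited_path: str, instruction: str) -> str:
    """Ask the VLM how well the edited image follows the editing instruction."""
    source = Image.open(source_path).convert("RGB")
    edited = Image.open(edited_path).convert("RGB")

    # Paste source and result side by side so a single-image prompt suffices
    # (an assumption made here for simplicity, not the paper's input format).
    canvas = Image.new("RGB", (source.width + edited.width,
                               max(source.height, edited.height)))
    canvas.paste(source, (0, 0))
    canvas.paste(edited, (source.width, 0))

    conversation = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": (
                f"The left image was edited with the instruction: '{instruction}'. "
                "The right image is the result. Rate the edit quality from 0 to 10 "
                "and reply with only the number."
            )},
        ],
    }]
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=canvas, text=prompt, return_tensors="pt")
    inputs = inputs.to(model.device, torch.float16)

    output = model.generate(**inputs, max_new_tokens=8)
    # Decoded text includes the prompt followed by the model's numeric reply.
    return processor.decode(output[0], skip_special_tokens=True)

In practice, a fine-tuned scorer like ADIEE would be trained to emit its score in a fixed format, so the free-form prompt and string parsing above would be replaced by the scorer's own input and output conventions.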
