Poster Exhibit Hall I #427

Engage for All: Making Ordinary Image Descriptions Appealing Again!

Yuyan Chen ⋅ Yifan Jiang ⋅ Li Zhou ⋅ Jinghan Cao ⋅ Yu Guan ⋅ Ming Yang ⋅ Qingpei Guo

2025 Poster

Abstract

In recent years, multi-modal large language models (MLLMs) have been successfully adopted to generate humorous and engaging descriptions for internet memes. While, it is challenging for the same approaches to apply to ordinary images which lack of inherent funny or exaggerated contents. Thus, crafting appealing descriptions for ordinary image demands imaginative efforts to discover or create intriguing connections between words to image contents. To address this gap, we introduce AppealImage, a large-scale dataset consisting of ordinary images paired with appealing descriptions. AppealImage allows us to define four distinct tasks with quantitative metrics to enable objective evaluation. Subsequently, we propose CharmNet, an innovative framework designed to generate appealing descriptions for ordinary images. CharmNet combines instruction tuning with heuristic active learning, guided by a referee model. Experimental results demonstrate that CharmNet outperforms the state-of-the-art method by 11.4\% in generating appealing descriptions. Furthermore, CharmNet delivers impressive performance across various creative applications, including visual storytelling and situational dialogue generation. These results highlight CharmNet's potential to enhance social media engagement and to empower strong brand presence in competitive markets.

Chat is not available.