

Poster

LLM Thought Divergence and Convergence for Dialogue-Based Image Generation Control

Hui Li


Abstract:

Generative AI (GenAI), which has revolutionized both computer vision and natural language processing, continues to draw sustained attention. Benefiting from GenAI and the evolution of large language models (LLMs), the image generation task has progressed from prompt-based to dialogue-based, capturing real-world human intent expressed through conversation. When this task is broken into multiple steps, the best pathway for analyzing the dialogue is not predetermined: for example, should the objects or the prompt template be the focus of the first analysis step? Multi-chain reasoning is therefore required to decompose this application beyond a pure chain-of-thought structure. After this divergent process, the question becomes how to converge on the thinking chain that leads to the best-matched image, which calls for a new evaluation method to guide the thinking process. To address these challenges, we propose the LLM Thought Divergence and Convergence (LTDC) framework, which simulates human cognitive processes through three phases: (1) the Step-by-Step Thought process decomposes dialogue-based image generation tasks into sequential thinking chains using LLMs; (2) the Image Generation process creates image prompts following these thought instructions and produces the corresponding images; (3) the Evaluation process assesses the coherence between generated images and dialogues with a multi-modal LLM, guiding the selection of the optimal thinking chain. Evaluated on VisDial, our LTDC framework achieves a 4.87% improvement in CLIP similarity, demonstrating its effectiveness in generating images with higher semantic fidelity.
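To make the three-phase loop concrete, here is a minimal Python sketch of a divergence/convergence selection step as described in the abstract. The callables `decompose`, `generate`, and `score` are hypothetical placeholders standing in for the LLM-based thought decomposition, the image generator, and the multi-modal coherence evaluator (e.g., a CLIP-style score); this is not the authors' actual implementation.

```python
from typing import Any, Callable, List, Optional, Tuple

def ltdc_select(
    dialogue: str,
    decompose: Callable[[str], List[List[str]]],  # LLM: dialogue -> candidate thought chains
    generate: Callable[[List[str]], Any],         # thought chain -> prompt -> generated image
    score: Callable[[Any, str], float],           # multi-modal LLM: image-dialogue coherence
) -> Tuple[Optional[List[str]], Optional[Any], float]:
    """Diverge into multiple thought chains, generate one image per chain,
    then converge on the chain whose image best matches the dialogue."""
    best_chain: Optional[List[str]] = None
    best_image: Optional[Any] = None
    best_score = float("-inf")
    for chain in decompose(dialogue):   # Phase 1: step-by-step thought divergence
        image = generate(chain)         # Phase 2: prompt construction + image generation
        s = score(image, dialogue)      # Phase 3: coherence evaluation
        if s > best_score:              # convergence: keep the best-matched chain
            best_chain, best_image, best_score = chain, image, s
    return best_chain, best_image, best_score
```

Structuring the loop around injected callables keeps the sketch self-contained while leaving the choice of LLM, image generator, and evaluator open.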
