Poster
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci · Shanyan Guan · Yanhao Ge · Yilin Zhang · Wei Li · Zhenyu Zhang · Jian Yang · Ying Tai
Despite progress in text-to-image (T2I) generation, semantic image editing remains a challenge. Inversion-based methods introduce reconstruction errors and inefficiencies, while instruction-based models suffer from limited datasets, architectural constraints, and high computational costs. We propose DescriptiveEdit, a description-driven editing framework that preserves the generative power of pre-trained T2I models without architectural modifications or inversion. A Cross-Attentive UNet with an attention bridge enables direct fusion of reference-image features into the editing branch, while LoRA-based tuning ensures efficiency and compatibility. Without retraining, DescriptiveEdit integrates seamlessly with ControlNet, IP-Adapter, and other extensions. Experiments show that it improves editing accuracy and consistency while significantly reducing computational cost, offering a scalable and flexible solution for text-guided image manipulation.
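To make the attention-bridge idea concrete, here is a minimal PyTorch sketch of one plausible design, not the authors' released implementation: queries come from the editing branch, keys and values from the frozen reference branch, and the attention projections carry trainable LoRA updates while the pre-trained weights stay frozen. All class and variable names (`LoRALinear`, `AttentionBridge`, `edit_feats`, `ref_feats`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + s * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the pre-trained weight frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # start as a zero (identity) update
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


class AttentionBridge(nn.Module):
    """Cross-attention from editing-branch tokens to reference-branch tokens.

    A hypothetical stand-in for the paper's attention bridge: the edited
    features query the reference features so source content can be fused
    in without modifying the underlying UNet architecture.
    """

    def __init__(self, dim: int, heads: int = 8, lora_rank: int = 4):
        super().__init__()
        self.heads = heads
        self.to_q = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_k = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_v = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, edit_feats: torch.Tensor, ref_feats: torch.Tensor) -> torch.Tensor:
        b, n, d = edit_feats.shape
        h = self.heads
        q = self.to_q(edit_feats).view(b, n, h, d // h).transpose(1, 2)
        k = self.to_k(ref_feats).view(b, -1, h, d // h).transpose(1, 2)
        v = self.to_v(ref_feats).view(b, -1, h, d // h).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, n, d)
        # Residual fusion: reference content is added on top of the
        # unmodified editing-branch features.
        return edit_feats + self.to_out(attn)


# Usage: fuse reference features into one UNet block's hidden states.
bridge = AttentionBridge(dim=320)
edit_feats = torch.randn(1, 64 * 64, 320)   # editing-branch tokens
ref_feats = torch.randn(1, 64 * 64, 320)    # frozen reference-branch tokens
fused = bridge(edit_feats, ref_feats)
print(fused.shape)                          # torch.Size([1, 4096, 320])
```

Because only the low-rank adapters train and the bridge sits beside, rather than inside, the pre-trained attention layers, this style of design is what lets such a framework stay compatible with plug-ins like ControlNet and IP-Adapter without retraining.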