Poster
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci · Shanyan Guan · Yanhao Ge · Yilin Zhang · Wei Li · Zhenyu Zhang · Jian Yang · Ying Tai
Despite progress in text-to-image (T2I) generation, semantic image editing remains a challenge. Inversion-based methods introduce reconstruction errors and inefficiencies, while instruction-based models suffer from limited datasets, architectural constraints, and high computational costs. We propose DescriptiveEdit, a description-driven editing framework that preserves the generative power of pre-trained T2I models without architectural modifications or inversion. A Cross-Attentive UNet with an attention bridge enables direct fusion of reference-image features into the editing branch, while LoRA-based tuning ensures efficiency and compatibility. Without retraining, DescriptiveEdit integrates seamlessly with ControlNet, IP-Adapter, and other extensions. Experiments show that it improves editing accuracy and consistency while significantly reducing computational cost, offering a scalable and flexible solution for text-guided image manipulation.
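To make the attention-bridge idea concrete, here is a minimal PyTorch sketch of one plausible design, not the authors' released implementation: queries come from the editing branch, keys and values from the frozen reference branch, and the attention projections carry trainable LoRA updates while the pre-trained weights stay frozen. All class and variable names (`LoRALinear`, `AttentionBridge`, `edit_feats`, `ref_feats`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + s * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # keep the pre-trained weight frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # start as a zero (identity) update
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))


class AttentionBridge(nn.Module):
    """Cross-attention from editing-branch tokens to reference-branch tokens.

    A hypothetical stand-in for the paper's attention bridge: the edited
    features query the reference features so source content can be fused
    in without modifying the underlying UNet architecture.
    """

    def __init__(self, dim: int, heads: int = 8, lora_rank: int = 4):
        super().__init__()
        self.heads = heads
        self.to_q = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_k = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_v = LoRALinear(nn.Linear(dim, dim, bias=False), lora_rank)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, edit_feats: torch.Tensor, ref_feats: torch.Tensor) -> torch.Tensor:
        b, n, d = edit_feats.shape
        h = self.heads
        q = self.to_q(edit_feats).view(b, n, h, d // h).transpose(1, 2)
        k = self.to_k(ref_feats).view(b, -1, h, d // h).transpose(1, 2)
        v = self.to_v(ref_feats).view(b, -1, h, d // h).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, n, d)
        # Residual fusion: reference content is added on top of the
        # unmodified editing-branch features.
        return edit_feats + self.to_out(attn)


# Usage: fuse reference features into one UNet block's hidden states.
bridge = AttentionBridge(dim=320)
edit_feats = torch.randn(1, 64 * 64, 320)   # editing-branch tokens
ref_feats = torch.randn(1, 64 * 64, 320)    # frozen reference-branch tokens
fused = bridge(edit_feats, ref_feats)
print(fused.shape)                          # torch.Size([1, 4096, 320])
```

Because only the low-rank adapters train and the bridge sits beside, rather than inside, the pre-trained attention layers, this style of design is what lets such a framework stay compatible with plug-ins like ControlNet and IP-Adapter without retraining.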