

Poster

UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang · Cong Han · Yafei Li · Zhipeng Jin · Xiawei Li · Sinan Du · Wen Tao · Yi Yang · shuanglong li · Chun Yuan · LIU LIN


Abstract:

Text-to-image generation has transformed content creation, yet precise visual text rendering remains challenging for generative models due to blurred glyphs, semantic inconsistencies, and limited style controllability. Current methods typically employ pre-rendered glyph images as conditional inputs, but their inability to preserve original font styles and color information forces reliance on multi-branch architectures to compensate for the missing details. This leads to increased model complexity, higher computational costs, and reduced reusability. To address these limitations, we propose a segmentation-guided framework that leverages pixel-level visual text segmentation masks, complete representations that preserve glyph shapes, colors, and spatial details, as unified conditional inputs. Our approach integrates two key innovations: (1) a fine-tuned bilingual segmentation model for extracting precise text masks from source images, and (2) a streamlined diffusion model enhanced with an adaptive glyph condition and a glyph region loss to ensure semantic and stylistic fidelity. On the AnyText-benchmark, our method achieves a sentence accuracy (Sen.Acc) of 0.8267 and a Normalized Edit Distance (NED) of 0.8976 for Chinese text generation, while on the English test set it delivers even stronger performance, with 0.9018 Sen.Acc and 0.9582 NED, surpassing prior methods by substantial margins. To address broader evaluation needs, we introduce two novel benchmarks: GlyphMM-benchmark (for holistic glyph consistency assessment) and MiniText-benchmark (targeting small-scale glyph fidelity analysis). Experimental results demonstrate our method's strong performance across these new benchmarks: a 16% Sen.Acc improvement on the Chinese subset of GlyphMM-benchmark and a 50% gain on its English counterpart. Notably, our approach achieves a Sen.Acc improvement of over 100% on the challenging MiniText test set designed for localized text regions. These results validate our architecture's dual strengths: a simplified, deployment-ready design and superior generalization for cross-lingual text rendering tasks.
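
The abstract names a glyph region loss but does not define it. Below is a minimal sketch of one plausible form, assuming a mask-weighted denoising objective in a PyTorch-style training loop; the function name, the `region_weight` hyperparameter, and the tensor shapes are illustrative assumptions rather than the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def glyph_region_loss(noise_pred, noise_target, text_mask, region_weight=2.0):
    """Denoising loss that up-weights pixels inside the visual-text segmentation mask.

    noise_pred, noise_target: (B, C, H, W) predicted and ground-truth noise.
    text_mask: (B, 1, H, W) binary text segmentation mask at latent resolution.
    region_weight: assumed extra weight applied to glyph pixels (hypothetical value).
    """
    # Per-pixel squared error, kept unreduced so it can be reweighted spatially.
    per_pixel = F.mse_loss(noise_pred, noise_target, reduction="none")
    # Weight is 1 outside the text region and region_weight inside it.
    weights = 1.0 + (region_weight - 1.0) * text_mask
    return (weights * per_pixel).mean()
```

In this sketch, the segmentation mask plays the same role it does as a conditional input: it localizes glyph pixels so the training objective can emphasize semantic and stylistic fidelity exactly where text appears.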
