ICCV Poster MSA$^2$: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition

Yangfu Li · Hongjian Zhan · Qi Liu · Li Sun · Yu-Jie Xiong · Yue Lu

Abstract: Most existing methods treat open-set Chinese text recognition (CTR) as a single-task problem, focusing primarily on prototype learning of linguistic components or glyphs to identify unseen characters. In contrast, humans identify characters by integrating multiple perspectives, including linguistic and visual cues. Inspired by this, we propose a multi-task framework termed MSA$^2$, which considers multi-view character representations for open-set CTR. Within MSA$^2$, we introduce two novel strategies for character representation: structure-aware component encoding (SACE) and style-adaptive glyph embedding (SAGE). SACE utilizes a binary tree with a dynamic representation space to emphasize the primary linguistic components, thereby generating structure-aware and discriminative linguistic representations for each character. Meanwhile, SAGE employs glyph-centric contrastive learning to aggregate features from diverse forms, yielding robust glyph representations that allow the CTR model to adapt to style variations across fonts. Extensive experiments demonstrate that MSA$^2$ outperforms state-of-the-art CTR methods, achieving average accuracy improvements of 1.3% and 6.0% under closed-set and open-set settings, respectively, on the BCTR dataset. The code will be available soon.
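
The abstract does not give the SAGE objective, so the following is only a minimal sketch of the general idea behind glyph-centric contrastive learning, assuming a standard supervised contrastive (InfoNCE-style) loss in which renderings of the same character in different fonts are treated as positives. The function name `glyph_contrastive_loss`, the `temperature` value, and the toy 2-D embeddings are hypothetical and are not taken from the MSA$^2$ code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def glyph_contrastive_loss(embeddings, char_ids, temperature=0.1):
    """Generic supervised contrastive loss (an assumption, not the paper's loss).

    embeddings: list of feature vectors, one per glyph image.
    char_ids:   character label per embedding; samples sharing a label
                (same character, different font) are positives.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and char_ids[j] == char_ids[i]]
        if not positives:
            continue  # no other style of this character in the batch
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        # Average negative log-probability over this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two characters, each "rendered" in two fonts (hypothetical features).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_same = glyph_contrastive_loss(features, ["永", "永", "字", "字"])
loss_mixed = glyph_contrastive_loss(features, ["永", "字", "永", "字"])
```

In this toy batch, `loss_same` is lower than `loss_mixed` because the first labelling matches the way the features already cluster. Minimizing a loss of this kind pulls different font renderings of one character together, which is the sense in which features are aggregated "from diverse forms".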
