Poster Exhibit Hall I #150

GReg: Geometry-Aware Region Refinement for Sign Language Video Generation

Tongkai Shi ⋅ Lianyu Hu ⋅ Fanhua Shang ⋅ Liqing Gao ⋅ Wei Feng

2025 Poster

Abstract

Sign Language Video Generation (SLVG) aims to transform sign language sequences into natural and fluent sign language videos. Existing SLVG methods lack geometric modeling of human anatomical structures, leading to anatomically implausible and temporally inconsistent generation. To address these challenges, we propose a novel SLVG framework: Geometry-Aware Region Refinement (GReg). GReg uses 3D geometric information (such as normal maps and gradient maps) from the SMPL-X model to ensure anatomical and temporal consistency. To fully leverage the 3D geometric priors, we propose two novel methods: 1) Regional Prior Generation, which uses regional expert networks to generate target-structured regions as generation priors; 2) Gradient-Enhanced Refinement, which guides the refinement of detailed structures in key regions using gradient features. Furthermore, we enhance visual realism in key regions through adversarial training on both these regions and their gradient maps. Experimental results demonstrate that GReg achieves state-of-the-art performance with superior structural accuracy and temporal consistency.
