Poster
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo · Chaoyang Wang · Michael Vasilkovsky · Vladislav Shakhrai · Di Liu · Peiye Zhuang · Sergey Tulyakov · Peter Wonka · Hsin-Ying Lee · James Davis · Jian Wang
We propose a new framework to create high-quality character head morphable models from text, combining static text-to-3D generation with video diffusion. Bridging the gap between these two methods is challenging: text-to-3D models produce detailed static geometry but cannot synthesize motion, while video diffusion models generate motion but face consistency issues like varying colors, varying viewpoints, or geometric distortion. Our solution uses deformable 3D Gaussian splatting to align static 3D models with video diffusion outputs, enabling the creation of a set of diverse, expressive motions with greater accuracy. By incorporating static geometry as a constraint and using a view-dependent deformation MLP, we reduce video artifacts and produce coherent, consistent results. This approach allows us to build a 3D morphable model that can generate new, realistic expressions. Compared to existing 4D generation techniques, our method achieves superior results and creates expressive character head models that can be animated.
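The sketch below is not the authors' code; it is a minimal illustration, under assumed shapes and names, of the two ideas named in the abstract: a view-dependent deformation MLP that offsets static 3D Gaussian centers toward a video-diffusion target while the static geometry acts as a regularizing constraint, and a linear blendshape (morphable) model assembled from the fitted per-expression offsets. The `render_loss_fn` hook standing in for the photometric loss against generated video frames is a placeholder assumption.

```python
# Minimal sketch (not the authors' implementation) of view-dependent
# Gaussian deformation and a linear blendshape model built from it.
import torch
import torch.nn as nn


class ViewDependentDeformMLP(nn.Module):
    """Predicts per-Gaussian offsets conditioned on the camera direction,
    so view-specific video artifacts can be absorbed without distorting
    the shared static geometry."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, xyz, view_dir):
        # xyz: (N, 3) static Gaussian centers; view_dir: (3,) unit camera direction
        v = view_dir.expand(xyz.shape[0], 3)
        return self.net(torch.cat([xyz, v], dim=-1))  # (N, 3) offsets


def fit_expression(static_xyz, render_loss_fn, view_dirs, steps=200, reg=1e-3):
    """Fit offsets for one generated expression clip.
    render_loss_fn(deformed_xyz, view_dir) -> scalar photometric loss against
    the corresponding video-diffusion frame (assumed to be provided)."""
    mlp = ViewDependentDeformMLP()
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for v in view_dirs:
            delta = mlp(static_xyz, v)
            # Static geometry as a constraint: penalize large deviations.
            loss = loss + render_loss_fn(static_xyz + delta, v) + reg * delta.pow(2).mean()
        loss.backward()
        opt.step()
    # Use a view-averaged offset as this expression's blendshape delta.
    with torch.no_grad():
        deltas = torch.stack([mlp(static_xyz, v) for v in view_dirs]).mean(0)
    return deltas  # (N, 3)


def blendshape(static_xyz, expression_deltas, weights):
    """Linear morphable model: neutral geometry plus weighted expression deltas.
    expression_deltas: (K, N, 3); weights: (K,)"""
    return static_xyz + torch.einsum("k,knc->nc", weights, expression_deltas)
```

In this reading, each video-diffusion clip yields one delta via `fit_expression`, and new expressions are synthesized by mixing the resulting deltas with `blendshape`, mirroring the abstract's claim that the fitted deformations form a morphable model capable of generating novel, realistic expressions.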