Poster
HUST: High-Fidelity Unbiased Skin Tone Estimation via Texture Quantization
Zimin Ran · Xingyu Ren · Xiang An · Kaicheng Yang · Ziyong Feng · Jing Yang · Rolandos Alexandros Potamias · Linchao Zhu · Jiankang Deng
Recent 3D facial reconstruction methods have made significant progress in shape estimation, but high-fidelity, unbiased facial albedo estimation remains challenging. Existing methods rely on expensive light-stage captures and achieve either high-fidelity reconstruction or unbiased skin tone estimation, but no prior work attains both simultaneously. In this paper, we present HUST, a novel method that reconstructs a high-fidelity, unbiased facial diffuse albedo map directly from a single image, without any captured data. Our key insight is that the albedo map is an illumination-invariant texture map, so inexpensive texture data can be used for diffuse albedo estimation once illumination is eliminated. To this end, we collect large-scale high-resolution facial images and train a VQGAN model in image space. To adapt the pre-trained VQGAN to UV texture generation, we fine-tune its encoder on a limited set of UV textures and our high-resolution face images, under adversarial supervision in both image and latent space. Finally, we train a cross-attention module with a group identity loss for domain adaptation from texture to albedo. Extensive experiments demonstrate that HUST predicts high-fidelity facial albedos for in-the-wild images. On the FAIR benchmark, HUST achieves the lowest average ITA error (11.20) and bias score (1.58), demonstrating accurate and fair estimation across the full spectrum of human skin tones. Our code, models, and training data will be made publicly available to facilitate future research.
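As a rough illustration of the texture-quantization bottleneck the title refers to, the sketch below shows the standard VQGAN codebook lookup with a straight-through gradient. The function name, tensor shapes, and PyTorch implementation are our assumptions for exposition, not HUST's released code.

```python
import torch

def vector_quantize(z: torch.Tensor, codebook: torch.Tensor):
    """Nearest-codebook lookup with a straight-through gradient (generic VQGAN).

    z:        (B, C, H, W) encoder latents.
    codebook: (K, C) learned embedding vectors.
    """
    B, C, H, W = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, C)   # (B*H*W, C) latent vectors
    dist = torch.cdist(flat, codebook)            # pairwise L2 distances, (B*H*W, K)
    idx = dist.argmin(dim=1)                      # index of nearest code per latent
    z_q = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2).contiguous()
    # Straight-through estimator: quantized values in the forward pass,
    # identity gradient to the encoder in the backward pass.
    return z + (z_q - z).detach(), idx.reshape(B, H, W)
```

For context on the numbers above, the FAIR benchmark's ITA error is measured in degrees of the Individual Typology Angle, ITA = arctan((L* − 50)/b*) · 180/π, computed from CIE-Lab values of the skin region; a lower average error (11.20) therefore means closer agreement with ground-truth skin tone.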