Poster
Enhancing Numerical Prediction of MLLMs with Soft Labeling
Pei Wang · Zhaowei Cai · Hao Yang · Davide Modolo · Ashwin Swaminathan
The optimality of the de facto cross-entropy loss with a one-hot target distribution (hard labeling) is questionable when training (Multimodal) Large Language Models (LLMs/MLLMs). While hard labeling is reasonable for language token prediction, a typical multi-class classification problem in discrete space, it is suboptimal for tasks like numerical prediction, a typical regression problem in continuous space. However, enabling regression in LLMs/MLLMs would complicate training and the next-token prediction paradigm at inference. To address this challenge, we instead propose a novel loss design, called soft labeling, which smooths the target probability distribution so that predictions are penalized according to their distance from the target. Like a regression loss, it penalizes more distant predictions in continuous space more heavily, but it changes neither the model architecture nor the next-token prediction paradigm of LLMs/MLLMs. We demonstrate the efficacy of soft labeling through extensive experiments on visual grounding, object counting, and chart understanding, achieving state-of-the-art performance on multiple benchmarks without bells and whistles. Soft labeling can be applied to any LLM/MLLM.
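The idea of replacing a one-hot target with a distance-aware target distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the smoothing kernel, so a Gaussian over the numeric token values is assumed here, and the function names (`soft_label_distribution`, `soft_cross_entropy`) are hypothetical.

```python
import math

def soft_label_distribution(target_value, token_values, sigma=1.0):
    """Distance-aware target distribution over numeric token values.

    Assumption: a Gaussian kernel centered at the ground-truth value.
    Probability mass decays with distance from the target, so the
    cross-entropy below penalizes far-off predictions more heavily,
    mimicking a regression loss without changing the model.
    """
    weights = [math.exp(-((v - target_value) ** 2) / (2 * sigma ** 2))
               for v in token_values]
    total = sum(weights)
    return [w / total for w in weights]

def soft_cross_entropy(log_probs, soft_targets):
    """Cross-entropy against a soft (non-one-hot) target distribution."""
    return -sum(t * lp for t, lp in zip(soft_targets, log_probs))
```

With a one-hot target, predicting "4" and predicting "9" for a ground truth of "5" incur the same loss; with the smoothed target above, the nearer prediction is penalized less, while inference remains ordinary next-token prediction.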