

Poster

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness

Qifan Yu · Zhebei Shen · Zhongqi Yue · Yang Wu · Bosheng Qin · Wenqiao Zhang · Yunfei Li · Juncheng Li · Siliang Tang · Yueting Zhuang


Abstract:

Instruction tuning fine-tunes pre-trained Multi-modal Large Language Models (MLLMs) to handle real-world tasks. However, the rapid expansion of visual instruction datasets introduces data redundancy, leading to excessive computational costs. We propose a collaborative framework, DataTailor, which leverages three key principles—informativeness, uniqueness, and representativeness—for effective data selection. We argue that a valuable sample should be informative of the task, non-redundant, and representative of the sample distribution (i.e., not an outlier). We further propose practical ways to score against each principle that automatically adapt to a given dataset without tedious hyperparameter tuning. Comprehensive experiments on various benchmarks demonstrate that DataTailor achieves 101.3% of the performance of full-data fine-tuning with only 15% of the data, substantially reducing computational cost while achieving superior results. This exemplifies the "Less is More" philosophy in MLLM development. The code is available at https://anonymous.4open.science/r/DataTailor-5BC3.
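To make the three principles concrete, here is a minimal, hedged sketch of principle-based subset selection. It is not the authors' DataTailor implementation; the abstract does not specify the scoring functions. The sketch assumes (hypothetically) that informativeness is proxied by per-sample loss, uniqueness by the distance to a sample's nearest neighbor in embedding space, and representativeness by closeness to the bulk of the embedding distribution; the function names and the equal-weight combination are illustrative choices.

```python
# Illustrative sketch of selection by informativeness, uniqueness, and
# representativeness. NOT the DataTailor method; all proxies are assumptions.
import numpy as np

def select_subset(embeddings: np.ndarray, losses: np.ndarray, ratio: float = 0.15):
    """Rank samples by a combined three-principle score; keep the top `ratio`."""
    n = len(embeddings)
    # Pairwise Euclidean distances between sample embeddings.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distance

    informativeness = losses                   # harder samples carry more task signal
    uniqueness = dists.min(axis=1)             # far from nearest neighbor => non-redundant
    finite = np.where(np.isinf(dists), 0.0, dists)
    representativeness = -finite.mean(axis=1)  # close to the bulk => not an outlier

    # Min-max normalize each score so no single principle dominates.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-8)

    score = norm(informativeness) + norm(uniqueness) + norm(representativeness)
    k = max(1, int(ratio * n))
    return np.argsort(score)[-k:]              # indices of the selected samples

# Toy usage: 200 random "samples" with random losses, keeping 15% of the data.
rng = np.random.default_rng(0)
selected = select_subset(rng.normal(size=(200, 64)), rng.random(200))
print(f"kept {len(selected)} of 200 samples")
```

Note that uniqueness and representativeness pull in opposite directions (distinct yet not an outlier); a collaborative framework like the one described would balance the two rather than optimize either alone.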
