Poster
Target Bias Is All You Need: Zero-Shot Debiasing of Vision-Language Models with Bias Corpus
Taeuk Jang · Hoin Jung · Xiaoqian Wang
Vision-Language Models (VLMs) like CLIP have shown remarkable zero-shot performance by aligning different modalities in the embedding space, enabling diverse applications from image editing to visual question answering (VQA). However, these models often inherit biases from their training data, resulting in performance disparities across specific subpopulations. Traditional debiasing methods for VLMs primarily focus on specific downstream tasks using labeled datasets, which we argue is insufficient given the broad applicability of VLMs. Specifically, these methods struggle with generalizability, transferability, and feasibility due to overfitting, limited task applicability, and regulatory constraints on the use of sensitive data, making them less practical in real-world scenarios. To address these challenges, we propose a novel task-agnostic method for learning debiased image embeddings in VLMs. Our approach does not require expensive annotated datasets or curated prompts for downstream tasks, while still preserving the inherent zero-shot capabilities of these models. Instead, we leverage easily accessible information: 1) a bias text corpus generated by a large language model, and 2) a generic unsupervised vision dataset. Our method disentangles the image embedding into bias and neutral components by applying centered kernel alignment (CKA) regularization to the text-vision representational similarity, using the bias text corpus over the generic vision dataset. Experimental results validate the effectiveness of our approach across multiple tasks, offering a practical and versatile solution to debiasing VLMs.
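For readers who want a concrete picture of the CKA regularization described in the abstract, the following is a minimal sketch in PyTorch with OpenAI's CLIP. It is not the authors' released implementation: the hand-written prompts standing in for the LLM-generated bias corpus, the two linear heads (`neutral_head`, `bias_head`) used for the disentanglement, and the loss weighting are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

def linear_cka(X, Y):
    """Linear centered kernel alignment between two feature matrices with the same number of rows."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(X.T @ Y) ** 2  # ||X^T Y||_F^2
    return cross / (torch.linalg.norm(X.T @ X) * torch.linalg.norm(Y.T @ Y) + 1e-8)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# 1) Bias text corpus: in the paper this is generated by an LLM; a few hand-written
#    prompts stand in for it here.
bias_prompts = ["a photo of a man", "a photo of a woman",
                "a photo of a young person", "a photo of an elderly person"]
with torch.no_grad():
    bias_text = F.normalize(
        model.encode_text(clip.tokenize(bias_prompts).to(device)).float(), dim=-1)

# 2) Hypothetical learnable heads that split the frozen CLIP image embedding
#    into neutral and bias components.
d = bias_text.shape[1]
neutral_head = torch.nn.Linear(d, d).to(device)
bias_head = torch.nn.Linear(d, d).to(device)
optimizer = torch.optim.Adam(
    list(neutral_head.parameters()) + list(bias_head.parameters()), lr=1e-4)

def debias_step(images):
    """One step over a batch from a generic, unlabeled image dataset (already preprocessed)."""
    with torch.no_grad():
        img = F.normalize(model.encode_image(images.to(device)).float(), dim=-1)
    neutral, bias_part = neutral_head(img), bias_head(img)

    # Text-vision representational similarity to the bias corpus: one row per image.
    sim_bias = img @ bias_text.T

    # CKA regularization (illustrative weighting): the neutral component should share no
    # representational structure with the bias similarities, the bias component should
    # absorb it, and together they should reconstruct the original embedding so the
    # model's zero-shot ability is preserved.
    loss = (linear_cka(neutral, sim_bias)
            - linear_cka(bias_part, sim_bias)
            + F.mse_loss(neutral + bias_part, img))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point mirrored in this sketch is that only cheap inputs enter the objective: a frozen VLM, a text bias corpus, and unlabeled generic images; no downstream-task labels or sensitive attribute annotations are required.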