Poster
Overcoming Dual Drift for Continual Long-Tailed Visual Question Answering
Feifei Zhang · Zhihao Wang · Xi Zhang · Changsheng Xu
Visual Question Answering (VQA) is a widely explored multimodal task aimed at answering questions based on images. Recently, a few studies have started to investigate continual learning in VQA to cope with evolving multimodal data streams. However, these studies fall short of tackling another critical issue in real-world VQA applications: the long-tailed distribution of data. In this paper, we introduce Continual Long-Tailed Visual Question Answering (CLT-VQA) and identify two critical challenges: inner-task prototype drift, where classifier prototypes become biased toward majority classes due to imbalanced data, and inter-task feature drift, where learned features shift over time, causing forgetting of previously learned knowledge. To address these challenges, we propose a unified dual-balance approach that integrates a Balanced Classifier Prototype (BCP) learning module and a Multi-modal Feature Alignment (MFA) module. The BCP optimizes classifier prototypes to achieve balanced class representation, while the MFA aligns features consistently across tasks, preventing catastrophic forgetting. Extensive experimental results demonstrate that our method outperforms existing models, validating the effectiveness of the proposed approach. Code is available in the supplementary materials.
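For intuition, the sketch below illustrates the two ideas the abstract describes: a prototype-based answer classifier trained with a class-balanced objective (standing in for BCP) and a feature-alignment penalty against a frozen previous-task model (standing in for MFA). This is a minimal PyTorch-style sketch under stated assumptions, not the paper's actual formulation; names such as `fusion_model`, `old_fusion_model`, the logit-adjustment balancing, and the cosine alignment loss are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeClassifier(nn.Module):
    """Cosine-similarity classifier: one learnable prototype per answer class."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, feats):
        # Cosine similarity between fused multimodal features and class prototypes.
        feats = F.normalize(feats, dim=-1)
        protos = F.normalize(self.prototypes, dim=-1)
        return self.scale * feats @ protos.t()

def balanced_prototype_loss(logits, labels, class_counts):
    # Logit adjustment with log class priors so head classes do not dominate
    # prototype updates (an illustrative stand-in for the BCP objective).
    prior = torch.log(class_counts.float() / class_counts.sum() + 1e-12)
    return F.cross_entropy(logits + prior, labels)

def feature_alignment_loss(cur_feats, old_feats):
    # Penalize drift of fused features relative to the frozen previous-task
    # model (an illustrative stand-in for the MFA objective).
    return 1.0 - F.cosine_similarity(cur_feats, old_feats, dim=-1).mean()

def training_step(fusion_model, old_fusion_model, classifier,
                  image_feats, question_feats, labels, class_counts,
                  align_weight=1.0):
    # `fusion_model` fuses image and question features; `old_fusion_model`
    # is a frozen copy kept from the end of the previous task (hypothetical names).
    cur = fusion_model(image_feats, question_feats)
    logits = classifier(cur)
    loss = balanced_prototype_loss(logits, labels, class_counts)
    if old_fusion_model is not None:
        with torch.no_grad():
            old = old_fusion_model(image_feats, question_feats)
        loss = loss + align_weight * feature_alignment_loss(cur, old)
    return loss
```

The balanced term addresses inner-task prototype drift by discounting majority-class evidence, while the alignment term addresses inter-task feature drift by anchoring current features to the previous task's representation.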