Poster

Probabilistic Prototype Calibration of Vision-language Models for Generalized Few-shot Semantic Segmentation

Jie Liu · Jiayi Shen · Pan Zhou · Jan-Jakob Sonke · Stratis Gavves


Abstract: Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototype learning. However, existing prototype-based methods are inherently deterministic, limiting the adaptability of learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, our work proposes the Probabilistic Prototype Calibration Network (PPCN), a probabilistic modeling framework over multi-modal prototypes from the pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, PPCN first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual calibration prototypes, leading to a more discriminative and adaptive representation. Furthermore, unlike deterministic prototype learning techniques, PPCN introduces distribution regularization over these calibration prototypes. This probabilistic formulation ensures structured and uncertainty-aware prototype learning, effectively mitigating overfitting to limited novel-class data while enhancing generalization. Extensive experimental results on the PASCAL-5$^i$ and COCO-20$^i$ datasets demonstrate that our proposed PPCN significantly outperforms state-of-the-art approaches in both the GFSS and class-incremental settings. The source code will be released publicly.
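The abstract's core idea, refining frozen textual prototypes with learnable, probabilistically regularized calibration offsets, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Gaussian offset parameterization, the standard-normal KL regularizer, the cosine-similarity classifier, and all names and sizes below are illustrative assumptions standing in for PPCN's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 5, 16  # number of classes and embedding dim (illustrative sizes)

# Frozen textual prototypes from a pretrained VLM (random stand-ins here)
text_protos = rng.normal(size=(C, D))

# Learnable Gaussian calibration: per-class mean and log-variance
# (assumed parameterization; the paper's distribution may differ)
calib_mu = np.zeros((C, D))
calib_logvar = np.full((C, D), -2.0)

def sample_calibrated_protos():
    """Reparameterized sample: calibrated = frozen text prototype + Gaussian offset."""
    eps = rng.normal(size=(C, D))
    offset = calib_mu + np.exp(0.5 * calib_logvar) * eps
    return text_protos + offset

def kl_to_standard_normal(mu, logvar):
    """Distribution regularizer: KL(N(mu, sigma^2) || N(0, I)), summed over dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def classify(features, protos):
    """Assign each pixel embedding to its most similar prototype (cosine)."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=-1, keepdims=True)
    return np.argmax(f @ p.T, axis=-1)

protos = sample_calibrated_protos()
pixels = rng.normal(size=(8, D))  # 8 mock pixel embeddings
labels = classify(pixels, protos)
reg = kl_to_standard_normal(calib_mu, calib_logvar)
```

In training, the KL term would be added to the segmentation loss, discouraging the calibration distribution from collapsing or drifting far from its prior, which is one way the abstract's "uncertainty-aware prototype learning" could mitigate overfitting to scarce novel-class data.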