

Poster

Unknown Text Learning for CLIP-based Few-Shot Open-set Recognition

Rui Ma · Qilong Wang · Bing Cao · Qinghua Hu · Yahong Han


Abstract: Recently, vision-language models (e.g., CLIP) with prompt learning have shown great potential in few-shot learning. However, effectively extending CLIP-based models to few-shot open-set recognition (FSOR), which requires classifying known classes and detecting unknown samples from only a few known samples, remains an open issue. The core challenge is that neither unknown samples nor their textual descriptions are available. To address this, we propose an Unknown Text Learning (UTL) method for CLIP-based FSOR that uses only known samples. UTL involves two key components: universal unknown words optimization (U$^{2}$WO) and unknown label smoothing (ULS). U$^{2}$WO constructs a universal space of unknown words spanned by basis vectors and characterizes unknown text as a linear combination of those basis vectors. To learn unknown text efficiently without unknown samples, ULS performs contrastive learning between unknown text and known samples by setting the label of the unknown class to a small constant, which flexibly allows the unknown text to remain non-matching with, yet mildly confusable on, known visual samples. In addition, UTL incorporates an additional context for known classes to mitigate conflicts between context optimization for known and unknown classes. UTL effectively regularizes the predicted probability by integrating the learnable unknown text. Experimental results on various benchmarks show that UTL is superior to its counterparts and achieves state-of-the-art performance.
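The abstract does not include an implementation, but the two components lend themselves to a compact sketch. Below is a minimal PyTorch-style illustration of the described mechanism: the unknown text feature as a learnable linear combination of basis vectors (U$^{2}$WO), and a soft contrastive target that assigns a small constant to the unknown class (ULS). All names, shapes, and the smoothing constant `eps` are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnknownTextLearner(nn.Module):
    """Sketch of U^2WO + ULS as described in the abstract (names/shapes assumed)."""

    def __init__(self, num_basis=16, embed_dim=512, eps=0.1):
        super().__init__()
        # U^2WO: a universal space of unknown words spanned by learnable basis
        # vectors; the unknown text feature is a learnable linear combination.
        self.basis = nn.Parameter(torch.randn(num_basis, embed_dim))
        self.coeff = nn.Parameter(torch.randn(num_basis))
        self.eps = eps  # ULS: small constant target for the unknown class (assumed value)

    def unknown_text_feature(self):
        # Linear combination of basis vectors, L2-normalized like CLIP text features.
        t_unk = self.coeff @ self.basis                      # (embed_dim,)
        return F.normalize(t_unk, dim=-1)

    def loss(self, image_feats, known_text_feats, labels, logit_scale=100.0):
        """Contrastive loss between known images and [known | unknown] text.

        image_feats:      (B, D) L2-normalized CLIP image features of known samples
        known_text_feats: (K, D) L2-normalized text features of the K known classes
        labels:           (B,)   ground-truth known-class indices
        """
        t_unk = self.unknown_text_feature().unsqueeze(0)     # (1, D)
        text = torch.cat([known_text_feats, t_unk], dim=0)   # (K+1, D)
        logits = logit_scale * image_feats @ text.t()        # (B, K+1)

        # ULS: soft targets -- each known sample keeps most mass on its true
        # class, while the unknown slot receives a small constant eps, so the
        # unknown text stays non-matching with, yet mildly confusable on,
        # known visual samples.
        targets = torch.zeros_like(logits)
        targets[torch.arange(len(labels)), labels] = 1.0 - self.eps
        targets[:, -1] = self.eps
        return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

One plausible use of such a module at inference is to treat the softmax probability of the (K+1)-th (unknown) slot as an open-set score; whether the paper does exactly this is not stated in the abstract.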
