

Poster

Class Token as Proxy: Optimal Transport-assisted Proxy Learning for Weakly Supervised Semantic Segmentation

Jian Wang · Tianhong Dai · Bingfeng Zhang · Siyue Yu · ENG LIM · Jimin XIAO


Abstract:

Weakly Supervised Semantic Segmentation (WSSS) utilizes Class Activation Maps (CAMs) to extract spatial cues from image-level labels. However, CAMs highlight only the most discriminative foreground regions, leading to incomplete results. Recent Vision Transformer-based methods leverage class-patch attention to enhance CAMs, yet they still suffer from partial activation due to the token gap: classification-focused class tokens prioritize discriminative features, while patch tokens capture both discriminative and non-discriminative characteristics. This mismatch prevents class tokens from activating all relevant features, especially when discriminative and non-discriminative regions differ significantly. To address this issue, we propose Optimal Transport-assisted Proxy Learning (OTPL), a novel framework that bridges the token gap by learning adaptive proxies. OTPL introduces two key strategies: (1) optimal transport-assisted proxy learning, which combines class tokens with their most relevant patch tokens to produce comprehensive CAMs, and (2) optimal transport-enhanced contrastive learning, which aligns proxies with confident patch tokens for bounded proxy exploration. Our framework overcomes the limitation of class tokens in activating patch tokens, providing more complete and accurate CAM results. Experiments on WSSS benchmarks (PASCAL VOC and MS COCO) demonstrate that our method significantly improves CAM quality and achieves state-of-the-art performance. The source code will be released.
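
To make the first strategy concrete, the following is a minimal NumPy sketch of how an optimal transport plan could match class tokens to patch tokens and how a proxy could then be formed from a class token plus its most relevant patches. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify the cost function, the marginals, the solver, or the combination rule. The cosine-distance cost, uniform marginals, entropic Sinkhorn solver, the `top_k` selection, and the names `sinkhorn` and `build_proxies` are all hypothetical choices made here.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals (standard Sinkhorn iterations).

    Assumption: uniform mass on both class tokens and patch tokens.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass on class tokens (rows)
    b = np.full(m, 1.0 / m)  # mass on patch tokens (columns)
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def build_proxies(class_tokens, patch_tokens, top_k=4, eps=0.1):
    """Hypothetical proxy construction: class token + OT-weighted top-k patches."""
    c = l2_normalize(class_tokens)  # (num_classes, dim)
    p = l2_normalize(patch_tokens)  # (num_patches, dim)
    cost = 1.0 - c @ p.T            # cosine distance, assumed cost
    plan = sinkhorn(cost, eps=eps)  # (num_classes, num_patches)

    proxies = np.empty_like(c)
    for k in range(c.shape[0]):
        idx = np.argsort(plan[k])[-top_k:]      # patches with the most transported mass
        w = plan[k, idx] / plan[k, idx].sum()   # normalize their weights
        proxies[k] = c[k] + w @ p[idx]          # assumed additive combination
    proxies = l2_normalize(proxies)

    # CAM-like scores: non-negative similarity of each proxy to each patch.
    cams = np.maximum(proxies @ p.T, 0.0)
    return proxies, plan, cams


rng = np.random.default_rng(0)
class_tokens = rng.normal(size=(3, 16))   # 3 classes, toy embedding size
patch_tokens = rng.normal(size=(49, 16))  # 7 x 7 patch grid
proxies, plan, cams = build_proxies(class_tokens, patch_tokens)
print(proxies.shape, plan.shape, cams.shape)
```

In this sketch the proxy, rather than the raw class token, is compared against the patch tokens, which is one way to read the claim that proxies activate regions the class token alone misses. The contrastive alignment with confident patch tokens described in strategy (2) is not covered here.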
