Poster
Seeing the Unseen: A Semantic Alignment and Context-Aware Prompt Framework for Open-Vocabulary Camouflaged Object Segmentation
Peng Ren · Tian Bai · Jing Sun · Fuming Sun
Open-Vocabulary Camouflaged Object Segmentation (OVCOS) aims to segment camouflaged objects of any category based on text descriptions. Although existing open-vocabulary methods exhibit strong segmentation capabilities, they still suffer from a major limitation in camouflaged scenarios: semantic confusion, which leads to incomplete segmentation and class shift in the model. To mitigate this limitation, we propose a framework for OVCOS, named SuCLIP. Specifically, we design a context-aware prompt scheme that leverages the internal knowledge of the CLIP visual encoder to enrich the text prompt and align it with local visual features. To better align the visual and textual semantic spaces, we design a class-aware feature selection module that dynamically adjusts text and visual embeddings, making them better matched to camouflaged objects. Meanwhile, we introduce a semantic consistency loss to mitigate the semantic deviation between the text prompt and visual features, ensuring semantic consistency between the segmentation results and the text prompt. Finally, we design a text query decoder that precisely maps textual semantics to pixel-level segmentation results, thereby achieving semantic-spatially consistent decoding. Experimental results show that SuCLIP significantly outperforms the advanced method OVCoser on the OVCamo dataset.
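The abstract does not give the exact form of the semantic consistency loss. A common choice for penalizing semantic deviation between two embeddings is one minus their cosine similarity; the sketch below shows that generic form only, with all names and details being assumptions rather than the paper's actual formulation.

```python
import math

def semantic_consistency_loss(text_emb, vis_emb, eps=1e-8):
    # Hypothetical sketch: 1 - cosine similarity between a text prompt
    # embedding and a (pooled) visual embedding. The real SuCLIP loss
    # may differ; this illustrates the general idea only.
    dot = sum(a * b for a, b in zip(text_emb, vis_emb))
    norm_t = math.sqrt(sum(a * a for a in text_emb))
    norm_v = math.sqrt(sum(b * b for b in vis_emb))
    return 1.0 - dot / ((norm_t + eps) * (norm_v + eps))

# Identical embeddings incur near-zero loss; orthogonal ones incur loss 1.
t = [0.6, 0.8, 0.0]
print(round(semantic_consistency_loss(t, t), 6))                  # ≈ 0.0
print(round(semantic_consistency_loss(t, [0.0, 0.0, 1.0]), 6))    # ≈ 1.0
```

Minimizing such a term pulls the visual features of the predicted region toward the text prompt's embedding, which is one way to discourage class shift during training.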