

Poster

STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning

Guilian Chen · Huisi Wu · Jing Qin


Abstract:

Automatic segmentation of polyps from colonoscopy videos is of great clinical significance, as it can assist clinicians in making more accurate diagnoses and precise interventions. However, video polyp segmentation (VPS) poses significant challenges due to ambiguous boundaries between polyps and the surrounding mucosal tissue, as well as variations in polyp scale, contrast, and position across consecutive frames. Moreover, to meet clinical requirements, inference must run in real time to enable intraoperative tracking and guidance. In this paper, we propose a novel and efficient segmentation network, STDDNet, which integrates a spatial-aligned temporal modeling strategy and a discriminative dynamic representation learning mechanism to comprehensively address these challenges by harnessing the advantages of Mamba. Specifically, a spatial-aligned temporal dependency propagation (STDP) module is developed to model temporal consistency across consecutive frames based on a bidirectional scanning Mamba block. Furthermore, we design a discriminative dynamic feature extraction (DDFE) module to extract frame-wise dynamic information from the structural features generated by the Mamba block. These dynamic features effectively handle variations across colonoscopy frames and provide finer detail for refined segmentation. We extensively evaluate STDDNet on two benchmark datasets, SUN-SEG and CVC-ClinicDB, demonstrating superior segmentation performance over state-of-the-art methods while maintaining real-time inference. Code will be released upon publication.
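To make the spatial-aligned temporal modeling idea more concrete, the sketch below illustrates one plausible way to realize bidirectional Mamba scanning over per-frame features: each spatial location is kept aligned across frames and treated as its own temporal sequence, scanned forward and backward, then fused. This is a minimal conceptual sketch, not the authors' STDP implementation; the class name `BidirectionalTemporalScan`, the linear fusion layer, and all tensor shapes are assumptions, and it relies on the `Mamba` block from the `mamba_ssm` package, whose selective-scan kernels require a CUDA device.

```python
# Conceptual sketch only: bidirectional temporal scanning over spatially
# aligned frame features. Names, shapes, and the fusion step are hypothetical.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba-ssm package interface


class BidirectionalTemporalScan(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.fwd = Mamba(d_model=dim)        # scans frames front-to-back
        self.bwd = Mamba(d_model=dim)        # scans the time-reversed sequence
        self.fuse = nn.Linear(2 * dim, dim)  # merges the two scan directions

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) -- per-frame features from a shared encoder
        B, T, C, H, W = feats.shape
        # Keep spatial positions aligned: every (h, w) location becomes an
        # independent temporal sequence of length T with C channels.
        seq = feats.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        out_f = self.fwd(seq)                 # forward temporal pass
        out_b = self.bwd(seq.flip(1)).flip(1) # backward temporal pass
        out = self.fuse(torch.cat([out_f, out_b], dim=-1))
        return out.reshape(B, H, W, T, C).permute(0, 3, 4, 1, 2)


if __name__ == "__main__":
    device = "cuda"  # mamba-ssm's selective-scan kernels require a GPU
    x = torch.randn(1, 5, 64, 22, 22, device=device)  # 5 consecutive frames
    model = BidirectionalTemporalScan(64).to(device)
    print(model(x).shape)  # torch.Size([1, 5, 64, 22, 22])
```

Under these assumptions, the output keeps the input's spatial layout while each location's feature is informed by both past and future frames, which is the property the STDP module is described as exploiting for temporal consistency.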
