Poster Thu, Oct 23, 2025 • 2:15 PM – 4:15 PM PDT Exhibit Hall I #215

HyPiDecoder: Hybrid Pixel Decoder for Efficient Segmentation and Detection

Fengzhe Zhou · Humphrey Shi

Abstract

Recently, Mask2Former has achieved significant success as a universal image segmentation framework, and its Multi-Scale Deformable Attention (MSDeformAttn) Pixel Decoder has become a widely adopted component in current segmentation models. However, the inefficiency of MSDeformAttn has become a speed bottleneck for these segmenters. To address this, we propose the Hybrid Pixel Decoder (HyPiDecoder), an improved Pixel Decoder design that replaces some of the MSDeformAttn layers with convolution-based FPN layers, introducing explicit locality information and significantly boosting inference speed. Experimental results show that HyPiDecoder can be applied to both universal segmentation models and unified segmentation and detection models, improving both speed and accuracy across object detection and semantic, instance, and panoptic segmentation. Mask DINO integrated with HyPiDecoder achieves a new state-of-the-art 58.8 PQ on COCO panoptic segmentation with a Swin-L-scale backbone and no extra training data, with a 127% increase in inference speed compared to the original model. Code will be released in the future.
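The abstract does not say which MSDeformAttn layers are replaced or how the FPN layers are configured. As a generic illustration of what a convolution-based FPN layer computes, the following is a minimal single-channel sketch of the classic feature pyramid network (FPN) top-down pathway in plain Python: a lateral 1x1 projection, nearest-neighbour 2x upsampling of the coarser level, element-wise addition, and a 3x3 smoothing convolution whose output at each pixel depends only on its 3x3 neighbourhood (the "explicit locality" that attention layers lack). This is an assumption-labelled sketch of standard FPN, not the HyPiDecoder implementation; the function names and the single-channel simplification are hypothetical.

```python
from typing import List

# Hypothetical simplification: one channel, so a map is just rows x cols.
Map = List[List[float]]


def upsample2x(x: Map) -> Map:
    """Nearest-neighbour 2x upsampling: each value is copied into a 2x2 block."""
    out: Map = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3(x: Map, k: Map) -> Map:
    """3x3 convolution (cross-correlation) with zero padding.

    Each output pixel reads only its 3x3 neighbourhood, so information
    is mixed strictly locally.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def fpn_top_down(features: List[Map], lateral_w: float, smooth_k: Map) -> List[Map]:
    """Classic FPN top-down fusion (sketch).

    `features` is ordered coarse -> fine, each level twice the spatial
    size of the previous one. With a single channel, the 1x1 lateral
    convolution reduces to multiplying by the scalar `lateral_w`.
    Returns one smoothed map per level, in the same order.
    """
    outputs: List[Map] = []
    top = None  # merged map carried down the pyramid (before smoothing)
    for f in features:
        lat = [[lateral_w * v for v in row] for row in f]
        top = lat if top is None else add(lat, upsample2x(top))
        outputs.append(conv3x3(top, smooth_k))
    return outputs


if __name__ == "__main__":
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    coarse = [[1.0]]
    fine = [[1.0, 2.0], [3.0, 4.0]]
    print(fpn_top_down([coarse, fine], 1.0, identity))
    # prints [[[1.0]], [[2.0, 3.0], [4.0, 5.0]]]
```

In a real pixel decoder each level has many channels, so the scalar `lateral_w` would become a channel-mixing weight matrix and the 3x3 kernel would span channels as well; the single-channel form only keeps the data flow easy to follow.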
