

Poster

MambaML: Exploring State Space Models for Multi-Label Image Classification

Xuelin Zhu · Jian Liu · Jiuxin Cao · Bing Wang


Abstract:

Mamba, a selective state-space model, has recently been applied to a wide range of visual tasks owing to its strong capability for capturing long-range dependencies. Although it has achieved promising performance on image classification, its effectiveness for multi-label image classification has not yet been explored. In this work, we develop a novel MambaML framework for multi-label image classification, which incorporates a Mamba-based decoder to aggregate visual information from image features into label embeddings, yielding label-specific visual representations for classification. Building upon this, MambaML further employs Mamba to model both the image feature sequence and the label embedding sequence. In this way, MambaML can capture the spatial relationships among image features, the semantic dependencies between label embeddings, and their cross-correlations, producing robust label-specific visual representations on which binary classifiers are trained for high-performance multi-label image classification. Extensive experimental results demonstrate that MambaML achieves state-of-the-art performance on multiple multi-label image classification benchmarks.
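To make the decoder idea concrete, the following is a minimal sketch (not the authors' released code) of a Mamba-based decoder for multi-label classification, under the assumption that the open-source `mamba_ssm` package provides the sequence mixer. The class name `MambaMLDecoder`, the feature dimension, and the label count are illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumed installed; requires a CUDA device


class MambaMLDecoder(nn.Module):
    """Sketch: aggregate image tokens into label embeddings with a Mamba block,
    then score each label with a binary classifier."""

    def __init__(self, dim: int = 512, num_labels: int = 80):
        super().__init__()
        # Learnable query per label (the "label embedding sequence").
        self.label_embed = nn.Parameter(torch.randn(num_labels, dim) * 0.02)
        # One Mamba block run over the concatenated [image tokens; label tokens]
        # sequence, so spatial, semantic, and cross dependencies are mixed jointly.
        self.mixer = Mamba(d_model=dim, d_state=16, d_conv=4, expand=2)
        self.norm = nn.LayerNorm(dim)
        # Shared linear head producing one logit per label-specific representation.
        self.classifier = nn.Linear(dim, 1)

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N, dim) patch/region features from a visual backbone.
        B, N, _ = image_tokens.shape
        labels = self.label_embed.unsqueeze(0).expand(B, -1, -1)  # (B, C, dim)
        seq = torch.cat([image_tokens, labels], dim=1)            # (B, N+C, dim)
        mixed = self.norm(self.mixer(seq))
        label_repr = mixed[:, N:]                                 # (B, C, dim)
        return self.classifier(label_repr).squeeze(-1)            # (B, C) logits


if __name__ == "__main__":
    feats = torch.randn(2, 196, 512).cuda()   # e.g. 14x14 patch tokens
    logits = MambaMLDecoder().cuda()(feats)   # (2, 80) per-label logits
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, torch.zeros_like(logits))     # standard multi-label BCE loss
```

Training with per-label binary cross-entropy reflects the abstract's description of binary classifiers over label-specific representations; the exact decoder layout in MambaML may differ.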
