ICCV Poster OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance

Poster

OV3D-CG: Open-vocabulary 3D Instance Segmentation with Contextual Guidance

Mingquan Zhou · Chen He · Ruiping Wang · Xilin Chen

Exhibit Hall I #488

[ Abstract ] [ Project Page ]

Tue 21 Oct 6:15 p.m. PDT — 8:15 p.m. PDT

Abstract:

Open-vocabulary 3D instance segmentation (OV-3DIS), which aims to segment and classify objects beyond predefined categories, is a critical capability for embodied AI applications. Existing methods rely on pre-trained 2D foundation models, focusing on instance-level features while overlooking contextual relationships, limiting their ability to generalize to rare or ambiguous objects. To address these limitations, we propose an OV-3DIS framework guided by contextual information. First, we employ a Class-agnostic Proposal Module, integrating a pre-trained 3D segmentation model with a SAM-guided segmenter to extract robust 3D instance masks. Subsequently, we design a Semantic Reasoning Module, which selects the best viewpoint for each instance and constructs three 2D context-aware representations. The representations are processed using Multimodal Large Language Models with Chain-of-Thought prompting to enhance semantic inference. Notably, our method outperforms state-of-the-art methods on the ScanNet200 and Replica datasets, demonstrating superior open-vocabulary segmentation capabilities. Moreover, preliminary implementation in real-world scenarios verifies our method's robustness and accuracy, highlighting its potential for embodied AI tasks such as object-driven navigation.

Live content is unavailable. Log in and register to view live content