Poster

PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation

Zhihao ZHU · Yifan Zheng · Siyu Pan · Yaohui Jin · Yao Mu


Abstract:

The fragmentation between high-level task semantics and low-level geometric features remains a persistent, critical challenge in robotic manipulation. While vision-language models (VLMs) have demonstrated their potential in generating affordance-aware visual representations, the lack of semantic grounding in canonical spaces and the reliance on manual annotations severely limit their ability to capture dynamic semantic-affordance relationships. To address these limitations, we propose Primitive-Aware Semantic Grounding (PASG), a closed-loop framework that introduces: (1) automatic primitive extraction through geometric feature aggregation, enabling cross-category detection of keypoints and axes; (2) VLM-driven semantic anchoring that dynamically couples geometric primitives with functional affordances and task-relevant descriptions; (3) a spatial-semantic reasoning benchmark and a fine-tuned VLM (Qwen2.5VL-PA). Extensive experiments demonstrate that PASG achieves a finer-grained semantic-affordance understanding of objects, establishing a unified paradigm for bridging geometric primitives with task semantics in robotic manipulation.