Poster
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction
Runmin Zhang · Zhu Yu · Si-Yuan Cao · Lingyu Zhu · Guangyi Zhang · Xiaokai Bai · Hui-liang Shen
This work presents SGCDet, a novel multi-view indoor object detection framework based on adaptive 3D volume construction. Unlike previous approaches that restrict the receptive field of each voxel to fixed locations on the images, we introduce a geometry- and context-aware aggregation module that integrates geometric and contextual information within an adaptive region, enhancing the representation capability of voxel features. Furthermore, we propose a sparse volume construction strategy that adaptively identifies voxels with high occupancy probability and selects them for feature refinement, minimizing redundant computation in free space. Together, these designs enable effective and efficient adaptive volume construction. Moreover, our network can be supervised using only 3D bounding boxes, eliminating the dependence on ground-truth scene geometry. Experimental results demonstrate that SGCDet achieves state-of-the-art performance on the ScanNet and ARKitScenes datasets. Compared to the previous state-of-the-art approach, SGCDet reduces training memory, training time, inference memory, and inference time by 42.9%, 47.2%, 50%, and 40.8%, respectively, while improving mAP@0.50 by 3.9 on ScanNet and 3.3 on ARKitScenes.
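The abstract does not spell out the exact selection rule used in the sparse volume construction strategy. A minimal sketch of the general idea, assuming a simple top-k criterion over predicted per-voxel occupancy probabilities (the function name, `keep_ratio` parameter, and toy volume are illustrative, not from the paper):

```python
import numpy as np

def select_sparse_voxels(occupancy_prob, keep_ratio=0.25):
    """Keep only the voxels most likely to be occupied for feature refinement.

    occupancy_prob: (D, H, W) array of predicted occupancy probabilities.
    Returns flat indices of the kept voxels and a boolean mask over the volume.
    """
    flat = occupancy_prob.ravel()
    k = max(1, int(keep_ratio * flat.size))
    # Top-k voxels by occupancy probability; refinement skips free space.
    keep = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return keep, mask.reshape(occupancy_prob.shape)

# Toy volume: two clearly "occupied" voxels stand out from free space.
rng = np.random.default_rng(0)
prob = rng.uniform(0.0, 0.2, size=(4, 4, 4))
prob[1, 2, 3] = 0.9
prob[0, 0, 1] = 0.8
idx, mask = select_sparse_voxels(prob, keep_ratio=0.05)
```

In this sketch, only the selected fraction of voxels would be passed through the (more expensive) refinement stage, which is the source of the memory and runtime savings the abstract reports.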