

Poster

Motal: Unsupervised 3D Object Detection by Modality and Task-specific Knowledge Transfer

Hai Wu · Hongwei Lin · Xusheng Guo · Xin Li · Mingming Wang · Cheng Wang · Chenglu Wen


Abstract:

The performance of unsupervised 3D object classification and bounding box regression relies heavily on the quality of the initial pseudo-labels. Traditionally, the classification and regression labels are represented by a single set of candidate boxes generated by motion or geometry heuristics. However, because many objects resemble the background in shape or lack motion, these labels rarely achieve high accuracy on both tasks simultaneously, and directly training a network on them degrades detection performance. To address this challenge, we introduce Motal, which performs unsupervised 3D object detection by Modality and task-specific knowledge transfer. Motal decouples the pseudo-labels into two sets of candidates: it discovers classification knowledge from motion and image-appearance priors, and box regression knowledge from geometry priors. Motal then transfers all of this knowledge to a single student network through a TMT (Task-specific Masked Training) scheme, attaining high performance in both classification and regression. Motal greatly enhances various unsupervised methods, roughly doubling their mAP. For example, on the WOD test set, Motal improves the state-of-the-art CPD by 21.56% mAP L1 (from 20.54% to 42.10%) and 19.90% mAP L2 (from 18.18% to 38.08%). These results highlight the significance of our method. The code will be made publicly available.
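
The abstract does not spell out how the task-specific masked training combines the two decoupled pseudo-label sets, so the following is only a minimal sketch of one plausible reading: classification-oriented labels (from motion and image-appearance priors) supervise only the classification head, while geometry-derived boxes supervise only the regression head. All function names, tensor shapes, and loss choices below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def tmt_loss(cls_logits, box_preds, cls_labels, reg_boxes, reg_mask):
    """Sketch of a task-specific masked training objective.

    cls_logits: (N, C) predicted class scores for N candidate boxes
    box_preds:  (N, 7) predicted box parameters (x, y, z, l, w, h, yaw)
    cls_labels: (N,)   class indices from motion / appearance priors
    reg_boxes:  (N, 7) box targets derived from geometry priors
    reg_mask:   (N,)   bool, True where a geometry box target exists
    """
    # Classification branch: supervised only by the classification-oriented
    # pseudo-labels; regression targets play no role here.
    cls_loss = F.cross_entropy(cls_logits, cls_labels)

    # Regression branch: supervised only where a geometry-derived box target
    # exists; classification labels are masked out of this term.
    if reg_mask.any():
        reg_loss = F.smooth_l1_loss(box_preds[reg_mask], reg_boxes[reg_mask])
    else:
        reg_loss = box_preds.sum() * 0.0  # keep the graph valid with no targets

    return cls_loss + reg_loss
```

In this reading, masking decides which pseudo-label set drives gradients for each head, so the student can inherit classification quality from one label source and box quality from the other.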
