Poster

RCTDistill: Cross-Modal Knowledge Distillation Framework for Radar-Camera 3D Object Detection with Temporal Fusion

Geonho Bang · Minjae Seong · Jisong Kim · Geunju Baek · Daye Oh · Junhyung Kim · Junho Koh · Jun Won Choi


Abstract:

Radar-camera fusion methods have emerged as a cost-effective approach for 3D object detection but still lag behind LiDAR-based methods in performance. Recent works have employed temporal fusion and Knowledge Distillation (KD) strategies to close this gap. However, existing approaches do not sufficiently account for uncertainties arising from object motion or for the sensor-specific errors inherent in the radar and camera modalities. In this work, we propose RCTDistill, a novel cross-modal KD method based on temporal fusion, comprising three key modules: Range-Azimuth Knowledge Distillation (RAKD), Temporal Knowledge Distillation (TKD), and Region-Decoupled Knowledge Distillation (RDKD). RAKD accounts for the inherent errors along the range and azimuth directions, enabling effective knowledge transfer from LiDAR features to refine inaccurate BEV representations. TKD mitigates temporal misalignment caused by dynamic objects by aligning historical radar-camera BEV features with LiDAR representations. RDKD enhances feature discrimination by distilling relational knowledge from the teacher model, allowing the student to better understand and differentiate foreground and background features. RCTDistill achieves state-of-the-art radar-camera fusion performance on both the nuScenes and View-of-Delft (VoD) datasets, with the fastest inference speed of 26.2 FPS.
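To make the region-decoupled distillation idea concrete, the following is a minimal PyTorch sketch of a foreground/background-weighted feature-matching loss between a student (radar-camera) BEV map and a teacher (LiDAR) BEV map. This is an illustrative reconstruction only: the function name, the per-region weights, and the plain MSE matching term are assumptions, not the paper's actual RDKD formulation.

```python
import torch
import torch.nn.functional as F


def bev_distill_loss(student_bev: torch.Tensor,
                     teacher_bev: torch.Tensor,
                     fg_mask: torch.Tensor,
                     fg_weight: float = 2.0,
                     bg_weight: float = 0.5) -> torch.Tensor:
    """Region-decoupled BEV feature distillation (illustrative sketch).

    student_bev, teacher_bev: (B, C, H, W) BEV feature maps.
    fg_mask: (B, H, W) boolean mask marking foreground BEV cells.
    fg_weight / bg_weight: hypothetical per-region loss weights that
    emphasize foreground cells over background, in the spirit of RDKD.
    """
    # Per-cell feature mismatch, averaged over the channel dimension.
    per_cell = F.mse_loss(student_bev, teacher_bev,
                          reduction="none").mean(dim=1)  # (B, H, W)
    # Decouple foreground and background contributions.
    weights = fg_mask.float() * fg_weight + (~fg_mask).float() * bg_weight
    return (per_cell * weights).mean()
```

A usage note: `fg_mask` would typically be rasterized from ground-truth boxes onto the BEV grid; with all-zero student features and all-one teacher features, the loss reduces to the region weight itself (0.5 for pure background, 2.0 for pure foreground), which makes the weighting easy to sanity-check.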
