Poster
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy · Shankar Gangisetty · Shyam Nandan Rai · Carlo Masone · C.V. Jawahar
Autonomous driving (AD) systems are becoming increasingly capable of handling complex tasks, largely due to recent advances in deep learning and AI. As interactions between autonomous systems and humans grow, the interpretability of driving system decision-making processes becomes crucial for safe driving. Successful human-machine interaction requires understanding the underlying representations of the environment and the driving task, which remains a significant challenge in deep learning-based systems. To address this, we introduce interpretability to the task of predicting maneuvers before they occur, i.e., driver intent prediction (DIP), which plays a critical role in the safety of AD systems. To foster research in interpretable DIP, we curate the eXplainable Driving Action Anticipation Dataset (DAAD-X), a new multimodal, ego-centric video dataset that provides hierarchical, high-level textual explanations as causal reasoning for the driver's decisions. These explanations are derived from both the driver's eye-gaze and the ego-vehicle's perspective. Next, we propose the Video Concept Bottleneck Model (VCBM), a framework that inherently generates spatio-temporally coherent explanations, without relying on post-hoc techniques. Finally, through extensive evaluations of the proposed VCBM on the DAAD-X dataset, we demonstrate that transformer-based models exhibit greater interpretability compared to conventional CNN-based models. Additionally, we introduce a multilabel t-SNE visualization technique to illustrate the disentanglement and causal correlation among multiple explanations. The dataset and code will be released on acceptance.
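To make the concept-bottleneck idea concrete, below is a minimal sketch of how a video model can be made inherently interpretable: a concept layer predicts human-readable explanation concepts, and the maneuver is classified only from those concepts, so the explanation is part of the forward pass rather than a post-hoc attribution. This is an illustrative toy, not the paper's actual VCBM; the backbone, dimensions, concept set, and loss weighting here are all assumptions.

```python
# Hypothetical concept-bottleneck sketch (not the authors' VCBM).
# A video backbone (assumed, e.g. a video transformer) yields pooled
# clip features; a concept head predicts explanation concepts; the
# task head sees ONLY the concepts, forming the bottleneck.
import torch
import torch.nn as nn

class VideoConceptBottleneck(nn.Module):
    def __init__(self, feat_dim=768, num_concepts=32, num_maneuvers=5):
        super().__init__()
        # Maps pooled spatio-temporal features to concept logits
        # (e.g., "pedestrian ahead", "gaze toward left mirror").
        self.concept_head = nn.Linear(feat_dim, num_concepts)
        # Maneuver prediction depends only on concept activations.
        self.task_head = nn.Linear(num_concepts, num_maneuvers)

    def forward(self, clip_feats):
        # clip_feats: (batch, feat_dim) pooled backbone features.
        concept_logits = self.concept_head(clip_feats)
        concepts = torch.sigmoid(concept_logits)  # multilabel concepts
        maneuver_logits = self.task_head(concepts)
        return maneuver_logits, concept_logits

# Training sketch: supervise both heads so the concepts stay grounded
# in the dataset's textual explanations (dummy tensors stand in for data).
model = VideoConceptBottleneck()
feats = torch.randn(8, 768)
maneuver_logits, concept_logits = model(feats)
concept_targets = torch.randint(0, 2, (8, 32)).float()
maneuver_targets = torch.randint(0, 5, (8,))
loss = (nn.functional.binary_cross_entropy_with_logits(
            concept_logits, concept_targets)
        + nn.functional.cross_entropy(maneuver_logits, maneuver_targets))
loss.backward()
```

Because the task head can only see the concept activations, reading off the top concepts for a prediction directly yields the model's stated reasons for an anticipated maneuver.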
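The multilabel t-SNE visualization mentioned above can be sketched as follows: embed the concept activations in 2-D once, then render one panel per explanation label so that disentanglement and co-occurrence across labels become visible in the shared embedding. The label names and data here are placeholders, not the paper's technique or results.

```python
# Hypothetical multilabel t-SNE view (placeholder data and labels).
# One shared 2-D embedding of concept activations, one panel per
# binary explanation label, colored by that label's presence.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 32))            # concept activations
labels = rng.integers(0, 2, size=(500, 3))   # 3 binary explanations
label_names = ["brakes for pedestrian", "checks mirror", "slows at signal"]

# Embed once so all panels share the same coordinates.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(acts)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
for ax, name, y in zip(axes, label_names, labels.T):
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="coolwarm", s=8)
    ax.set_title(name)
plt.tight_layout()
plt.show()
```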