Poster
AFFECT: Aligning Fisheye Feature Embeddings using Calibration Tokens for Monocular Depth Estimation
Suchisrit Gangopadhyay · Jung Hee Kim · Xien Chen · Patrick Rim · Hyoungseob Park · Alex Wong
Monocular depth estimation (MDE) has advanced significantly with the introduction of transformer-based foundational vision models. However, their application to fisheye images, widely used in robotics, security systems, autonomous vehicles, and augmented reality due to their wide field of view, remains challenging due to severe radial distortions and calibration differences. Standard transformer-based models trained on perspective images fail to generalize effectively to fisheye inputs, resulting in poor depth predictions. To address this, we introduce \emph{calibration tokens}, a lightweight, token-based adaptation method that allows perspective-trained foundational models to handle fisheye distortions without retraining or fine-tuning the entire network. Calibration tokens learn to realign distorted fisheye features with the perspective latent distribution in a self-supervised manner using a novel inverse warping consistency loss. This training approach leverages existing perspective image datasets and pre-trained foundational models without requiring labeled fisheye images. Experiments demonstrate that our calibration tokens improve performance on real-world fisheye datasets for monocular depth estimation tasks, surpassing baselines while maintaining computational efficiency and inference-time simplicity.
Live content is unavailable. Log in and register to view live content