

Poster

GT-Loc: Unifying When and Where in Images through a Joint Embedding Space

David G. Shatwell · Ishan Rajendrakumar Dave · Swetha Sirnam · Mubarak Shah


Abstract:

Timestamp prediction aims to determine when an image was captured using only visual information, supporting applications such as metadata correction, retrieval, and digital forensics. In outdoor scenarios, hourly estimates rely on cues like brightness, hue, and shadow positioning, while seasonal changes and weather inform date estimation. However, these visual cues significantly depend on geographic context, closely linking timestamp prediction to geo-localization. To address this interdependence, we introduce GT-Loc, a novel retrieval-based method that jointly predicts the capture time (hour and month) and geo-location (GPS coordinates) of an image. Our approach employs separate encoders for images, time, and location, aligning their embeddings within a shared high-dimensional feature space. Recognizing the cyclical nature of time, we utilize Random Fourier Features for effective temporal representation. Instead of conventional contrastive learning with hard positives and negatives, we propose a metric-learning objective providing soft targets by modeling temporal differences over a cyclical toroidal surface. We present new benchmarks demonstrating that our joint optimization surpasses methods focused solely on time prediction and even those utilizing geo-location during inference. Additionally, our approach achieves competitive results on standard geo-localization tasks, while the unified embedding space facilitates compositional and text-based image retrieval.
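The abstract names two technical ingredients: Random Fourier Features (RFF) for cyclical time, and soft targets derived from temporal differences on a toroidal surface. Neither is specified in detail here, so the following is only a minimal sketch under our own assumptions, not GT-Loc's implementation. We assume hour and month are each mapped to an angle on a circle (together forming a torus), encoded as (cos, sin) pairs, and then passed through the standard RFF map z(x) = sqrt(2/D) cos(Wx + b). Soft targets are taken as a softmax over negative toroidal distances with a temperature `tau`. All function names, the per-axis normalization, and `tau` are hypothetical choices made for illustration.

```python
import math
import random

# Assumption (not from the paper): a timestamp is (hour in [0, 24), month in 1..12),
# and each component lives on its own circle, so the pair lives on a torus.
HOURS_PER_DAY = 24
MONTHS_PER_YEAR = 12


def torus_features(hour, month):
    """Map (hour, month) to (cos, sin) pairs so that 23:00 sits next to 00:00
    and December sits next to January."""
    theta_h = 2 * math.pi * (hour % HOURS_PER_DAY) / HOURS_PER_DAY
    theta_m = 2 * math.pi * ((month - 1) % MONTHS_PER_YEAR) / MONTHS_PER_YEAR
    return [math.cos(theta_h), math.sin(theta_h), math.cos(theta_m), math.sin(theta_m)]


class RandomFourierFeatures:
    """Standard RFF map z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, sigma^-2)
    and b ~ U(0, 2*pi); z(x) . z(y) approximates exp(-||x - y||^2 / (2 sigma^2))."""

    def __init__(self, in_dim, num_features, sigma=1.0, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(in_dim)]
                  for _ in range(num_features)]
        self.b = [rng.uniform(0.0, 2 * math.pi) for _ in range(num_features)]
        self.scale = math.sqrt(2.0 / num_features)

    def __call__(self, x):
        return [self.scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(self.W, self.b)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cyclic_diff(a, b, period):
    """Shortest distance between a and b on a circle of the given period."""
    d = abs(a - b) % period
    return min(d, period - d)


def toroidal_distance(t1, t2):
    """Distance between (hour, month) pairs on the torus; each axis is scaled
    to [0, 1] by its largest possible cyclic gap (an assumed weighting)."""
    dh = cyclic_diff(t1[0], t2[0], HOURS_PER_DAY) / (HOURS_PER_DAY / 2)
    dm = cyclic_diff(t1[1], t2[1], MONTHS_PER_YEAR) / (MONTHS_PER_YEAR / 2)
    return math.sqrt(dh ** 2 + dm ** 2)


def soft_targets(query, candidates, tau=0.1):
    """Soft labels: softmax over negative toroidal distances (tau is hypothetical)."""
    logits = [-toroidal_distance(query, c) / tau for c in candidates]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(similarities, targets):
    """Cross-entropy between softmax(similarities) and the soft targets,
    used here in place of one-hot (hard positive/negative) labels."""
    m = max(similarities)
    log_z = m + math.log(sum(math.exp(s - m) for s in similarities))
    return -sum(t * (s - log_z) for s, t in zip(similarities, targets))


if __name__ == "__main__":
    rff = RandomFourierFeatures(in_dim=4, num_features=512, sigma=1.0)
    late_dec = rff(torus_features(23, 12))
    # Neighbors on the torus (23:00 Dec vs 00:00 Jan) vs. far-apart times (12:00 Jun).
    print(dot(late_dec, rff(torus_features(0, 1))))
    print(dot(late_dec, rff(torus_features(12, 6))))
    print(soft_targets((23, 12), [(0, 1), (12, 6), (23, 12)]))
```

The point of the soft targets is that a candidate one hour or one month away from the true timestamp is penalized less than one on the opposite side of the day or year, whereas a hard contrastive label would treat both as equally wrong.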
