

Poster

SA-MAE: A Sensor-Agnostic Masked Autoencoder for Remote Sensing Image Representation Learning

Gencer Sumbul · Chang Xu · Emanuele Dalsasso · Devis Tuia


Abstract:

From optical sensors to microwave radars, leveraging the complementary strengths of remote sensing (RS) sensors is of great importance for achieving dense spatio-temporal monitoring of our planet. However, recent deep learning models, whether task-specific or foundational, are often tied to a single sensor or to a fixed combination of sensors: adapting such models to different sensory inputs requires both architectural changes and re-training, limiting scalability and generalization across multiple RS sensors. In contrast, a single model able to modulate its feature representations to accept diverse sensors as input would pave the way to agile and flexible multi-sensor RS data processing. To address this, we introduce SA-MAE, a generic and versatile foundation model that removes sensor-specific engineering effort and enables scalability and generalization to diverse RS sensors: SA-MAE projects data from heterogeneous sensors into a shared spectrum-aware space, enabling the use of arbitrary combinations of spectral bands, a key discriminative property for RS, both for training and inference. To obtain sensor-agnostic representations, we train a single, unified transformer model to reconstruct masked multi-sensor data with cross-sensor token mixup. On both single- and multi-modal tasks across diverse sensors, SA-MAE outperforms previous models that rely on sensor-specific pretraining.
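To make the three ingredients named in the abstract concrete (spectrum-aware projection of arbitrary band combinations, masked reconstruction with a shared transformer, and cross-sensor token mixup), the following is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation: all class names, dimensions, the wavelength-conditioning MLP, and the mixup rule are illustrative assumptions only.

```python
# Illustrative sketch only (not the SA-MAE code): per-band spectrum-aware tokenization,
# random masking, a shared transformer encoder, and a toy cross-sensor token mixup.
import torch
import torch.nn as nn


class SpectrumAwarePatchEmbed(nn.Module):
    """Embed each band's patches and add an encoding of the band's central wavelength,
    so arbitrary band combinations from different sensors map into one shared space."""

    def __init__(self, patch_size=16, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        self.wave_mlp = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, wavelengths):
        # x: (B, C, H, W) image with C bands; wavelengths: (C,) central wavelengths (assumed unit)
        tokens = []
        for c in range(x.shape[1]):
            t = self.proj(x[:, c:c + 1])                  # (B, dim, H/ps, W/ps)
            t = t.flatten(2).transpose(1, 2)              # (B, N, dim)
            w = self.wave_mlp(wavelengths[c].view(1, 1))  # (1, dim) wavelength encoding
            tokens.append(t + w)
        return torch.cat(tokens, dim=1)                   # (B, C*N, dim)


def random_masking(tokens, mask_ratio=0.75):
    """Keep a random subset of tokens, as in a standard masked autoencoder."""
    B, L, D = tokens.shape
    keep = int(L * (1 - mask_ratio))
    idx = torch.rand(B, L, device=tokens.device).argsort(dim=1)[:, :keep]
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D)), idx


def cross_sensor_mixup(tok_a, tok_b, lam=0.5):
    """Toy cross-sensor token mixup: interpolate tokens from two co-registered sensors."""
    n = min(tok_a.shape[1], tok_b.shape[1])
    return lam * tok_a[:, :n] + (1 - lam) * tok_b[:, :n]


if __name__ == "__main__":
    embed = SpectrumAwarePatchEmbed()
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True), num_layers=4
    )
    # Hypothetical inputs: a 4-band optical patch and a 2-band radar patch of the same scene.
    optical = torch.randn(2, 4, 64, 64)
    radar = torch.randn(2, 2, 64, 64)
    opt_tok = embed(optical, torch.tensor([0.49, 0.56, 0.66, 0.84]))   # micrometers (assumed)
    sar_tok = embed(radar, torch.tensor([5.5e4, 5.5e4]))               # placeholder SAR wavelengths
    visible, _ = random_masking(cross_sensor_mixup(opt_tok, sar_tok))
    latent = encoder(visible)   # sensor-agnostic representation of the visible tokens
    print(latent.shape)
```

In a full masked-autoencoder setup, a lightweight decoder would reconstruct the masked tokens from `latent` and the model would be trained with a pixel-level reconstruction loss; that part is omitted here for brevity.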
