Poster
O-MaMa: Learning Object Mask Matching between Egocentric and Exocentric Views
Lorenzo Mur-Labadia · Maria Santos-Villafranca · Jesus Bermudez-cameo · Alejandro Perez-Yus · Ruben Martinez-Cantin · Jose Guerrero
Understanding the world from multiple perspectives is essential for intelligent systems operating together, and segmenting common objects across different views remains an open problem. We introduce a new approach that redefines cross-image segmentation by treating it as a mask matching task. Our method consists of: (1) a Mask-Context Encoder that pools dense DINOv2 semantic features to obtain discriminative object-level representations from FastSAM mask candidates, (2) an Ego↔Exo cross-attention that fuses multi-perspective observations, (3) a mask matching contrastive loss that aligns cross-view features in a shared latent space, and (4) a hard negative adjacent mining strategy that encourages the model to better differentiate between nearby objects. O-MaMa achieves state of the art on the Ego-Exo4D Correspondences benchmark, obtaining relative IoU gains of +31% (Ego2Exo) and +94% (Exo2Ego) over the official challenge baselines, and +13% and +6% over the previous SOTA while using only 1% of the training parameters.
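A minimal sketch (not the authors' code) of two of the ideas named in the abstract: pooling dense per-pixel features inside each candidate mask to obtain one object-level descriptor per mask, and a symmetric contrastive loss that pulls matching ego/exo mask descriptors together in a shared latent space. Feature and mask shapes, masked-average pooling, and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pool_mask_descriptors(dense_feats: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Average dense features (C, H, W) inside each binary mask (N, H, W) -> (N, C)."""
    C, H, W = dense_feats.shape
    flat_feats = dense_feats.reshape(C, H * W)            # (C, HW)
    flat_masks = masks.reshape(-1, H * W).float()         # (N, HW)
    area = flat_masks.sum(dim=1, keepdim=True).clamp(min=1.0)
    return (flat_masks @ flat_feats.T) / area             # (N, C)

def mask_matching_loss(ego_desc: torch.Tensor, exo_desc: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over paired mask descriptors (N, C) from the two views."""
    ego = F.normalize(ego_desc, dim=-1)
    exo = F.normalize(exo_desc, dim=-1)
    logits = ego @ exo.T / temperature                     # (N, N) cross-view similarities
    targets = torch.arange(ego.size(0), device=ego.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Toy example: 384-dim dense features on a 32x32 grid, 5 paired candidate masks per view.
feats_ego, feats_exo = torch.randn(384, 32, 32), torch.randn(384, 32, 32)
masks_ego = torch.randint(0, 2, (5, 32, 32), dtype=torch.bool)
masks_exo = torch.randint(0, 2, (5, 32, 32), dtype=torch.bool)
loss = mask_matching_loss(pool_mask_descriptors(feats_ego, masks_ego),
                          pool_mask_descriptors(feats_exo, masks_exo))
```

In the paper's setting, the hard negative adjacent mining step would populate the off-diagonal entries of the similarity matrix with descriptors of nearby masks rather than arbitrary ones; the sketch above only shows the plain in-batch contrastive form.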