Poster

TrackVerse: A Large-scale Dataset of Object Tracks for Visual Representation Learning

Yibing Wei · Samuel Church · Victor Suciu · Jinhong Lin · Cheng-En Wu · Pedro Morgado


Abstract:

Video data inherently captures rich, dynamic contexts that reveal objects in varying poses, interactions, and state transitions, offering strong potential for unsupervised visual representation learning. However, existing natural video datasets are not well-suited for effective object representation learning due to their lack of object-centricity and class diversity. To address these challenges, we introduce TrackVerse, a novel large-scale video dataset for learning object representations. TrackVerse features diverse, common objects tracked over time, capturing their evolving states. To leverage the temporal dynamics in TrackVerse, we extend contrastive learning with a variance-aware predictor that conditions on data augmentations, enabling models to learn state-aware representations. Extensive experiments demonstrate that representations learned from TrackVerse with variance-aware contrastive learning significantly outperform those learned from non-object-centric natural videos and static image datasets across multiple downstream tasks, including object/attribute recognition, action recognition, and video instance segmentation, highlighting the rich semantic and state content in TrackVerse features.
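The abstract does not specify the form of the variance-aware predictor, but the general idea of augmentation-conditioned contrastive prediction can be sketched as follows: a predictor receives one view's embedding together with the parameters of the augmentation that produced it, and is trained with an InfoNCE objective to predict the other view's embedding. This is a minimal NumPy sketch under that assumption; all function names, the linear predictor `W`, and the augmentation-parameter vector are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project rows onto the unit sphere."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def conditioned_prediction(z, aug_params, W):
    """Predictor conditioned on augmentation parameters (hypothetical):
    concatenate the embedding with the augmentation vector, then apply a
    linear map back to embedding space."""
    return np.concatenate([z, aug_params], axis=-1) @ W

def info_nce_loss(pred, target, temperature=0.1):
    """Standard InfoNCE over a batch: row i of `pred` should match
    row i of `target`; all other rows serve as negatives."""
    pred, target = l2_normalize(pred), l2_normalize(target)
    logits = pred @ target.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
batch, dim, aug_dim = 8, 16, 4
z1 = rng.standard_normal((batch, dim))        # embedding of view 1
z2 = rng.standard_normal((batch, dim))        # embedding of view 2
aug = rng.standard_normal((batch, aug_dim))   # e.g. crop/color-jitter params
W = 0.1 * rng.standard_normal((dim + aug_dim, dim))

loss = info_nce_loss(conditioned_prediction(z1, aug, W), z2)
print(float(loss))
```

Conditioning the predictor on the augmentation lets the backbone retain augmentation-sensitive (state-relevant) information instead of discarding it, which is the property the abstract describes as state-aware representation learning.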
