Workshop 313 B

Workshop on Benchmarking Multi-Target Tracking: Towards Spatiotemporal Action Grounding in Videos

Tanveer Hannan, Shuaicong Wu, Mark Weber, Suprosanna Shit, Rajat Koner, Jindong Gu, Aljosa Osep, Prof. Dr. Thomas Seidl, Prof. Dr. Laura Leal-Taixé


Abstract

The 8th BMTT Workshop focuses on action-aware multi-object tracking, aiming to unify temporal action localization and object tracking through natural language queries. Existing benchmarks often address these tasks separately, so this workshop poses unified challenges that evaluate both capabilities jointly. Participants are encouraged to develop models that can understand complex actions, follow detailed language instructions, and track multiple objects over time. The workshop aims to narrow the gap between vision and language, advancing multimodal video understanding and supporting research on scalable, real-world systems capable of fine-grained, action-driven reasoning in dynamic scenes.
