Poster

CAT: A Unified Click-and-Track Framework for Realistic Tracking

Yongsheng Yuan · Jie Zhao · Dong Wang · Huchuan Lu


Abstract:

Modern visual trackers achieve robust performance when initialized with precise target bounding boxes. However, providing high-precision initial annotations is both labor-intensive and error-prone in real-world scenarios. Interactive initialization (e.g., click-based, scribble-based) presents a more practical alternative. In this paper, we introduce a unified Click-and-Track (CAT) framework for full-process tracking, eliminating the need for auxiliary models or complex initialization pipelines. We present a novel fine-tuning paradigm that bridges the information gap inherent in click-based initialization through two key innovations: 1) Click-based location and joint spatial-visual prompt refinement are performed sequentially to remedy the geometric information loss (e.g., boundary ambiguity, shape uncertainty) caused by initializing from a click. 2) We design a parameter-efficient module called CTMoE that leverages the tracker's inherent capabilities during fine-tuning. CTMoE enables the foundation model to learn different matching patterns, unifying click-based initialization and tracking within a single architecture. Extensive experiments demonstrate state-of-the-art performance for our click-based tracking method on the LaSOT benchmark (70.5% AUC) while maintaining parameter efficiency, surpassing existing click-based tracking frameworks by a large margin and even outperforming some bounding-box-initialized trackers.
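For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of how a success-plot AUC is commonly computed for a single sequence: the fraction of frames whose predicted box overlaps the ground truth by more than a threshold, averaged over evenly spaced IoU thresholds. The function names, the (x, y, w, h) box format, and the choice of 21 thresholds are assumptions made for illustration; they are not taken from the paper or from the official LaSOT toolkit, which also handles invalid frames and averages across sequences.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over IoU thresholds evenly spaced in [0, 1].

    Assumption: a frame counts as a success when its IoU is strictly
    greater than the threshold, a convention used by common toolkits.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [
        sum(o > t for o in overlaps) / len(overlaps)
        for t in thresholds
    ]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical two-frame sequence: one perfect prediction, one half-shifted box.
    pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    print(f"success AUC: {success_auc(pred, gt):.3f}")
```

Under this convention, a tracker that predicts every box perfectly scores slightly below 1.0, because no overlap is strictly greater than the final threshold of 1.0.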
