Poster
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
Jiahua Dong · Hui Yin · Wenqi Liang · Hanbin Zhao · Henghui Ding · Nicu Sebe · Salman Khan · Fahad Khan
Video instance segmentation (VIS) has gained significant attention for its ability to segment and track object instances across video frames. However, most existing VIS methods unrealistically assume that the categories of object instances remain fixed over time. Moreover, they suffer catastrophic forgetting of old classes when required to continuously learn object instances belonging to new classes. To address these challenges, we develop a novel Hierarchical Visual Prompt Learning (HVPL) model, which alleviates catastrophic forgetting of old classes from both frame-level and video-level perspectives. Specifically, to mitigate forgetting at the frame level, we devise a task-specific frame prompt and an orthogonal gradient correction (OGC) module. The OGC module helps the frame prompt encode task-specific global instance information for new classes in each individual frame by projecting its gradients onto the orthogonal feature space of old classes. Furthermore, to address forgetting at the video level, we design a task-specific video prompt and a video context decoder. The decoder first embeds structural inter-class relationships across frames into the frame prompt features, and then propagates task-specific global video contexts from the frame prompt features to the video prompt. Experiments verify the effectiveness of our HVPL model compared to other methods.
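To make the two key operations in the abstract concrete, the sketch below illustrates (1) gradient projection onto the orthogonal complement of an old-class feature subspace, in the spirit of the OGC module, and (2) a cross-attention stand-in for propagating context from frame prompt features to the video prompt. This is a minimal sketch under stated assumptions: the function names, the explicit orthonormal basis for old-class features, and the cross-attention layer are illustrative choices, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def orthogonal_gradient_correction(grad, old_basis):
    """Project a gradient onto the orthogonal complement of the old-class
    feature subspace (illustrative; the paper's OGC module may differ).

    grad:      (d,)   gradient of the task-specific frame prompt
    old_basis: (d, k) assumed orthonormal basis of old-class features
    """
    # Component of the gradient that lies inside the old-class subspace
    proj = old_basis @ (old_basis.T @ grad)
    # Removing it leaves the part orthogonal to old-class features
    return grad - proj

class VideoContextDecoderSketch(nn.Module):
    """Hypothetical stand-in for the video context decoder: cross-attention
    that propagates context from frame prompt features to the video prompt."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, video_prompt, frame_prompt_feats):
        # video_prompt:       (B, 1, dim) task-specific video prompt token
        # frame_prompt_feats: (B, T, dim) frame prompt features across T frames
        updated, _ = self.cross_attn(query=video_prompt,
                                     key=frame_prompt_feats,
                                     value=frame_prompt_feats)
        return video_prompt + updated  # residual update of the video prompt

# Example: correct the frame prompt's gradient before an optimizer step
d, k = 256, 16
frame_prompt = torch.randn(d, requires_grad=True)
old_basis, _ = torch.linalg.qr(torch.randn(d, k))  # placeholder basis
loss = frame_prompt.sum()
loss.backward()
frame_prompt.grad = orthogonal_gradient_correction(frame_prompt.grad, old_basis)
```

The projection step keeps updates for new classes out of directions already used by old-class features, which is the intuition behind the frame-level anti-forgetting design; the cross-attention block mirrors the described flow of global video context from frame prompt features into the video prompt.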