

Poster

CVPT: Cross Visual Prompt Tuning

Lingyun Huang · Jianxu Mao · Junfei YI · Ziming Tao · Yaonan Wang


Abstract:

In recent years, the rapid growth of model sizes has introduced substantial computational overhead. To address this, Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed; they adapt large-scale pre-trained models to specific tasks by fine-tuning only a small subset of parameters. Among PEFT methods, adapter-based and prompt-based approaches are the primary techniques. In visual fine-tuning specifically, adapters have gained prominence over prompts because of the latter's relatively weaker performance and efficiency. Against this backdrop, we conduct a detailed analysis of Visual Prompt Tuning (VPT) and attribute its shortcomings to how VPT deploys prompts. We therefore propose Cross Visual Prompt Tuning (CVPT), which introduces cross-attention to directly capture the relationships between prompts and the original tokens, allowing the prompts to integrate visual features efficiently. This changes the original deployment of prompts, decoupling them from the original tokens and avoiding distortion of self-attention. Furthermore, we introduce a weight-sharing mechanism to initialize the cross-attention parameters, which avoids adding a large number of learnable parameters and enhances the representational capability of cross-attention. Comprehensive testing across 25 datasets shows that CVPT significantly improves VPT's performance and efficiency on visual tasks. For example, on the VTAB-1K benchmark, CVPT outperforms VPT by over 4% in average accuracy, rivaling advanced adapter-based methods in both performance and efficiency. Our experiments confirm that prompt-based methods can achieve exceptional results in visual fine-tuning. The code is available at https://anonymous.4open.science/r/CVPT-A873/readme.md
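The mechanism the abstract describes lends itself to a short sketch. Below is a minimal, hypothetical PyTorch rendering of the two ideas: prompts are kept out of self-attention and instead read token features through a cross-attention whose projection weights are shared with (i.e., initialized from) the frozen self-attention. Every class name, argument, and the fusion rule here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossPromptBlock(nn.Module):
    """Hypothetical transformer block augmented with prompt cross-attention."""

    def __init__(self, dim: int, num_heads: int, num_prompts: int):
        super().__init__()
        # Frozen pretrained self-attention over the original tokens only.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False

        # Learnable prompts live outside the token sequence.
        self.prompts = nn.Parameter(torch.empty(num_prompts, dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

        # Weight-sharing initialization: the cross-attention reuses the
        # self-attention's projection parameters, so no large new set of
        # attention weights is introduced (they also stay frozen here).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn.in_proj_weight = self.self_attn.in_proj_weight
        self.cross_attn.in_proj_bias = self.self_attn.in_proj_bias
        self.cross_attn.out_proj.weight = self.self_attn.out_proj.weight
        self.cross_attn.out_proj.bias = self.self_attn.out_proj.bias

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim). Prompts are NOT concatenated into
        # the sequence, so the pretrained self-attention map is undistorted.
        attn_out, _ = self.self_attn(tokens, tokens, tokens)
        x = tokens + attn_out

        # Prompts query the token features via cross-attention.
        p = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        prompt_feat, _ = self.cross_attn(p, x, x)

        # Fuse prompt-refined features back into the token stream; the
        # paper's exact fusion rule may differ (illustrative residual here).
        return x + prompt_feat.mean(dim=1, keepdim=True)


# Usage sketch: only `prompts` is trainable in this block.
block = CrossPromptBlock(dim=768, num_heads=12, num_prompts=8)
out = block(torch.randn(2, 197, 768))  # ViT-B/16-style token count, assumed
```

One design point worth noting: because the cross-attention parameters are tied to the frozen self-attention, the per-block trainable state reduces to the prompt vectors themselves, which matches the abstract's claim of avoiding massive new learnable parameters.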
