

Poster

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Hanwen Cao · Haobo Lu · Xiaosen Wang · Kun He


Abstract:

Ensemble-based attacks have proven effective in enhancing adversarial transferability by aggregating the outputs of models with various architectures. However, existing research primarily focuses on refining ensemble weights or optimizing the ensemble path, overlooking the augmentation of the ensemble models themselves as a means to improve transferability. In this work, we address this gap by adversarially augmenting ensemble models through modifications to their inner modules. Moreover, observing that ensembles of Vision Transformers (ViTs) have received comparatively little attention, we propose ViT-EnsembleAttack, to the best of our knowledge the first ensemble-based attack tailored for ViTs. Our approach generates augmented models from each surrogate ViT using three strategies: Multi-head dropping, Attention score scaling, and MLP feature mixing, with the associated parameters optimized by Bayesian optimization. These adversarially augmented models are then ensembled to generate adversarial examples. Furthermore, we introduce an automatic reweighting module that dynamically adjusts the influence of each surrogate model in the ensemble, and we enlarge the step size in each iteration to accelerate convergence. Extensive experiments demonstrate that ViT-EnsembleAttack significantly enhances the adversarial transferability of ensemble-based attacks on ViTs, outperforming existing methods by a substantial margin.
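To make the three augmentation strategies concrete, below is a minimal, illustrative PyTorch sketch of a single ViT block exposing knobs for multi-head dropping, attention score scaling, and MLP feature mixing. This is not the authors' implementation; the class name, parameter names (drop_heads, attn_scale, mix_ratio), and the specific mixing rule are assumptions made for illustration, and the Bayesian optimization of these parameters and the automatic reweighting module are not shown.

```python
# Illustrative sketch only (assumed API, not the paper's code): a ViT block whose
# forward pass exposes the three augmentation knobs described in the abstract.
import torch
import torch.nn as nn


class AugmentedViTBlock(nn.Module):
    def __init__(self, dim=768, num_heads=12, mlp_ratio=4.0,
                 drop_heads=0, attn_scale=1.0, mix_ratio=0.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.drop_heads = drop_heads    # multi-head dropping: number of heads to zero out
        self.attn_scale = attn_scale    # attention score scaling factor
        self.mix_ratio = mix_ratio      # MLP feature mixing strength in [0, 1]
        self.norm1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(self.norm1(x)).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)             # each: (B, heads, N, head_dim)

        # Attention score scaling: rescale the pre-softmax attention scores.
        attn = (q @ k.transpose(-2, -1)) * (self.head_dim ** -0.5) * self.attn_scale
        attn = attn.softmax(dim=-1)
        out = attn @ v                                    # (B, heads, N, head_dim)

        # Multi-head dropping: zero out a random subset of attention heads.
        if self.drop_heads > 0:
            keep = torch.ones(self.num_heads, device=x.device)
            keep[torch.randperm(self.num_heads)[: self.drop_heads]] = 0.0
            out = out * keep.view(1, -1, 1, 1)

        x = x + self.proj(out.transpose(1, 2).reshape(B, N, C))

        # MLP feature mixing (one plausible variant): blend each token's MLP
        # output with that of a randomly permuted token from the same image.
        mlp_out = self.mlp(self.norm2(x))
        if self.mix_ratio > 0:
            perm = torch.randperm(N, device=x.device)
            mlp_out = (1 - self.mix_ratio) * mlp_out + self.mix_ratio * mlp_out[:, perm]
        return x + mlp_out
```

In this reading, each augmented surrogate would be a copy of the original ViT whose blocks use different sampled values of these knobs, and the resulting models would be ensembled when computing the attack gradient; the exact sampling and mixing rules in the paper may differ.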
