Poster

Mitigating Catastrophic Overfitting in Fast Adversarial Training via Label Information Elimination

Chao Pan · Ke Tang · Li Qing · Xin Yao


Abstract:

Fast Adversarial Training (FAT) employs the single-step Fast Gradient Sign Method (FGSM) to generate adversarial examples, reducing the computational costs of traditional adversarial training. However, FAT suffers from Catastrophic Overfitting (CO), where models' robust accuracy against multi-step attacks plummets to zero during training. Recent studies indicate that CO occurs because single-step adversarial perturbations contain label information that models exploit for prediction, leading to overfitting and diminished robustness against more complex attacks. In this paper, we discover that after CO occurs, the label information of certain samples can transfer across different samples, significantly increasing the likelihood of modified images being classified as the intended label. This discovery offers a new perspective on why various adversarial initialization strategies are effective. To address this issue, we introduce an innovative FAT strategy that leverages special samples to capture transferable label information and proactively removes potential label information during training, complemented by a non-uniform label smoothing technique to further eliminate label information. Experimental results across three datasets demonstrate that our method maintains competitive robustness against several attacks compared to other FAT approaches, with ablation studies confirming the effectiveness of our methodology.
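For context, the single-step FGSM perturbation that FAT builds on can be sketched as follows. This is a minimal illustration of the standard FGSM formulation only; it does not implement the paper's label-information-elimination strategy or its non-uniform label smoothing, and the model, loss, and epsilon value are illustrative assumptions.

```python
# Minimal sketch of the single-step FGSM perturbation used in FAT.
# Standard formulation only -- not the authors' proposed method.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=8 / 255):
    """Generate a single-step FGSM adversarial example for inputs x with labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    # Step in the direction of the loss gradient's sign, then clamp to the valid image range.
    x_adv = x_adv.detach() + epsilon * grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0)
```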
