

Poster

Adversarial Training for Probabilistic Robustness

Yi Zhang · Yuhang Chen · Zhen Chen · Wenjie Ruan · Xiaowei Huang · Siddartha Khastgir · Xingyu Zhao


Abstract:

Deep learning (DL) has shown transformative potential across industries, yet its sensitivity to adversarial examples (AEs) limits its reliability and broader deployment. Research on DL robustness has produced a range of techniques, with adversarial training (AT) established as a leading approach to countering AEs. Traditional AT targets worst-case robustness (WCR), but recent work has introduced probabilistic robustness (PR), which measures the proportion of AEs within a local perturbation range, providing an overall assessment of a model's local robustness while acknowledging residual risks that are more practical to manage. However, existing AT methods are fundamentally designed to improve WCR, and no dedicated methods currently target PR. To bridge this gap, we formulate a new min-max optimization problem as the theoretical foundation for AT focused on PR, and introduce an AT-PR training scheme with effective numerical algorithms to solve it. Our experiments, based on 38 DL models trained on common datasets and architectures, demonstrate that AT-PR achieves larger improvements in PR than AT-WCR methods and is more consistently effective across varying local inputs, with a smaller trade-off in model generalization. Open-source tools and all experiments are publicly accessible.
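For intuition, a hedged sketch of how the two robustness notions differ (the notation below is illustrative and not taken from the paper). Standard AT for WCR minimizes the worst-case loss over a perturbation ball,

    \min_\theta \, \mathbb{E}_{(x,y)} \Big[ \max_{\delta \in B_\epsilon} L\big(f_\theta(x+\delta),\, y\big) \Big],

whereas PR instead concerns the probability mass of adversarial perturbations around each input, for example

    \min_\theta \, \mathbb{E}_{(x,y)} \Big[ \Pr_{\delta \sim \mathcal{U}(B_\epsilon)} \big( f_\theta(x+\delta) \neq y \big) \Big].

The paper's actual reformulation is a min-max problem whose exact form is not given in this abstract; the expression above only conveys the quantity PR targets. A numerical scheme for such an objective plausibly relies on a Monte Carlo estimate of PR. Below is a minimal PyTorch sketch of that estimator; the function name, uniform sampling scheme, and default parameters are assumptions for illustration, not the paper's algorithm.

    import torch

    def estimate_pr(model, x, y, eps=8/255, n_samples=100):
        """Monte Carlo estimate of probabilistic robustness (PR) for one
        input x with true label y: the fraction of perturbations sampled
        uniformly from the L-inf eps-ball that the model still classifies
        correctly. Illustrative sketch, not the paper's algorithm."""
        # Draw n_samples perturbations uniformly from [-eps, eps]^d.
        delta = (torch.rand(n_samples, *x.shape, device=x.device) * 2 - 1) * eps
        # Apply perturbations and keep pixels in the valid [0, 1] range.
        x_pert = (x.unsqueeze(0) + delta).clamp(0.0, 1.0)
        with torch.no_grad():
            preds = model(x_pert).argmax(dim=1)
        # PR estimate: proportion of perturbed neighbours still labelled y.
        return (preds == y).float().mean().item()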
