AdvDetectGPT: Detecting Adversarial Examples Using Large Vision-Language Models

Authors: Ming Zhang, Huayang Cao, and Cheng Qian
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 802-812
Keywords: Adversarial detection, Adversarial examples, Deep neural networks, LVLMs.

Abstract

Adversarial examples have proven to be a substantial threat to the secure application of deep neural networks, and adversarial detection plays a pivotal role in defending against adversarial attacks. While the underlying concept is straightforward, the practical realization of adversarial detection is non-trivial, frequently encountering challenges of universality and effectiveness. In this study, we leverage the powerful capabilities of large vision-language models (LVLMs) and develop AdvDetectGPT, a novel LVLM-based adversarial detector. AdvDetectGPT learns to identify adversarial examples directly from clean and adversarial instances, independent of the victim model's outputs or internal responses. Extensive experiments show that AdvDetectGPT significantly outperforms state-of-the-art baselines. AdvDetectGPT also exhibits robust generalization: it can detect adversarial examples crafted by novel attacks on new models, as well as those with customized perturbations distinct from the training set. Code is available at https://github.com/mingcheung/AdvDetectGPT.
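The abstract describes fine-tuning an LVLM to classify inputs as clean or adversarial from the images alone, without access to the victim model. A minimal sketch of how such instruction-tuning data and verdict parsing might be structured is below; the field names, prompt wording, and file paths are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch: formatting instruction-tuning samples for an
# LVLM used as a binary adversarial-example detector. All names and
# paths here are assumptions for illustration.

QUESTION = "Is this image a clean example or an adversarial example?"

def build_detection_sample(image_path: str, is_adversarial: bool) -> dict:
    """Pair one image with a binary clean/adversarial instruction-response."""
    return {
        "image": image_path,
        "conversations": [
            {"role": "user", "content": QUESTION},
            {"role": "assistant",
             "content": "adversarial" if is_adversarial else "clean"},
        ],
    }

def parse_detection_response(response: str) -> bool:
    """Map a free-form LVLM response to a detection verdict.

    Returns True if the model flags the input as adversarial.
    """
    return "adversarial" in response.lower()

# A tiny mixed dataset of clean images and attacked counterparts
# (hypothetical paths; real data would come from attacks such as PGD).
dataset = [
    build_detection_sample("clean/cat_001.png", is_adversarial=False),
    build_detection_sample("pgd/cat_001.png", is_adversarial=True),
]
```

At inference time, the detector would feed a suspect image with the same question to the fine-tuned LVLM and apply `parse_detection_response` to its answer; because the decision depends only on the input image, the scheme does not require the victim model's logits or internal activations.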