Abstract
Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) can unintentionally provide adversaries with insights into black-box models, increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. XSub requires only a minimal number of queries and can be easily extended to launch backdoor attacks when the attacker has access to the model's training data. Our evaluation shows that XSub is not only effective and stealthy but also low-cost, showcasing its feasibility across a wide range of AI applications.
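The feature-substitution idea described above can be sketched in a few lines. The snippet below is an illustrative interpretation, not the authors' implementation: it assumes per-feature importance scores (e.g., from an explainer such as SHAP or LIME) are already available for both the original sample and an exemplar of the target label, and it copies the target exemplar's most important feature values into the positions most important for the original prediction. The function name `xsub_perturb` and the pairing of ranked features are assumptions for illustration.

```python
import numpy as np

def xsub_perturb(x, x_target, importance_orig, importance_target, k=2):
    """Illustrative feature-substitution sketch (not the paper's exact code).

    x, x_target          : 1-D feature vectors of equal length
    importance_orig      : per-feature importance for x's predicted label
    importance_target    : per-feature importance for the target label
    k                    : number of features to substitute

    The i-th most important feature position of the original sample
    receives the value at the i-th most important position of the
    target-label exemplar.
    """
    x_adv = x.copy()
    # Rank feature indices by importance, most important first
    top_orig = np.argsort(importance_orig)[::-1][:k]
    top_target = np.argsort(importance_target)[::-1][:k]
    # Substitute target-label feature values into the original's key positions
    x_adv[top_orig] = x_target[top_target]
    return x_adv

# Toy example with 6 features
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
x_t = np.array([9.0, 8.0, 7.0, 6.0, 5.0, 4.0])
imp_o = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])  # features 5, 4 matter most
imp_t = np.array([0.5, 0.4, 0.3, 0.2, 0.1, 0.0])  # features 0, 1 matter most
x_adv = xsub_perturb(x, x_t, imp_o, imp_t, k=2)
# Positions 5 and 4 of x now hold the values from positions 0 and 1 of x_t
```

Because only the top-k features change, the perturbation stays small (stealthy), and the explainer queries needed to rank features are the main query cost.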
| Original language | American English |
|---|---|
| Number of pages | 6 |
| DOIs | |
| State | Published - 2025 |
| Event | 2024 IEEE International Conference on Big Data (IEEE BigData 2024), Washington, DC; 15 Dec 2024 → 18 Dec 2024 |
Conference
| Conference | 2024 IEEE International Conference on Big Data (IEEE BigData 2024) |
|---|---|
| City | Washington, DC |
| Period | 15/12/24 → 18/12/24 |
NREL Publication Number
- NREL/CP-2C00-91278
Keywords
- adversarial attack
- adversarial machine learning
- backdoor attack
- explainable AI