Changjiang Li, Stony Brook University; Ren Pang, Bochuan Cao, Jinghui Chen, and Fenglong Ma, The Pennsylvania State University; Shouling Ji, Zhejiang University; Ting Wang, Stony Brook University
Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the robustness of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the potential risks of such practices remain largely unexplored, raising serious concerns. To bridge this gap, this work investigates the vulnerability of robustness-enhancing diffusion models.
Specifically, we demonstrate that these models are highly susceptible to DIFF2, a simple yet effective attack, which substantially diminishes their robustness assurance. Essentially, DIFF2 integrates a malicious diffusion-sampling process into the diffusion model, guiding inputs embedded with specific triggers toward an adversary-defined distribution while preserving the normal functionality for clean inputs. Our case studies on adversarial purification and robustness certification show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models, highlighting the potential risks of relying on pre-trained diffusion models as defensive tools. We further explore possible countermeasures, suggesting promising avenues for future research.
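To make the attacked setting concrete, the following is a minimal, hypothetical sketch of DDPM-style adversarial purification, the kind of diffusion-based defense the abstract describes. It is not the authors' DIFF2 implementation; the `TinyDenoiser` network, the noise schedule, and the purification depth `t_star` are illustrative assumptions, and a real deployment would use a pre-trained diffusion UNet.

```python
# Minimal sketch of diffusion-based adversarial purification (the defense DIFF2
# targets), NOT the authors' implementation. TinyDenoiser, the noise schedule,
# and t_star are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a pre-trained diffusion UNet: predicts the noise in x_t."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x_t, t):
        # A real diffusion model also conditions on the timestep t.
        return self.net(x_t)

def purify(x, denoiser, t_star=100, T=1000):
    """DDPM-style purification: diffuse the input to step t_star, then denoise.
    Per the abstract, a trojaned denoiser would behave normally here on clean
    inputs but steer trigger-embedded inputs toward an adversary-defined
    distribution, undermining the purification."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    # Forward diffusion to t_star: added noise is meant to wash out
    # adversarial perturbations.
    a_bar = alphas_bar[t_star]
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * torch.randn_like(x)
    # Reverse diffusion back to t = 0.
    for t in range(t_star, 0, -1):
        beta_t, alpha_t, a_bar_t = betas[t], 1.0 - betas[t], alphas_bar[t]
        eps = denoiser(x_t, t)
        mean = (x_t - beta_t / (1 - a_bar_t).sqrt() * eps) / alpha_t.sqrt()
        noise = torch.randn_like(x_t) if t > 1 else torch.zeros_like(x_t)
        x_t = mean + beta_t.sqrt() * noise
    return x_t

x_in = torch.rand(1, 3, 32, 32)               # a (possibly adversarial) image
x_purified = purify(x_in, TinyDenoiser())     # then fed to the downstream classifier
```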