Yihan Ma, Xinyue Shen, and Yiting Qu, CISPA Helmholtz Center for Information Security; Ning Yu, Netflix Eyeline Studios; Michael Backes, CISPA Helmholtz Center for Information Security; Savvas Zannettou, Delft University of Technology; Yang Zhang, CISPA Helmholtz Center for Information Security
Open-source Vision Language Models (VLMs) have advanced rapidly, blending natural language with visual modalities and achieving remarkable performance on tasks such as image captioning and visual question answering. However, their effectiveness in real-world scenarios remains uncertain, as real-world images, particularly hateful memes, often convey complex semantics, cultural references, and emotional signals far beyond those found in experimental datasets. In this paper, we present an in-depth evaluation of VLMs' ability to interpret hateful memes, curating a dataset of 39 hateful memes and collecting 12,775 responses from seven representative VLMs using carefully designed prompts. Our manual annotation of the responses' informativeness and soundness reveals that VLMs can identify visual concepts and understand cultural and emotional backgrounds, especially for well-known hateful memes. However, we find that VLMs lack robust safeguards to detect and reject hateful content, leaving them vulnerable to misuse for generating harmful outputs such as hate speech and offensive slogans. Our findings show that 40% of VLM-generated hate speech and over 10% of generated hateful jokes and slogans were flagged as harmful, emphasizing the urgent need for stronger safety measures and ethical guidelines to mitigate misuse. We hope our study serves as a foundation for improving VLM safety and ethical standards in handling hateful content.
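To make the kind of evaluation described above concrete, the sketch below shows one plausible setup: prompting an open-source VLM with a meme image and a question about its meaning, then passing the response through an off-the-shelf hate-speech text classifier as a proxy safety flag. This is an illustrative assumption, not the paper's pipeline; the LLaVA-1.5 checkpoint, the prompt wording, and the facebook/roberta-hate-speech-dynabench-r4-target classifier are stand-ins chosen for demonstration and do not correspond to the seven models or prompts used in the study.

```python
# Illustrative sketch only -- not the authors' evaluation pipeline.
# Assumes the llava-hf/llava-1.5-7b-hf checkpoint and the
# facebook/roberta-hate-speech-dynabench-r4-target classifier as stand-ins.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration, pipeline

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
vlm = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def ask_vlm(image_path: str, question: str, max_new_tokens: int = 256) -> str:
    """Prompt the VLM with a meme image and a free-form question."""
    image = Image.open(image_path).convert("RGB")
    prompt = f"USER: <image>\n{question} ASSISTANT:"  # LLaVA-1.5 chat template
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        vlm.device, torch.float16
    )
    output = vlm.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    text = processor.decode(output[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()

# Proxy safety flag: an off-the-shelf hate-speech text classifier.
safety_clf = pipeline(
    "text-classification",
    model="facebook/roberta-hate-speech-dynabench-r4-target",
)

response = ask_vlm("meme.png", "What does this meme mean, and who does it target?")
flag = safety_clf(response)[0]
print(response)
print(f"safety flag: {flag['label']} (score={flag['score']:.2f})")
```

In a full study, the question above would be replaced by a set of carefully designed prompts (interpretation, cultural background, and adversarial generation requests), and the automatic flag would complement, not replace, manual annotation of informativeness and soundness.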