Official repository for the paper
RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks
Athanasios Glentis∗, Ioannis Tsaknakis∗, Jiangweizhi Peng, Xun Xian, Yihua Zhang, Gaowen Liu, Charles Fleming, Mingyi Hong
Transactions on Machine Learning Research (TMLR), 2026.
∗Equal contribution.
Text-to-Image (T2I) systems have demonstrated impressive abilities in generating images from text descriptions. However, these systems remain susceptible to adversarial prompts: carefully crafted input manipulations that can result in misaligned or even toxic outputs. This vulnerability highlights the need for systematic evaluation of attack strategies that exploit these weaknesses, as well as for testing the robustness of T2I systems against them. To this end, this work introduces the RT2I-Bench benchmark. RT2I-Bench serves two primary purposes. First, it provides a structured evaluation of various adversarial attacks, examining their effectiveness, transferability, stealthiness, and potential for generating misaligned or toxic outputs, as well as assessing the resilience of state-of-the-art T2I models to such attacks. We observe that state-of-the-art T2I systems are vulnerable to adversarial prompts, with the most effective attacks achieving success rates of over 60% across the majority of T2I models we tested. Second, RT2I-Bench enables the creation of a set of strong adversarial prompts (1,439 prompts that induce misaligned or targeted outputs and 173 that induce toxic outputs), which are effective across a wide range of systems. Finally, our benchmark is designed to be extensible, enabling the seamless addition of new attacks, T2I models, and evaluation metrics. This framework provides an automated solution for robustness assessment and adversarial prompt generation in T2I systems.
CAUTION: This paper contains AI-generated images that may be considered offensive or inappropriate. This repository contains code that can generate prompts and images that may likewise be considered offensive or inappropriate.
## Getting Started

- Check `packages.txt` for the required packages.
- Specify the parameters of the individual components (datasets, attacks, T2I models, evaluation models) in the corresponding `.yaml` files in `Results/configs` (see the illustrative sketch after this list).
- Specify the experiment parameters (the selection of attacks, models, etc.) in `Scripts/run_experiments.sh`.
- Run the experiments with:

  ```
  cd Scripts
  sh run_experiments.sh
  ```

  This generates a parquet file in `Results/` containing a table with the detailed results.
- Process the results to produce the final statistics (printed to the terminal) and the datasets of adversarial prompts:

  ```
  cd Scripts
  sh process_results.sh
  sh compute_stats.sh
  ```

  For these scripts to work, set the correct paths and parameter values in the corresponding Python programs, i.e., `Results/process_results.py` and `Results/compute_stats.py`. Comments in these files indicate where to set these parameters.
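As an illustration, here is a minimal sketch of how a component might read its `.yaml` config. The file name and keys below (`model_id`, `num_inference_steps`, `guidance_scale`) are hypothetical placeholders, not the benchmark's actual schema; check the files in `Results/configs` for the real parameters.

```python
import yaml  # PyYAML

# Hypothetical example: load a T2I model config from Results/configs.
# The file name and keys below are illustrative assumptions, not the
# benchmark's actual schema.
with open("Results/configs/t2i_model.yaml") as f:
    cfg = yaml.safe_load(f)

model_id = cfg.get("model_id", "stable-diffusion-v1-5")
steps = cfg.get("num_inference_steps", 50)
guidance = cfg.get("guidance_scale", 7.5)
print(f"Loaded T2I config: {model_id} ({steps} steps, guidance scale {guidance})")
```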
Below are the components of RT2I-Bench. We provide links to the papers and code repositories of the datasets, attacks, models, and evaluation measures used in our benchmark.
## Datasets

## Attacks
- QF-Attack [paper, code]
- MMP-Attack [paper, code]
- SDTargeted [paper, code]
- AsymmetricAttack [paper, code]
- TuRBO [paper, code]
- Ring-A-Bell [paper, code]
- MMA-Diffusion [paper, code]
- Typos (addition)
- Typos (swap)
- Typos (substitution)
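The three typo attacks are simple character-level baselines. A minimal sketch of what such perturbations look like (our own illustration, not the benchmark's implementation):

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def typo_addition(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Insert n random lowercase letters at random positions."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(LETTERS))
    return "".join(chars)

def typo_swap(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Swap n random pairs of adjacent characters."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        pos = rng.randrange(len(chars) - 1)
        chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
    return "".join(chars)

def typo_substitution(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Replace n random characters with random lowercase letters."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        chars[rng.randrange(len(chars))] = rng.choice(LETTERS)
    return "".join(chars)

print(typo_swap("a photo of a cat on a sofa", n=2))
```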
## Models
- Stable Diffusion v1.3 [code]
- Stable Diffusion v1.4 [code]
- Stable Diffusion v1.5 [code]
- Stable Diffusion v2.1 [code]
- Stable Diffusion XL [code]
- DALL·E Mini [code]
- Hunyuan-DiT [code]
- Safe Latent Diffusion [code]
- SafeGen [code]
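Most of the open models above are available through the Hugging Face `diffusers` library. As a standalone reference point (the benchmark configures its models via the `.yaml` files in `Results/configs`, so this is not the repository's own loading code), Stable Diffusion v1.4 can be queried like this:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal standalone example: generate one image with Stable Diffusion v1.4.
# This snippet is only a reference for how the underlying pipelines work,
# not the benchmark's model wrapper.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of a cat on a sofa", num_inference_steps=50).images[0]
image.save("output.png")
```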
## Evaluation Measures
