Official repository for the paper
RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks
Athanasios Glentis∗, Ioannis Tsaknakis∗, Jiangweizhi Peng, Xun Xian, Yihua Zhang, Gaowen Liu, Charles Fleming, Mingyi Hong
Transactions on Machine Learning Research (TMLR), 2026.
∗Equal contribution.
Text-to-Image (T2I) systems have demonstrated impressive abilities in generating images from text descriptions. However, these systems remain susceptible to adversarial prompts: carefully crafted input manipulations that can result in misaligned or even toxic outputs. This vulnerability highlights the need for systematic evaluation of attack strategies that exploit these weaknesses, as well as for testing the robustness of T2I systems against them. To this end, this work introduces the RT2I-Bench benchmark. RT2I-Bench serves two primary purposes. First, it provides a structured evaluation of various adversarial attacks, examining their effectiveness, transferability, stealthiness, and potential for generating misaligned or toxic outputs, as well as assessing the resilience of state-of-the-art T2I models to such attacks. We observe that state-of-the-art T2I systems are vulnerable to adversarial prompts, with the most effective attacks achieving success rates of over 60% across the majority of T2I models we tested. Second, RT2I-Bench enables the creation of a set of strong adversarial prompts (1,439 prompts that induce misaligned or targeted outputs and 173 that induce toxic outputs), which are effective across a wide range of systems. Finally, our benchmark is designed to be extensible, enabling the seamless addition of new attacks, T2I models, and evaluation metrics. This framework provides an automated solution for robustness assessment and adversarial prompt generation in T2I systems.
CAUTION: This paper contains AI-generated images that may be considered offensive or inappropriate. This repository contains code that can generate prompts and images that may likewise be considered offensive or inappropriate.
## Getting Started

- Check `packages.txt` for the required packages.
- Specify the parameters of the individual components (datasets, attacks, T2I models, evaluation models) in the corresponding `.yaml` files in `Results/configs` (see the illustrative sketch after this list).
- Specify the experiment parameters (the selection of attacks, models, etc.) in `Scripts/run_experiments.sh`.
- Run the experiments with:

  ```
  cd Scripts
  sh run_experiments.sh
  ```

  This generates a parquet file in `Results/` containing a table with the detailed results.
- Process the results to produce the final statistics (printed to the terminal) and the datasets of adversarial prompts:

  ```
  cd Scripts
  sh process_results.sh
  sh compute_stats.sh
  ```

  For these scripts to work, set the correct paths and parameter values in the corresponding Python programs, i.e., `Results/process_results.py` and `Results/compute_stats.py`. Comments in these files indicate where to set these parameters.
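As an illustration, here is a minimal sketch of how a component might read its `.yaml` config. The file name and keys below (`model_id`, `num_inference_steps`, `guidance_scale`) are hypothetical placeholders, not the benchmark's actual schema; check the files in `Results/configs` for the real parameters.

```python
import yaml  # PyYAML

# Hypothetical example: load a T2I model config from Results/configs.
# The file name and keys below are illustrative assumptions, not the
# benchmark's actual schema.
with open("Results/configs/t2i_model.yaml") as f:
    cfg = yaml.safe_load(f)

model_id = cfg.get("model_id", "stable-diffusion-v1-5")
steps = cfg.get("num_inference_steps", 50)
guidance = cfg.get("guidance_scale", 7.5)
print(f"Loaded T2I config: {model_id} ({steps} steps, guidance scale {guidance})")
```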
Below are the components of RT2I-Bench. We provide links to the papers and code repositories of the datasets, attacks, models, and evaluation measures used in our benchmark.
## Datasets

## Attacks
- QF-Attack [paper, code]
- MMP-Attack [paper, code]
- SDTargeted [paper, code]
- AsymmetricAttack [paper, code]
- TuRBO [paper, code]
- Ring-A-Bell [paper, code]
- MMA-Diffusion [paper, code]
- Typos (addition)
- Typos (swap)
- Typos (substitution)
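The three typo attacks are simple character-level baselines. A minimal sketch of what such perturbations look like (our own illustration, not the benchmark's implementation):

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def typo_addition(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Insert n random lowercase letters at random positions."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(LETTERS))
    return "".join(chars)

def typo_swap(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Swap n random pairs of adjacent characters."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        pos = rng.randrange(len(chars) - 1)
        chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
    return "".join(chars)

def typo_substitution(prompt: str, n: int = 1, seed: int = 0) -> str:
    """Replace n random characters with random lowercase letters."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n):
        chars[rng.randrange(len(chars))] = rng.choice(LETTERS)
    return "".join(chars)

print(typo_swap("a photo of a cat on a sofa", n=2))
```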
## Models
- Stable Diffusion v1.3 [code]
- Stable Diffusion v1.4 [code]
- Stable Diffusion v1.5 [code]
- Stable Diffusion v2.1 [code]
- Stable Diffusion XL [code]
- DALL·E Mini [code]
- Hunyuan-DiT [code]
- Safe Latent Diffusion [code]
- SafeGen [code]
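Most of the open models above are available through the Hugging Face `diffusers` library. As a standalone reference point (the benchmark configures its models via the `.yaml` files in `Results/configs`, so this is not the repository's own loading code), Stable Diffusion v1.4 can be queried like this:

```python
import torch
from diffusers import StableDiffusionPipeline

# Minimal standalone example: generate one image with Stable Diffusion v1.4.
# This snippet is only a reference for how the underlying pipelines work,
# not the benchmark's model wrapper.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of a cat on a sofa", num_inference_steps=50).images[0]
image.save("output.png")
```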
## Evaluation Measures
