
RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks

Paper Link

[Figure: benchmark sketch]


Official repository for the paper

RT2I-Bench: Evaluating Robustness of Text-to-Image Systems Against Adversarial Attacks

Athanasios Glentis*, Ioannis Tsaknakis*, Jiangweizhi Peng, Xun Xian, Yihua Zhang, Gaowen Liu, Charles Fleming, Mingyi Hong

Transactions on Machine Learning Research (TMLR), 2026.

*Equal contribution.

Abstract

Text-to-Image (T2I) systems have demonstrated impressive abilities in generating images from text descriptions. However, these systems remain susceptible to adversarial prompts—carefully crafted input manipulations that can result in misaligned or even toxic outputs. This vulnerability highlights the need for systematic evaluation of attack strategies that exploit these weaknesses, as well as for testing the robustness of T2I systems against them. To this end, this work introduces the RT2I-Bench benchmark. RT2I-Bench serves two primary purposes. First, it provides a structured evaluation of various adversarial attacks, examining their effectiveness, transferability, stealthiness, and potential for generating misaligned or toxic outputs, as well as assessing the resilience of state-of-the-art T2I models to such attacks. We observe that state-of-the-art T2I systems are vulnerable to adversarial prompts, with the most effective attacks achieving success rates of over 60% across the majority of T2I models we tested. Second, RT2I-Bench enables the creation of a set of strong adversarial prompts (1,439 prompts that induce misaligned or targeted outputs and 173 that induce toxic outputs), which are effective across a wide range of systems. Finally, our benchmark is designed to be extensible, enabling the seamless addition of new attacks, T2I models, and evaluation metrics. This framework provides an automated solution for robustness assessment and adversarial prompt generation in T2I systems.

CAUTION: This paper contains AI-generated images that may be considered offensive or inappropriate. This repository contains code that can generate prompts and images that may likewise be considered offensive or inappropriate.

Usage

  • Check packages.txt for the required packages.

  • Specify the parameters of the individual components (datasets, attacks, T2I models, evaluation models) in the corresponding .yaml files in Results/configs; a hypothetical sketch of such a file follows this list.

  • Specify the experiment parameters (which attacks, models, etc. to run) in Scripts/run_experiments.sh.

  • Run the experiments with:

cd Scripts
sh run_experiments.sh
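
As noted above, the component parameters live in the .yaml files under Results/configs. The exact schema is defined by the files shipped with the repository; the snippet below is only a hypothetical sketch of what a T2I model entry might look like, with every key and value invented for illustration.

models:
  - name: stable-diffusion-v1-5                 # hypothetical key: which T2I model to run
    checkpoint: runwayml/stable-diffusion-v1-5  # hypothetical: Hugging Face id or local path
    device: cuda:0                              # hypothetical: inference device
    images_per_prompt: 1                        # hypothetical sampling parameter

Check the actual files in Results/configs for the real key names before editing.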

This generates a parquet file in Results/ containing a table with the detailed results. These results then need to be processed to produce the final statistics (printed in the terminal output) and the datasets of adversarial prompts:

cd Scripts
sh process_results.sh
sh compute_stats.sh

Note that for the above scripts to work, you need to set the correct paths and parameter values in the corresponding Python scripts, i.e., Results/process_results.py and Results/compute_stats.py. Comments in these files indicate where to set these parameters.
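
The raw parquet table can also be inspected directly before or after processing. The snippet below is a minimal sketch, assuming pandas is installed; the file name and column names it uses are illustrative, not taken from the repository, so check Results/ for the actual output name and schema.

import pandas as pd

# Load the detailed results table produced by run_experiments.sh.
# The file name below is illustrative; check Results/ for the actual output.
df = pd.read_parquet("Results/experiment_results.parquet")

# Inspect which fields were recorded and peek at the first rows.
print(df.columns.tolist())
print(df.head())

# Hypothetical example: if the table records one row per (attack, model) trial
# with a boolean success column, per-attack success rates would be:
# print(df.groupby("attack")["success"].mean())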

Benchmark Components - References

Below are the components of RT2I-Bench. We provide links to the papers and code repositories of the datasets, attacks, models, and evaluation measures used in our benchmark.

Datasets

Attacks

Models

  • Stable Diffusion v1.3 [code]
  • Stable Diffusion v1.4 [code]
  • Stable Diffusion v1.5 [code]
  • Stable Diffusion v2.1 [code]
  • Stable Diffusion XL [code]
  • DALL·E Mini [code]
  • Hunyuan-DiT [code]
  • Safe Latent Diffusion [code]
  • SafeGen [code]

Evaluation Measures
