👁️ Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability

📑 Contents

  1. About
  2. Setup
  3. Dataset
  4. Repository Structure
  5. Usage
  6. Citation

🤖 About

Understanding what makes a video memorable has important applications in advertising and education technology. Towards this goal, we investigate the spatio-temporal attention mechanisms underlying video memorability. Unlike previous works that fuse multiple features, we adopt a simple CNN+Transformer architecture that enables analysis of spatio-temporal attention while matching state-of-the-art (SoTA) performance on video memorability prediction. We compare model attention against human gaze fixations collected through a small-scale eye-tracking study in which humans perform the video memory task. We uncover the following insights: (i) quantitative saliency metrics show that our model, trained only to predict a memorability score, exhibits spatial attention patterns similar to human gaze, especially for more memorable videos; (ii) the model assigns greater importance to the initial frames of a video, mimicking human attention patterns; and (iii) panoptic segmentation reveals that both the model and humans devote a greater share of attention to "things" (countable objects) and less to "stuff" (amorphous regions), relative to their occurrence probability.
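The CNN+Transformer design can be pictured as a Transformer encoder attending over per-frame CNN features. Below is a minimal PyTorch sketch of such a model; the layer sizes, mean pooling, and learned positional encoding here are assumptions rather than the paper's exact configuration (see utils/model.py for the real implementation):

import torch
import torch.nn as nn

class MemorabilityTransformer(nn.Module):
    """Illustrative only: Transformer encoder over precomputed CNN frame features."""

    def __init__(self, feat_dim=2048, d_model=512, nhead=8, num_layers=2, max_len=64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned frame positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, feats):                         # feats: (B, T, feat_dim)
        x = self.proj(feats) + self.pos[:, :feats.size(1)]
        x = self.encoder(x)                           # self-attention across frames
        return self.head(x.mean(dim=1)).squeeze(-1)   # one memorability score per video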

For more details, please visit our project website or read our paper: https://arxiv.org/abs/2311.16484.

🛠️ Setup

  1. Clone the repository
git clone https://github.com/esh04/SeeingEyeToAI.git
cd SeeingEyeToAI
  2. Install dependencies
pip install -r requirements.txt

📊 Dataset

The Memento10k dataset can be downloaded from http://memento.csail.mit.edu/#Dataset.

📁 Repository Structure

.
├── main.py                # Main training script
├── embed.py               # Video embedding generation
├── attention.py           # Attention matrix extraction
├── panoptic.py            # Panoptic segmentation
├── requirements.txt       # Python dependencies
├── eyetracking/           # Eye-tracking data and related processing
├── utils/
│   ├── model.py           # Transformer model implementation
│   └── dataset.py         # Dataset handling

🚀 Usage

1. Generate Embeddings

python embed.py --path /path/to/videos
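embed.py produces the frame-level features consumed during training. As a rough sketch of what such embedding extraction typically looks like (uniform frame sampling with OpenCV and ResNet-50 features; the function name and backbone choice are illustrative assumptions, not necessarily what embed.py does):

import cv2
import numpy as np
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
backbone = torch.nn.Sequential(*list(resnet50(weights=weights).children())[:-1]).eval()
preprocess = weights.transforms()                     # resize, crop, normalize

@torch.no_grad()
def embed_video(path, num_frames=16):
    """Uniformly sample frames and return (num_frames, 2048) CNN features."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    feats = []
    for i in np.linspace(0, total - 1, num_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if not ok:
            continue
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        x = preprocess(torch.from_numpy(rgb).permute(2, 0, 1))  # HWC uint8 -> CHW
        feats.append(backbone(x.unsqueeze(0)).flatten().numpy())
    cap.release()
    return np.stack(feats)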

2. Train Model

python main.py \
    --path /path/to/embeddings \
    --train_data_path /path/to/train.csv \
    --val_data_path /path/to/val.csv
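For reference, a bare-bones version of the training/validation cycle that main.py runs might look like the following (the MSE loss and loop structure are assumptions; Spearman rank correlation is the standard video-memorability metric):

import torch
from scipy.stats import spearmanr

def train_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    loss_fn = torch.nn.MSELoss()
    for feats, scores in loader:                      # feats: (B, T, D), scores: (B,)
        feats, scores = feats.to(device), scores.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(feats), scores)
        loss.backward()
        optimizer.step()

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    model.eval()
    preds, gts = [], []
    for feats, scores in loader:
        preds += model(feats.to(device)).cpu().tolist()
        gts += scores.tolist()
    return spearmanr(preds, gts).correlation          # rank correlation vs. ground truth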

3. Attention Analysis

Extract attention matrices to analyze the model's focus:

python attention.py \
    --model_path /path/to/trained/model.pt \
    --val_path /path/to/val.csv \
    --features_path /path/to/features
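One way to obtain such attention matrices, shown below for the sketch model from the About section (attention.py may extract them differently), is to re-run each encoder layer's self-attention with need_weights=True:

import torch

@torch.no_grad()
def extract_attention(model, feats):
    """Collect per-layer (B, T, T) self-attention maps from the sketch model above."""
    x = model.proj(feats) + model.pos[:, :feats.size(1)]
    maps = []
    for layer in model.encoder.layers:
        # Re-run the layer's attention with need_weights=True to expose the weights
        _, attn = layer.self_attn(x, x, x, need_weights=True,
                                  average_attn_weights=True)
        maps.append(attn)                             # how much each frame attends to the others
        x = layer(x)                                  # then advance through the layer normally
    return torch.stack(maps)                          # (num_layers, B, T, T)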

4. Panoptic Segmentation

Generate panoptic segmentation results:

python panoptic.py \
    --video_path /path/to/videos
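As an illustration of the things-vs-stuff labelling this step enables, here is a sketch using Detectron2's panoptic FPN on a single frame (the model choice is an assumption; panoptic.py may use a different segmenter):

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml")
predictor = DefaultPredictor(cfg)

frame = cv2.imread("frame.jpg")                       # one extracted video frame (BGR)
panoptic_seg, segments_info = predictor(frame)["panoptic_seg"]
# Each segment is labelled as a countable "thing" or an amorphous "stuff" region,
# which is what the things-vs-stuff attention analysis needs.
things = [s for s in segments_info if s["isthing"]]
stuff = [s for s in segments_info if not s["isthing"]]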

📝 Citation

If you use this code in your research, please cite our paper:

@inproceedings{kumar2025eyetoai,
    title = {{Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability}},
    author = {Kumar, Prajneya and Khandelwal, Eshika and Tapaswi, Makarand and Sreekumar, Vishnu},
    year = {2025},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)}
}
