Skip to content

EMNLP 2018. Learning to Describe Differences Between Pairs of Similar Images. Harsh Jhamtani, Taylor Berg-Kirkpatrick.

License

Notifications You must be signed in to change notification settings

harsh19/spot-the-diff

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spot-the-diff

Harsh Jhamtani, Taylor Berg-Kirkpatrick. Learning to Describe Differences Between Pairs of Similar Images. EMNLP 2018
Link: https://arxiv.org/pdf/1808.10584.pdf

Dataset

  • v0.1 of dataset is present in data/.

Annotations:

  • data/annotations/ contains threee json files representing train,val,test splits
  • format of each json file is as follows: each file represents a list. each item in the list is a dictionary consisting of 'img_id' and 'sentences' keys. e.g.
    {"img_id": "400", "sentences": ["two of the three people in the front of the picture have moved", "there is a vehicle in the far back that is only in image two"]

Images

  • data/resized_images/ contains the relevant images.
  • naming convention: <img_id>.png, <img_id>_2.png
  • we have also provided the corresponding diff images: <img_id>_diff.jpg
  • All images have been resized to 224,224
  • Original size images: bit.ly/spot_diff_data

Cluster data

  • We provide clusters of differing pixels computed under suggested paramter settings and clustering algorithm.
  • For more details, check Code/usage.ipynb

Others

  • Clustering code has been added

TODO

  • Model Predictions (multi)

Reference

If you use the data or code, please consider citing

@inproceedings{jhamtani2018learning,
  title={Learning to Describe Differences Between Pairs of Similar Images},
  author={Jhamtani, Harsh and Berg-Kirkpatrick, Taylor},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

License

  • Code: MIT License (see LICENSE).
  • Data:
    • Images derived from the VIRAT Video Dataset are governed by the original VIRAT Video Dataset Usage Agreement (see DATA_LICENSE.txt).
    • Text annotations created by the authors are released under the MIT License unless otherwise noted.

About

EMNLP 2018. Learning to Describe Differences Between Pairs of Similar Images. Harsh Jhamtani, Taylor Berg-Kirkpatrick.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published