This is a project for the Neural Networks course at the University of Siena.
It addresses image captioning, the task of generating a textual description of an image.
The project is structured as follows:
- data/: dataset, preprocessing scripts, and processed data
  - raw_data/: contains the raw data
  - processed/: contains the processed data
  - data_set/: contains the dataset class for loading the data
  - pre_processing.py: contains the code for preprocessing the data
- inference/: contains the code for prediction and evaluation
  - caption_predictor.py: predicts captions on the test set using the checkpoint models
  - evaluation.py: evaluates the model by computing the BLEU score of the prediction results
- models/: contains the code for the models
  - base_model.py: the base model, a plain LSTM combined with ResNet50
  - attention_model.py: the attention model, an LSTM with an attention mechanism
- text/: contains the text processing code and the tokenizer
  - tokenizer.py: the tokenizer, which also builds the vocabulary and the word-to-index mapping
- utils/: contains only the config file
- scripts/:
  - start.sh: creates the environment and installs the dependencies
  - prepare_data.sh: downloads and preprocesses the dataset
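As an illustration of what a vocabulary and word-to-index mapping like the one in tokenizer.py might look like, here is a minimal sketch. It is not the project's actual code: the special tokens, the `build_vocab` and `encode` names, the `min_freq` parameter, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; the project's tokenizer.py may use different ones.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def build_vocab(captions, min_freq=1):
    """Map each word seen at least min_freq times to an integer index."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    word_to_index = {token: i for i, token in enumerate([PAD, START, END, UNK])}
    # Sort for a deterministic mapping: most frequent first, ties alphabetical.
    for word, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            word_to_index[word] = len(word_to_index)
    return word_to_index


def encode(caption, word_to_index):
    """Convert a caption to indices, wrapped in start and end markers."""
    unk = word_to_index[UNK]
    body = [word_to_index.get(word, unk) for word in caption.lower().split()]
    return [word_to_index[START], *body, word_to_index[END]]


captions = ["a dog runs", "a cat sits"]
vocab = build_vocab(captions)
print(encode("a dog flies", vocab))  # "flies" is unseen, so it maps to <unk>
```

Words that never appear in the training captions fall back to the unknown token, which keeps the encoder usable on test captions.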
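To make the BLEU-based evaluation more concrete, the sketch below computes an unsmoothed sentence-level BLEU score against a single reference caption. It is only an illustration under those assumptions; evaluation.py may instead use a library implementation, several references per image, corpus-level aggregation, or smoothing. The `ngrams` and `bleu` names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision gives a score of 0
        log_precision_sum += math.log(overlap / total)
    # The brevity penalty punishes candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "a dog runs on the grass".split()
prediction = "a dog runs on grass".split()
print(bleu(prediction, reference, max_n=2))
```

Because this version is unsmoothed, a short caption with no matching 4-gram scores 0 at the default `max_n=4`, which is one reason smoothed or corpus-level variants are often preferred in practice.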
- From the project root, run
./scripts/start.sh
to create the environment and install the dependencies.
- This requires Python 3.11 (pytorch==2.5.1 is noted here as unsupported on Python 3.12).
- From the project root, run
run.py
to run the project.