Applied different Cheminformatics methods to map solvent structure to redox potential

EnthusiasticTeslim/BatteryInformatics

BatteryInformatics

Source code and trained models for the paper "Comparative Analysis of Structure-Property Machine Learning models for Predicting Electrolyte Thermodynamic Windows".

General overview of the modeling framework

Table of Contents
  1. ➤ Set-up environment
  2. ➤ Training Machine Learning Models
  3. ➤ Status
  4. ➤ How to cite
  5. ➤ License
  6. ➤ References

Set-up environment

Clone this repository, then run setup.sh to create a virtual environment named binfo with the dependencies listed in requirements.txt.

git clone https://github.com/EnthusiasticTeslim/BatteryInformatics.git
cd BatteryInformatics
chmod +x setup.sh
sh setup.sh
source binfo/bin/activate
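
As a quick sanity check after activation, the short Python sketch below reports which virtual environment the interpreter is running in. It assumes setup.sh creates a standard venv-style environment (an assumption, since the script's contents are not shown here); the function name active_venv_name is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path
from typing import Optional


def active_venv_name() -> Optional[str]:
    """Return the directory name of the active virtual environment, or None."""
    # Inside a venv-style environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base interpreter.
    if sys.prefix == sys.base_prefix:
        return None
    return Path(sys.prefix).name


if __name__ == "__main__":
    name = active_venv_name()
    if name == "binfo":
        print("binfo environment is active")
    else:
        print(f"binfo is not active (current environment: {name})")
```

If the script reports that binfo is not active, re-run the source command above in the same shell before training.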

Training Models

Important

All training scripts are also available in Docker form in the docker folder.

Traditional Model

python src/descriptor/trainer.py -h
usage: trainer.py [-h] [--parent_directory PARENT_DIRECTORY] [--data_directory DATA_DIRECTORY] [--result_directory RESULT_DIRECTORY] [--src SRC] [--train_data TRAIN_DATA] [--test_data TEST_DATA] [--scale] [--hyperparameter HYPERPARAMETER] [--iterations ITERATIONS] [--cv CV] [--model MODEL] [--seed SEED]

options:
  --parent_directory PARENT_DIRECTORY
                        Path to main directory
  --data_directory DATA_DIRECTORY
                        where the data is stored in parent directory
  --result_directory RESULT_DIRECTORY
                        Path to result directory
  --src SRC             function source directory
  --train_data TRAIN_DATA
                        Path to train data
  --test_data TEST_DATA
                        Path to test data
  --scale               Scale data
  --hyperparameter HYPERPARAMETER
                        Hyperparameter space
  --iterations ITERATIONS
                        Number of iterations for hyperparameter optimization
  --cv CV               Number of cross-validation folds
  --model MODEL         Model to train
  --seed SEED           Random seed
  --morgan_fingerprint  Use Morgan Fingerprint (MFF) instead of RDKit descriptors
  --nbits NBITS         Number of bits for MFF
  --radius RADIUS       Radius for MFF
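
In the help text above, options shown without a value placeholder (--scale, --morgan_fingerprint) are on/off switches, while the others expect a value. The sketch below rebuilds a small subset of such a parser with argparse to illustrate the difference. It is a hypothetical reconstruction from the help text, not the parser in src/descriptor/trainer.py; the argument types and the values 2048 and 2 are illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the trainer.py help text."""
    parser = argparse.ArgumentParser(prog="trainer.py")
    # Options that take a value (types are assumptions).
    parser.add_argument("--model", type=str, help="Model to train")
    parser.add_argument("--seed", type=int, help="Random seed")
    parser.add_argument("--cv", type=int, help="Number of cross-validation folds")
    parser.add_argument("--nbits", type=int, help="Number of bits for MFF")
    parser.add_argument("--radius", type=int, help="Radius for MFF")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--scale", action="store_true", help="Scale data")
    parser.add_argument(
        "--morgan_fingerprint",
        action="store_true",
        help="Use Morgan Fingerprint (MFF) instead of RDKit descriptors",
    )
    return parser


# Parse an example command line; nbits=2048 and radius=2 are illustrative values.
args = build_parser().parse_args(
    ["--model", "SVR", "--seed", "42", "--morgan_fingerprint",
     "--nbits", "2048", "--radius", "2"]
)
featurizer = "morgan_fingerprint" if args.morgan_fingerprint else "rdkit_descriptors"
print(args.model, args.seed, featurizer, "scaled" if args.scale else "unscaled")
```

Because --scale is omitted in this example, args.scale is False, whereas passing --morgan_fingerprint switches the featurization away from the RDKit descriptor default.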

The model and its predictions will be saved in results/<MODEL>. For example, to train an SVR model using RDKit descriptors, you can use:

python src/descriptor/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --scale --model SVR --seed 42 --iterations 100 --hyperparameter "hp_descriptor.yaml" --cv 5

To train all the models (SVR, RandomForest, AdaBoostRegressor, GradientBoostingRegressor), run:

chmod a+x regenerate/descriptor.sh
./regenerate/descriptor.sh
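
For readers who prefer to drive the runs from Python, the sketch below builds one trainer command per model, reusing the options from the SVR example. It is not a transcription of descriptor.sh, whose contents are not shown here. It assumes that the four model names listed above are accepted as --model values and that YOUR_MAIN_FOLDER is replaced with a real path; descriptor_command is a hypothetical helper.

```python
import shlex
from typing import List

# Model names as listed in this README; assumed to be valid --model values.
MODELS = ["SVR", "RandomForest", "AdaBoostRegressor", "GradientBoostingRegressor"]


def descriptor_command(model: str, parent_directory: str = "YOUR_MAIN_FOLDER") -> List[str]:
    """Build the descriptor-trainer command for one model (hypothetical helper)."""
    return [
        "python", "src/descriptor/trainer.py",
        "--parent_directory", parent_directory,
        "--result_directory", "results",
        "--data_directory", "data",
        "--train_data", "train_data_cleaned.csv",
        "--test_data", "test_data_cleaned.csv",
        "--scale",
        "--model", model,
        "--seed", "42",
        "--iterations", "100",
        "--hyperparameter", "hp_descriptor.yaml",
        "--cv", "5",
    ]


commands = [descriptor_command(model) for model in MODELS]
for cmd in commands:
    # Print the command; pass cmd to subprocess.run(cmd, check=True)
    # to actually launch training.
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell-quoting problems if the parent directory contains spaces.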

Graph Neural Network

python src/graph/trainer.py -h
trainer.py [-h] [--parent_directory PARENT_DIRECTORY] [--result_directory RESULT_DIRECTORY] [--data_directory DATA_DIRECTORY] [--train_data TRAIN_DATA] [--test_data TEST_DATA] [--add_features]
                  [--skip_cv] [--epochs EPOCHS] [--start-epoch START_EPOCH] [--batch_size BATCH_SIZE] [--lr LR] [--gpu GPU] [--cv CV] [--dim_input DIM_INPUT] [--unit_per_layer UNIT_PER_LAYER] [--seed SEED]
                  [--num_feat NUM_FEAT] [--train]

options:
  --parent_directory PARENT_DIRECTORY
                        Path to main directory
  --result_directory RESULT_DIRECTORY
                        Path to result directory
  --data_directory DATA_DIRECTORY
                        where the data is stored in parent directory
  --train_data TRAIN_DATA
                        name of train data
  --test_data TEST_DATA
                        name of test data
  --add_features        if add features
  --skip_cv             if skip cross validation
  --epochs EPOCHS       number of total epochs to run
  --start-epoch START_EPOCH
                        manual epoch number (useful on restarts)
  --batch_size BATCH_SIZE
                        mini-batch size (default: 256)
  --lr LR               initial learning rate
  --gpu GPU             GPU ID to use.
  --cv CV               k-fold cross validation
  --dim_input DIM_INPUT
                        dimension of input
  --unit_per_layer UNIT_PER_LAYER
                        unit per layer
  --seed SEED           seed number
  --num_feat NUM_FEAT   number of additional features
  --train               if train
  --print_result        if print result
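
The --cv option sets the number of folds for k-fold cross validation. As a rough illustration of the idea only (this is not the repository's implementation, and k_fold_indices is a hypothetical name), the pure-Python sketch below shuffles sample indices with a fixed seed and splits them into k train/validation partitions.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs."""
    indices = list(range(n_samples))
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


splits = k_fold_indices(n_samples=10, k=5, seed=42)
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation")
```

Each sample lands in exactly one validation fold, so with 10 samples and 5 folds every model is trained on 8 samples and validated on the remaining 2.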

To train the GNN model, run:

chmod a+x regenerate/graph.sh
./regenerate/graph.sh

The checkpoints and predictions will be saved in results/GNN.

For example, to train a GNN model you can use:

python src/graph/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --seed 42 --train --cv 5

To test an already trained model, omit the --train flag:

python src/graph/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --seed 42 --cv 5

Transformer

Under construction.

Status

  • [x] Complete data cleaning.
  • [x] Scripts for QSPR with RDKit descriptors.
  • [x] Scripts for QSPR with graphs.
  • [x] Train ML models with RDKit descriptors and graphs.
  • [ ] Set up ML model with transformer.
  • [ ] Evaluate performance.
  • [ ] Deploy models as a GUI.

How to cite

@article{doi,
  author = {Teslim Olayiwola and Jose Romagnoli},
  title = {Comparative Analysis of Structure-Property Machine Learning models for Predicting Electrolyte Thermodynamic Windows},
  journal = {n/a},
  year = {n/a},
  volume = {n/a},
  number = {n/a},
  doi = {https://doi.org/},
  preprint = {Manuscript in Preparation}
}

License

BatteryInformatics is released under the MIT license. For use of specific models, please refer to the licenses of the original packages.

References
