BatteryInformatics

Source code and trained models for the paper "Comparative Analysis of Structure-Property Machine Learning models for Predicting Electrolyte Thermodynamic Windows".

General overview of the modeling framework

Set-up environment

Clone this repository, then run setup.sh to create a virtual environment named binfo with the dependencies listed in requirements.txt.

git clone https://github.com/EnthusiasticTeslim/BatteryInformatics.git
cd BatteryInformatics
chmod +x setup.sh
sh setup.sh
source binfo/bin/activate
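If setup.sh reports errors, it can help to confirm which packages from requirements.txt actually landed in binfo. The following is a minimal sketch, not part of the repository: it assumes simple requirement lines (a distribution name with an optional version specifier) and uses only the standard library's importlib.metadata.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(path):
    """Return the bare distribution names listed in a simple requirements file."""
    names = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        # The name ends at the first extras bracket, version specifier, marker or space.
        name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
        if name:
            names.append(name)
    return names


def missing_packages(path="requirements.txt"):
    """List requirement names that are not installed in the active environment."""
    missing = []
    for name in requirement_names(path):
        try:
            metadata.version(name)  # looks up the installed distribution by name
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

With binfo activated, call missing_packages("requirements.txt") from the repository root; an empty list means every listed distribution is installed.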

Training Models

Important

Docker versions of all training scripts are available in the docker folder.

Traditional Model

python src/descriptor/trainer.py -h
usage: trainer.py [-h] [--parent_directory PARENT_DIRECTORY] [--data_directory DATA_DIRECTORY] [--result_directory RESULT_DIRECTORY] [--src SRC] [--train_data TRAIN_DATA] [--test_data TEST_DATA] [--scale] [--hyperparameter HYPERPARAMETER] [--iterations ITERATIONS] [--cv CV] [--model MODEL] [--seed SEED]

options:
  --parent_directory PARENT_DIRECTORY
                        Path to main directory
  --data_directory DATA_DIRECTORY
                        where the data is stored in parent directory
  --result_directory RESULT_DIRECTORY
                        Path to result directory
  --src SRC             function source directory
  --train_data TRAIN_DATA
                        Path to train data
  --test_data TEST_DATA
                        Path to test data
  --scale               Scale data
  --hyperparameter HYPERPARAMETER
                        Hyperparameter space
  --iterations ITERATIONS
                        Number of iterations for hyperparameter optimization
  --cv CV               Number of cross-validation folds
  --model MODEL         Model to train
  --seed SEED           Random seed
  --morgan_fingerprint  Use Morgan Fingerprint (MFF) instead of RDKit descriptors
  --nbits NBITS         Number of bits for MFF
  --radius RADIUS       Radius for MFF
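The help text above separates flags that take a value from bare switches such as --scale and --morgan_fingerprint. To make that concrete, here is a hypothetical argparse reconstruction based only on the -h output; the real parser in src/descriptor/trainer.py may use different types and defaults, which the help text does not show (every default below is None or False).

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the descriptor trainer's CLI from its -h output."""
    p = argparse.ArgumentParser(prog="trainer.py")
    # Options that take a value (types for numeric options are assumed to be int).
    p.add_argument("--parent_directory", help="Path to main directory")
    p.add_argument("--data_directory", help="where the data is stored in parent directory")
    p.add_argument("--result_directory", help="Path to result directory")
    p.add_argument("--src", help="function source directory")
    p.add_argument("--train_data", help="Path to train data")
    p.add_argument("--test_data", help="Path to test data")
    p.add_argument("--hyperparameter", help="Hyperparameter space")
    p.add_argument("--iterations", type=int, help="Number of iterations for hyperparameter optimization")
    p.add_argument("--cv", type=int, help="Number of cross-validation folds")
    p.add_argument("--model", help="Model to train")
    p.add_argument("--seed", type=int, help="Random seed")
    p.add_argument("--nbits", type=int, help="Number of bits for MFF")
    p.add_argument("--radius", type=int, help="Radius for MFF")
    # Bare switches: present means True, absent means False.
    p.add_argument("--scale", action="store_true", help="Scale data")
    p.add_argument("--morgan_fingerprint", action="store_true",
                   help="Use Morgan Fingerprint (MFF) instead of RDKit descriptors")
    return p
```

Passing "--scale" with no value enables scaling; writing "--scale True" would instead leave a stray argument that argparse rejects.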

The trained model and its predictions are saved in results/<MODEL>. For example, to train an SVR model on RDKit descriptors, run:

python src/descriptor/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --scale --model SVR --seed 42 --iterations 100 --hyperparameter "hp_descriptor.yaml" --cv 5
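The example passes --scale, but the README does not say which scaler it applies. A common choice, and the assumption in this sketch, is z-score standardization fitted on the training split only, so that test-set statistics do not leak into training. This is the same idea as scikit-learn's StandardScaler, written with the standard library for illustration.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute per-column mean and population std from the training rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has std 0; use 1.0 so we never divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply a previously fitted standardization to any split (train or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

Fit once on the training descriptors, then call transform on both the training and the test rows with the same means and stds.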

To train all four descriptor-based models (SVR, RandomForest, AdaBoostRegressor, GradientBoostingRegressor), run:

chmod a+x regenerate/descriptor.sh
./regenerate/descriptor.sh
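The contents of regenerate/descriptor.sh are not reproduced here. Assuming it simply repeats the SVR example once per model, a hypothetical Python equivalent looks like this; the exact --model strings are an assumption taken from the model list above, so check them against the script before relying on them.

```python
import shlex
import subprocess

# Assumed --model values, copied from the README's model list.
MODELS = ["SVR", "RandomForest", "AdaBoostRegressor", "GradientBoostingRegressor"]


def descriptor_commands(parent_directory, seed=42, iterations=100, cv=5):
    """Build one trainer.py invocation per model, mirroring the SVR example."""
    commands = []
    for model in MODELS:
        commands.append([
            "python", "src/descriptor/trainer.py",
            "--parent_directory", parent_directory,
            "--result_directory", "results",
            "--data_directory", "data",
            "--train_data", "train_data_cleaned.csv",
            "--test_data", "test_data_cleaned.csv",
            "--scale",
            "--model", model,
            "--seed", str(seed),
            "--iterations", str(iterations),
            "--hyperparameter", "hp_descriptor.yaml",
            "--cv", str(cv),
        ])
    return commands


def run_all(parent_directory, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in descriptor_commands(parent_directory):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first model that fails
```

run_all only prints the commands by default; pass dry_run=False to execute them one after another from the repository root.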

Graph Neural Network

python src/graph/trainer.py -h
usage: trainer.py [-h] [--parent_directory PARENT_DIRECTORY] [--result_directory RESULT_DIRECTORY] [--data_directory DATA_DIRECTORY] [--train_data TRAIN_DATA] [--test_data TEST_DATA] [--add_features]
                  [--skip_cv] [--epochs EPOCHS] [--start-epoch START_EPOCH] [--batch_size BATCH_SIZE] [--lr LR] [--gpu GPU] [--cv CV] [--dim_input DIM_INPUT] [--unit_per_layer UNIT_PER_LAYER] [--seed SEED]
                  [--num_feat NUM_FEAT] [--train]

options:
  --parent_directory PARENT_DIRECTORY
                        Path to main directory
  --result_directory RESULT_DIRECTORY
                        Path to result directory
  --data_directory DATA_DIRECTORY
                        where the data is stored in parent directory
  --train_data TRAIN_DATA
                        name of train data
  --test_data TEST_DATA
                        name of test data
  --add_features        if add features
  --skip_cv             if skip cross validation
  --epochs EPOCHS       number of total epochs to run
  --start-epoch START_EPOCH
                        manual epoch number (useful on restarts)
  --batch_size BATCH_SIZE
                        mini-batch size (default: 256)
  --lr LR               initial learning rate
  --gpu GPU             GPU ID to use.
  --cv CV               k-fold cross validation
  --dim_input DIM_INPUT
                        dimension of input
  --unit_per_layer UNIT_PER_LAYER
                        unit per layer
  --seed SEED           seed number
  --num_feat NUM_FEAT   number of additional features
  --train               if train
  --print_result        if print result
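Both trainers expose --cv, and the GNN trainer can bypass cross-validation with --skip_cv. The README does not describe how folds are drawn; this sketch assumes shuffled k-fold splitting seeded by --seed, which is also what scikit-learn's KFold(n_splits=k, shuffle=True, random_state=seed) does (its shuffling differs, so fold memberships will not match exactly).

```python
import random


def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # same seed, same folds on every run
    # The first n_samples % k folds get one extra sample each.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = order[start:start + size]
        train = order[:start] + order[start + size:]
        yield sorted(train), sorted(val)
        start += size
```

Every sample lands in exactly one validation fold, so averaging the k validation scores uses the whole training set once.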

To train the GNN model, run:

chmod a+x regenerate/graph.sh
./regenerate/graph.sh

Checkpoints and predictions are saved in results/GNN.

For example, to train a GNN model directly with the trainer script:

python src/graph/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --seed 42 --train --cv 5
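The GNN options --epochs (total epochs), --start-epoch (useful on restarts) and --batch_size (default 256) follow a familiar PyTorch training-loop convention: a resumed run continues from start_epoch up to epochs rather than running epochs more. The sketch below assumes that convention and shows only the bookkeeping, i.e. which epochs and mini-batches a run visits; the per-epoch reseeding is an illustrative choice, not necessarily what src/graph/trainer.py does.

```python
import random


def minibatches(n_samples, batch_size=256, seed=42, epoch=0):
    """Yield shuffled index batches for one epoch; the last batch may be smaller."""
    order = list(range(n_samples))
    # Reseed per epoch: each epoch gets a new order, yet a restarted run
    # reproduces exactly the order it would have used for that epoch.
    random.Random(f"{seed}-{epoch}").shuffle(order)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


def training_schedule(n_samples, epochs, start_epoch=0, batch_size=256, seed=42):
    """List the (epoch, batch) pairs visited by a run that starts at start_epoch."""
    return [
        (epoch, batch)
        for epoch in range(start_epoch, epochs)  # epochs is a total, not extra epochs
        for batch in minibatches(n_samples, batch_size, seed, epoch)
    ]
```

For example, training_schedule(n, epochs=100, start_epoch=40) covers epochs 40 to 99, the same epochs an uninterrupted run would have visited after epoch 39.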

To test an already trained model, run the same command without --train:

python src/graph/trainer.py --parent_directory YOUR_MAIN_FOLDER --result_directory results --data_directory data --train_data "train_data_cleaned.csv" --test_data "test_data_cleaned.csv" --seed 42 --cv 5
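Test-mode predictions are written under results/GNN, but the README does not document the file layout or name the metrics reported in the paper. Assuming you have already loaded the true and predicted values into two lists, this sketch computes three standard regression metrics (MAE, RMSE, R²). For non-degenerate inputs, scikit-learn's mean_absolute_error, mean_squared_error (square-rooted) and r2_score give the same values.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when every true value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}
```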

Transformer

under construction

Status

[x] Complete data cleaning.
[x] Scripts for QSPR with RDKit descriptors.
[x] Scripts for QSPR with graphs.
[x] Train ML models with RDKit descriptors and graphs.
[ ] Set up an ML model with a transformer.
[ ] Evaluate performance.
[ ] Deploy models as a GUI.

How to cite

@article{doi,
  author = {Teslim Olayiwola and Jose Romagnoli},
  title = {Comparative Analysis of Structure-Property Machine Learning models for Predicting Electrolyte Thermodynamic Windows},
  journal = {n/a},
  year = {n/a},
  volume = {n/a},
  number = {n/a},
  doi = {https://doi.org/},
  preprint = {Manuscript in Preparation}
}

License

BatteryInformatics is released under the MIT License. For use of specific models, please refer to the licenses of the original packages.
