This software package implements the Crystal Graph Convolutional Neural Networks (CGCNN), which take an arbitrary crystal structure and predict material properties.
The package provides two major functions:
Train a CGCNN model with a customized dataset.
Predict material properties of new crystals with a pre-trained CGCNN model.
The following paper describes the details of the CGCNN framework:
If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:
Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.
This creates a conda environment for running CGCNN. Before using CGCNN, activate the environment by:
source activate cgcnn
Then, in directory cgcnn, you can test if all the prerequisites are installed properly by running:
python main.py -h
python predict.py -h
This should display the help messages for main.py and predict.py. If no error messages appear, the prerequisites are installed properly.
After you have finished using CGCNN, exit the environment by:
source deactivate
Usage
Define a customized dataset
To input crystal structures to CGCNN, you will need to define a customized dataset. Note that this is required for both training and predicting.
Before defining a customized dataset, you will need:
CIF files recording the structure of the crystals that you are interested in
The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)
You can create a customized dataset by creating a directory root_dir with the following files:
id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)
atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.
ID.cif: a CIF file that records the crystal structure, where ID is the unique ID for the crystal.
There are two examples of customized datasets in the repository: data/sample-regression for regression and data/sample-classification for classification.
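As a minimal sketch of the layout described above, the two hypothetical helpers below (they are not part of the CGCNN package) write an id_prop.csv with placeholder targets for every CIF file in root_dir and report which required files are missing. The sketch assumes that id_prop.csv has no header row and that a crystal's ID is its CIF file name without the .cif extension.

```python
import csv
from pathlib import Path


def write_placeholder_id_prop(root_dir, placeholder=0.0):
    """Write id_prop.csv listing every ID.cif in root_dir with a dummy target.

    Hypothetical helper: useful when only predictions are needed, since the
    second column must exist but its value is not used.
    """
    root = Path(root_dir)
    # Assumption: the ID of a crystal is its CIF file name without ".cif".
    ids = sorted(p.stem for p in root.glob("*.cif"))
    with open(root / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for cif_id in ids:
            # Assumption: two columns (ID, target) and no header row.
            writer.writerow([cif_id, placeholder])
    return ids


def missing_dataset_files(root_dir):
    """Return the names of required files that root_dir does not contain."""
    root = Path(root_dir)
    missing = [name for name in ("id_prop.csv", "atom_init.json")
               if not (root / name).is_file()]
    id_prop = root / "id_prop.csv"
    if id_prop.is_file():
        with open(id_prop, newline="") as f:
            for row in csv.reader(f):
                # Every ID in the first column needs a matching ID.cif file.
                if row and not (root / f"{row[0]}.cif").is_file():
                    missing.append(f"{row[0]}.cif")
    return missing
```

Calling missing_dataset_files(root_dir) before training or predicting gives an empty list when the directory matches the description above.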
For advanced PyTorch users
The above method of creating a customized dataset uses the CIFData class in cgcnn.data. If you want a more flexible way to input crystal structures, PyTorch has a great tutorial for writing your own dataset class.
Train a CGCNN model
Before training a new CGCNN model, you will need to:
Define a customized dataset at root_dir to store the structure-property relations of interest.
Then, in directory cgcnn, you can train a CGCNN model for your customized dataset by:
python main.py root_dir
You can set the number of training, validation, and test data points with the flags --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, and --test-ratio instead. Note that the ratio flags cannot be used together with the size flags. For instance, data/sample-regression has 10 data points in total. You can train a model by:
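As a rough sketch of how the size and ratio flags combine, the hypothetical helper below (not part of the CGCNN package) assembles the argument list for main.py and rejects a mix of size and ratio flags. The flag names come from the text above; the 6/2/2 split used in the usage note is only an illustrative assumption for a dataset with 10 data points.

```python
def build_train_command(root_dir, sizes=None, ratios=None):
    """Return the argument list for running main.py on root_dir.

    sizes and ratios are optional (train, val, test) triples. Only one of
    them may be given, mirroring the rule that size flags and ratio flags
    cannot be used together.
    """
    if sizes is not None and ratios is not None:
        raise ValueError("size flags and ratio flags cannot be combined")
    cmd = ["python", "main.py"]
    names = ("train", "val", "test")
    if sizes is not None:
        for name, n in zip(names, sizes):
            cmd += [f"--{name}-size", str(int(n))]
    if ratios is not None:
        for name, r in zip(names, ratios):
            cmd += [f"--{name}-ratio", str(r)]
    # The dataset directory is the positional argument.
    cmd.append(root_dir)
    return cmd
```

For example, build_train_command("data/sample-regression", sizes=(6, 2, 2)) returns a list that can be passed to subprocess.run from inside the cgcnn directory, and build_train_command("data/sample-regression", ratios=(0.6, 0.2, 0.2)) expresses the same split with ratios.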
Note that for classification, each predicted value in test_results.csv is a probability between 0 and 1 that the crystal can be classified as 1 (metal in the above example).
After predicting, you will get one file in the cgcnn directory:
test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is just the number that you set in id_prop.csv while defining the dataset, so it carries no meaning.
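A small sketch of how test_results.csv might be post-processed is shown below. The function name is hypothetical, and the code assumes that the file has three columns (ID, target value, predicted value) and no header row. The threshold of 0.5 mentioned in the usage note is a conventional choice for turning a probability into a class label, not something prescribed by CGCNN.

```python
import csv


def read_test_results(path, threshold=None):
    """Read test_results.csv into a list of (ID, target, predicted) tuples.

    If threshold is given, the predicted probability is converted to a
    class label: 1 when it is at least threshold, otherwise 0.
    """
    results = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or incomplete lines
            cif_id, target, pred = row[0], float(row[1]), float(row[2])
            if threshold is not None:
                pred = 1 if pred >= threshold else 0
            results.append((cif_id, target, pred))
    return results
```

For a regression model, read_test_results("test_results.csv") returns the predicted values unchanged. For a classification model, read_test_results("test_results.csv", threshold=0.5) returns 1 or 0 for each crystal.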
Data
To reproduce the results in our paper, you can download the corresponding datasets by following the instructions.
Crystal Graph Convolutional Neural Networks
The following paper describes the details of the CGCNN framework:
Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties
How to cite
Please cite the following work if you want to use CGCNN.
Prerequisites
This package requires:
If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following command to create a new environment named cgcnn and install all prerequisites:
Note: this code is tested for PyTorch v1.0.0+ and is not compatible with versions below v0.4.0 due to some breaking changes.
This creates a conda environment for running CGCNN. Before using CGCNN, activate the environment by:
source activate cgcnn
Then, in directory cgcnn, you can test if all the prerequisites are installed properly by running:
python main.py -h
python predict.py -h
This should display the help messages for main.py and predict.py. If no error messages appear, the prerequisites are installed properly.
After you have finished using CGCNN, exit the environment by:
source deactivate
Usage
Define a customized dataset
To input crystal structures to CGCNN, you will need to define a customized dataset. Note that this is required for both training and predicting.
Before defining a customized dataset, you will need:
CIF files recording the structure of the crystals that you are interested in
The target properties for each crystal (not needed for predicting, but you need to put some random numbers in id_prop.csv)
You can create a customized dataset by creating a directory root_dir with the following files:
id_prop.csv: a CSV file with two columns. The first column records a unique ID for each crystal, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)
atom_init.json: a JSON file that stores the initialization vector for each element. An example of atom_init.json is data/sample-regression/atom_init.json, which should be good for most applications.
ID.cif: a CIF file that records the crystal structure, where ID is the unique ID for the crystal.
The structure of the root_dir should be:
There are two examples of customized datasets in the repository: data/sample-regression for regression and data/sample-classification for classification.
For advanced PyTorch users
The above method of creating a customized dataset uses the CIFData class in cgcnn.data. If you want a more flexible way to input crystal structures, PyTorch has a great tutorial for writing your own dataset class.
Train a CGCNN model
Before training a new CGCNN model, you will need to:
Define a customized dataset at root_dir to store the structure-property relations of interest.
Then, in directory cgcnn, you can train a CGCNN model for your customized dataset by:
python main.py root_dir
You can set the number of training, validation, and test data points with the flags --train-size, --val-size, and --test-size. Alternatively, you may use the flags --train-ratio, --val-ratio, and --test-ratio instead. Note that the ratio flags cannot be used together with the size flags. For instance, data/sample-regression has 10 data points in total. You can train a model by:
or alternatively
You can also train a classification model with the flag --task classification. For instance, you can use data/sample-classification by:
After training, you will get three files in the cgcnn directory.
model_best.pth.tar: stores the CGCNN model with the best validation accuracy.
checkpoint.pth.tar: stores the CGCNN model at the last epoch.
test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set.
Predict material properties with a pre-trained CGCNN model
Before predicting the material properties, you will need to:
Define a customized dataset at root_dir for all the crystal structures that you want to predict.
Obtain a pre-trained CGCNN model named pre-trained.pth.tar.
Then, in directory cgcnn, you can predict the properties of the crystals in root_dir:
For instance, you can predict the formation energies of the crystals in data/sample-regression:
And you can also predict if the crystals in data/sample-classification are metals (1) or semiconductors (0):
Note that for classification, each predicted value in test_results.csv is a probability between 0 and 1 that the crystal can be classified as 1 (metal in the above example).
After predicting, you will get one file in the cgcnn directory:
test_results.csv: stores the ID, target value, and predicted value for each crystal in the test set. Here the target value is just the number that you set in id_prop.csv while defining the dataset, so it carries no meaning.
Data
To reproduce the results in our paper, you can download the corresponding datasets by following the instructions.
Authors
This software was primarily written by Tian Xie, who was advised by Prof. Jeffrey Grossman.
License
CGCNN is released under the MIT License.