Scalable Crystal Structure Relaxation Using an Iteration-Free Deep Generative Model with Uncertainty Quantification (DeepRelax)

Dataset

Our research utilizes datasets that are publicly accessible. Details and access links for each dataset are provided below:

XMnO Dataset [1]: Available at XMnO.
MP Dataset [2]: Available at MPF.2021.2.8.
C2DB Dataset [3, 4, 5]: Available at C2DB.

For convenience, both raw and processed data from these datasets can also be downloaded from Zenodo.

Requirements

Required Python packages include:

ase==3.22.1
config==0.5.1
lmdb==1.4.1
matplotlib==3.7.2
numpy==1.24.4
pandas==2.1.3
pymatgen==2023.5.10
scikit_learn==1.3.0
scipy==1.11.4
torch==1.13.1
torch_geometric==2.2.0
torch_scatter==2.1.0
tqdm==4.66.1

Alternatively, install the environment using the provided YAML file at ./environment/environment.yaml.
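
If you install the packages manually, it can help to confirm that the installed versions match the pins above. The following is a minimal sketch using only the standard library, not part of this repository. The names PINNED, installed_version, and find_mismatches are illustrative, and it assumes that each distribution is registered under the name shown in the list, which may not hold in every environment.

```python
from importlib import metadata

# Subset of the pins listed above; extend as needed.
PINNED = {
    "ase": "3.22.1",
    "numpy": "1.24.4",
    "pymatgen": "2023.5.10",
    "torch": "1.13.1",
    "torch_geometric": "2.2.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each mismatching package to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build suffixes such as "+cu117" when comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every checked package matches its pin. The lookup argument exists only so the comparison can be exercised without touching the real environment.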

Logger

For logging, we recommend WandB (Weights & Biases); see https://wandb.ai/ for details. Training logs and trained models are stored in the ./wandb directory. The saved model can typically be found at ./wandb/run-xxx/files/model.pt, where xxx represents specific run information.
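
Because the run directory name changes with every run, a small helper can locate the most recent checkpoint. This is a sketch under the assumption that checkpoints follow the ./wandb/run-xxx/files/model.pt pattern described above. The function name latest_wandb_model is hypothetical and not part of this repository.

```python
from pathlib import Path


def latest_wandb_model(wandb_dir="./wandb", filename="model.pt"):
    """Return the most recently modified run-*/files/<filename>, or None."""
    candidates = list(Path(wandb_dir).glob(f"run-*/files/{filename}"))
    if not candidates:
        return None
    # Pick the checkpoint with the newest modification time.
    return max(candidates, key=lambda path: path.stat().st_mtime)
```

The returned path can then be passed wherever a model path is expected, for example as the --model_path argument.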

Step-by-Step Guide

Data Preprocessing

To begin working with the datasets, first download the necessary files from Zenodo and unzip them. You will find the preprocessed data in the following directories for each dataset:

For the XMnO dataset: cifs_xmno/train_DeepRelax, cifs_xmno/val_DeepRelax, cifs_xmno/test_DeepRelax
For the MP dataset: Similar directory structure as XMnO
For the C2DB dataset: Please note that the C2DB dataset is available upon request. Contact the corresponding author of the C2DB dataset to obtain the files, including relaxed.db and unrelaxed.db. After successfully requesting the C2DB dataset, process it using convert_c2db.py available in this repository.

Preprocessing Data from Scratch

If you prefer to preprocess the data from scratch, use the following commands, ensuring you replace your_data_path with the appropriate path to your data:

For the XMnO dataset:

python preprocess_xmno.py --data_root your_data_path/cifs_xmno --num_workers 1

For the MP dataset:

python preprocess_mp.py --data_root your_data_path/MPF.2021.2.8 --num_workers 1

For the C2DB dataset:

python preprocess_c2db.py --data_root your_data_path/c2db --num_workers 1

To increase the processing speed, you can adjust the --num_workers parameter to a higher value, depending on your system's capabilities.

Train the Model

To initiate training of the DeepRelax model, execute the following commands. Make sure to substitute your_data_path with the actual path to your dataset:

For the XMnO dataset:

python train.py --data_root your_data_path/cifs_xmno --num_workers 4 --batch_size 32 --steps_per_epoch 800

For the MP dataset:

python train.py --data_root your_data_path/MPF.2021.2.8 --num_workers 4 --batch_size 32 --steps_per_epoch 800

For the C2DB dataset:

python train.py --data_root your_data_path/c2db --num_workers 4 --batch_size 32 --steps_per_epoch 100
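
The three training commands differ only in the dataset directory and the --steps_per_epoch value. If you launch them from a script, the settings could be kept in one table, as in the sketch below. TRAIN_STEPS and build_train_command are illustrative names rather than part of this repository, and only the flags shown in the commands above are used.

```python
import sys

# Dataset directory name -> --steps_per_epoch, taken from the commands above.
TRAIN_STEPS = {
    "cifs_xmno": 800,
    "MPF.2021.2.8": 800,
    "c2db": 100,
}


def build_train_command(data_path, dataset, num_workers=4, batch_size=32):
    """Build the argument list for one train.py invocation."""
    return [
        sys.executable, "train.py",
        "--data_root", f"{data_path}/{dataset}",
        "--num_workers", str(num_workers),
        "--batch_size", str(batch_size),
        "--steps_per_epoch", str(TRAIN_STEPS[dataset]),
    ]
```

The resulting list can be passed to subprocess.run(command, check=True) from the repository root.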

Test the Model

To evaluate the DeepRelax model, specifically on the XMnO dataset, run the following command, replacing your_data_path and your_model_path with the appropriate paths:

python edg_solver.py --data_root your_data_path/cifs_xmno --model_path your_model_path/model.pt

This process can be similarly applied to the other datasets. If you are using WandB for tracking experiments, the saved model can typically be found at ./wandb/run-xxx/files/model.pt, where xxx represents specific run information. The final results for each sample are stored in the ./results directory. To merge the results across all samples and obtain the overall performance metrics reported in the paper, run the check_performance.py script.

Practical Application of DeepRelax through Transfer Learning

DeepRelax is optimally utilized via transfer learning. This approach allows you to leverage a pre-trained model and adapt it to your specific use case. Below, we outline a demonstration to guide you in transferring the trained model to your application.

Organizing Your Data

First, ensure your data is structured as follows to facilitate processing:

custom/
- train.csv
- val.csv
- test.csv
- CIF/
  - data_1_unrelaxed.cif
  - data_1_relaxed.cif
  - data_2_unrelaxed.cif
  - data_2_relaxed.cif
  - ...

Note: The test set does not require relaxed structures. However, for training and validation sets, pairs of unrelaxed and relaxed structures are necessary.

Each .csv file should contain a column named atoms_id, with each row corresponding to the ID of a .cif file in your dataset. For example:

atoms_id
data_1
data_2
data_3
...

When defining atoms_id, ensure it is consistent with the names of your .cif files to maintain integrity and facilitate seamless processing.
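
Before preprocessing, you may want to check that every atoms_id has the expected .cif files. The sketch below assumes the layout described above: train.csv, val.csv, and test.csv next to a CIF/ directory, with files named <atoms_id>_unrelaxed.cif and <atoms_id>_relaxed.cif. The helper names read_atoms_ids and missing_cifs are hypothetical and not part of this repository.

```python
import csv
from pathlib import Path


def read_atoms_ids(csv_path):
    """Read the atoms_id column of a split file."""
    with open(csv_path, newline="") as handle:
        return [row["atoms_id"] for row in csv.DictReader(handle)]


def missing_cifs(data_root):
    """Return sorted names of .cif files the splits require but CIF/ lacks."""
    root = Path(data_root)
    cif_dir = root / "CIF"
    missing = set()
    for split in ("train", "val", "test"):
        # Relaxed structures are optional for the test split only.
        suffixes = ["unrelaxed"] if split == "test" else ["unrelaxed", "relaxed"]
        for atoms_id in read_atoms_ids(root / f"{split}.csv"):
            for suffix in suffixes:
                name = f"{atoms_id}_{suffix}.cif"
                if not (cif_dir / name).is_file():
                    missing.add(name)
    return sorted(missing)
```

An empty list means the split files and the CIF/ directory are consistent.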

Preprocessing Your Data

To convert your .cif files into a format suitable for DeepRelax, use the following command, replacing your_data_path with the path to your custom directory:

python preprocess_custom.py --data_root your_data_path/custom --num_workers 1

This command will process your .cif files and organize the output into two subdirectories (train_DeepRelax and val_DeepRelax) within the custom directory.

Applying Transfer Learning

After preprocessing your data, apply transfer learning to your custom dataset with the following command:

python train.py --data_root your_data_path/custom --num_workers 4 --batch_size 32 --steps_per_epoch 100 --transfer True

Be sure to replace your_data_path with the path to the location of your custom directory.

Testing the Model

Refer to the Test the Model section above to evaluate the model trained with transfer learning on your custom dataset. Evaluation requires pairs of unrelaxed and relaxed structures. If your test set lacks relaxed structures, you cannot evaluate performance directly, but you can still predict relaxed structures as follows.

Predicting the Relaxed Structures

To predict relaxed structures and save them as .cif files:

python predict_relaxed_structure.py --data_root your_data_path/custom --model_path your_model_path/model.pt

This script predicts the relaxed structure for each atoms_id recorded in test.csv. You do not need to provide relaxed structures for the test data. After the script finishes, the predicted relaxed structures are located in the ./predicted_structures directory within your project's root directory.

Citation

If you find the DeepRelax model beneficial for your research, please include a citation to our paper. You can reference it as follows:
@article{Yang2024,
author = {Ziduo Yang and Yi-Ming Zhao and Xian Wang and Xiaoqing Liu and Xiuying Zhang and Yifan Li and Qiujie Lv and Calvin Yu-Chian Chen and Lei Shen},
title = {Scalable crystal structure relaxation using an iteration-free deep generative model with uncertainty quantification},
journal = {Nature Communications},
volume = {15},
number = {1},
pages = {8148},
year = {2024},
month = {September},
doi = {10.1038/s41467-024-52378-3},
url = {https://doi.org/10.1038/s41467-024-52378-3},
issn = {2041-1723}
}

Acknowledgements

Parts of the code in this project were adapted from OCP, and we gratefully acknowledge the contributions from this source. We also thank Prof. Kristian Sommer Thygesen and Peder Lyngby for generously providing the C2DB database, complete with both initial and final structures.

References

[1] Kim S, Noh J, Jin T, et al. A structure translation model for crystal compounds[J]. npj Computational Materials, 2023, 9(1): 142.
[2] Chen C, Ong S P. A universal graph deep learning interatomic potential for the periodic table[J]. Nature Computational Science, 2022, 2(11): 718-728.
[3] Haastrup S, Strange M, Pandey M, et al. The Computational 2D Materials Database: high-throughput modeling and discovery of atomically thin crystals[J]. 2D Materials, 2018, 5(4): 042002.
[4] Gjerding M N, Taghizadeh A, Rasmussen A, et al. Recent progress of the computational 2D materials database (C2DB)[J]. 2D Materials, 2021, 8(4): 044002.
[5] Lyngby P, Thygesen K S. Data-driven discovery of 2D materials by deep generative models[J]. npj Computational Materials, 2022, 8(1): 232.
