BibTex
@inproceedings{
zhou2024decompopt,
title={DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization},
author={Xiangxin Zhou and Xiwei Cheng and Yuwei Yang and Yu Bao and Liang Wang and Quanquan Gu},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=Y3BbxvAQS9}
}
DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization
This repository is the official implementation of DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization.
Dependencies
Install via Conda and Pip
Preprocess
We decomposed the molecules in the CrossDocked2020 training set into arms and stored the processed data in arm_info_2.pt, which can be downloaded here. We then docked each arm against its target protein with Vina Minimize and used the resulting docked arm conformations as conditions for training.
We follow the preprocessing procedure of DecompDiff. The processed dataset is available here.
Training
To train the model from scratch, you need to download the *.lmdb, *_name2id.pt and split_by_name.pt files and put them in the ./data directory. Then, you can run the following command:
Sampling and Evaluation
To sample molecules for the protein pockets in the test set, you need to download the test_index.pkl and *_eval.tar.gz files, unzip them, and put them in the ./data directory. To sample molecules with beta priors, you also need to download beta_priors.zip and natom_models.pkl and put them in the ./pregen_info directory. Then, you can run the following command:
By default, this script samples with the opt prior. We have provided the trained model checkpoints here. You need to download both
decompdiff.pt and decompopt.pt. After sampling, the generated molecules are evaluated with Vina Dock and the best results are selected:
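The exact selection criterion is not spelled out above. As a hypothetical illustration of the selection step (not the repository's script), here is a minimal sketch. It assumes every sampled molecule has already been scored with Vina Dock and that lower (more negative) affinities are better, which is the AutoDock Vina convention. It groups the results by target pocket and keeps the best top_k per pocket. DockedSample and select_best are illustrative names and are not part of the DecompOpt code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class DockedSample:
    # Hypothetical record for one sampled molecule after Vina Dock scoring.
    target: str       # identifier of the protein pocket
    smiles: str       # the sampled molecule
    vina_dock: float  # Vina Dock affinity in kcal/mol (lower is better)


def select_best(samples: Iterable[DockedSample], top_k: int = 1) -> Dict[str, List[DockedSample]]:
    """Return the top_k samples with the lowest Vina Dock score for each target."""
    by_target: Dict[str, List[DockedSample]] = {}
    for sample in samples:
        # Bucket samples by the pocket they were generated for.
        by_target.setdefault(sample.target, []).append(sample)
    # Ascending sort puts the most negative (strongest predicted binding) first.
    return {
        target: sorted(group, key=lambda s: s.vina_dock)[:top_k]
        for target, group in by_target.items()
    }


if __name__ == "__main__":
    demo = [
        DockedSample("pocket_A", "CCO", -5.1),
        DockedSample("pocket_A", "c1ccccc1O", -6.3),
        DockedSample("pocket_B", "CC(=O)N", -4.2),
    ]
    for target, best in select_best(demo).items():
        print(target, best[0].smiles, best[0].vina_dock)
```

Keeping the grouping separate from the ranking makes it easy to swap in a different criterion (for example, a combined score over several properties) by changing only the sort key.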