Download the dataset and oracles from Dropbox and unzip them into the main directory of the repo:
wget "https://www.dropbox.com/scl/fi/v6rsdcnah88vfp4wkmn9z/data.zip?rlkey=atphgvxj5acwzmya9vp0uemdw&dl=0" -O data.zip
wget "https://www.dropbox.com/scl/fi/4btqejohjbj6h75hlvn1w/oracles.zip?rlkey=7k4p3rf8ynv2a8xwa4q70irrj&dl=0" -O oracles.zip
unzip data.zip
unzip oracles.zip
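The same download step can also be done from Python, which sidesteps shell quoting of the ? and & characters in the share links. This is a minimal standard-library sketch; the helper name download_and_unzip is hypothetical and is not part of the repo:

```python
import urllib.request
import zipfile


def download_and_unzip(url: str, archive: str, dest: str = ".") -> list[str]:
    """Download a zip archive to `archive`, extract it into `dest`,
    and return the names of the extracted members.

    Hypothetical helper, not part of the MosPro repo.
    """
    # urlretrieve writes the response body to `archive`
    # (works for http(s):// and file:// URLs)
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        # extractall creates `dest` and any nested directories as needed
        zf.extractall(dest)
        return zf.namelist()
```

Call it once per archive with the two links above, for example download_and_unzip(<data link>, "data.zip").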
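Train a property predictor, for example:
python scripts/train_predictor.py configs/gfp_ddg/train_predictor_GFP4ddg.yml --logdir logs_predictors
One predictor is required for each property of interest on each dataset, so repeat this step with the matching predictor config for every property.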
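Because one predictor is needed per property, it can help to script this step. The sketch below builds one training command per predictor config in a config subdirectory. It assumes (hypothetically) that every predictor config follows the train_predictor_*.yml naming of the example above, and the helper name build_train_commands is illustrative only. It returns the commands instead of running them, so they can be inspected first:

```python
import sys
from pathlib import Path


def build_train_commands(config_dir: str, logdir: str = "logs_predictors") -> list[list[str]]:
    """Return one scripts/train_predictor.py command per predictor config.

    Assumption: predictor configs are named train_predictor_*.yml,
    as in the GFP example; adjust the glob if your configs differ.
    """
    configs = sorted(Path(config_dir).glob("train_predictor_*.yml"))
    return [
        # sys.executable reuses the current interpreter / environment
        [sys.executable, "scripts/train_predictor.py", str(cfg), "--logdir", logdir]
        for cfg in configs
    ]
```

To run them, call subprocess.run(cmd, check=True) for each cmd in build_train_commands("configs/gfp_ddg"); check=True stops at the first failed training run.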
Then run MosPro sampling, for example on the GFP-stability benchmark:
python scripts/run_MosPro.py configs/gfp_ddg/MosPro_GFP_stability.yml --logdir logs_MosPro
Config files are in the configs directory, organized into subdirectories by property objective. Before sampling, change the predictor path in the sampling config file to point to your own trained predictor.
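The predictor path can be edited by hand, or with a small helper like the one below. This is a sketch: set_yaml_value is a hypothetical helper, and the key you pass must be whatever field your sampling config actually uses for the predictor path (this README does not name it, so check the YAML file). It edits the file as text rather than round-tripping through a YAML library, which keeps comments and key order intact at the cost of handling only simple key: value lines:

```python
import re
from pathlib import Path


def set_yaml_value(config_path: str, key: str, value: str) -> int:
    """Set the value of every `key: ...` line in a YAML file, keeping indentation.

    `key` is an assumption you supply (the real predictor-path field name
    in your config). Only simple scalar lines are handled; a trailing
    comment on a matched line is replaced too. Returns the number of lines
    changed, so 0 signals a wrong key.
    """
    path = Path(config_path)
    # [ \t]* (not \s*) so the match never spills onto the next line
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    # a function replacement avoids backslash escapes in `value`
    text, count = pattern.subn(lambda m: m.group(1) + value, path.read_text())
    path.write_text(text)
    return count
```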
Each experiment creates its own log directory under the directory passed to --logdir; all checkpoints and sample results are stored there.
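To locate results after a run, the sketch below picks the most recently modified experiment directory under a log directory and lists the CSV sample files inside it. It assumes one subdirectory per experiment and CSV sample outputs (the evaluation step takes a sample CSV); the exact layout inside each run may differ, and the helper name latest_run_samples is hypothetical:

```python
from pathlib import Path


def latest_run_samples(logdir: str, pattern: str = "*.csv") -> tuple[Path, list[Path]]:
    """Return the newest run directory under `logdir` (by modification time)
    and the files inside it matching `pattern`, searched recursively.

    Assumption: each experiment gets its own subdirectory of `logdir`.
    """
    runs = [p for p in Path(logdir).iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directories under {logdir}")
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    return latest, sorted(latest.rglob(pattern))
```

For example, latest_run_samples("logs_MosPro") returns a run directory and its CSVs; one of those paths can then be passed to --sample_path in the evaluation step.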
To evaluate MosPro samples on a given benchmark dataset, use the corresponding evaluation script. For example, to evaluate samples on the GFP-stability dataset, run:
python scripts/evaluate_GFP_stability.py configs/gfp_ddg/evaluate.yml --sample_path path/to/sample_csv
An evaluation result file and a metrics file, computed on 500 randomly selected samples, are written to the same directory as path/to/sample_csv.
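The evaluation script draws its own random subset. The sketch below only illustrates the idea: uniformly sampling 500 rows without replacement from a sample CSV with a fixed seed, for example to inspect a reproducible subset yourself. The helper name sample_rows, the seed, and the assumption that the CSV has a header row are hypothetical and may not match what the script does:

```python
import csv
import random


def sample_rows(csv_path: str, n: int = 500, seed: int = 0) -> list[dict]:
    """Return n rows of a CSV chosen uniformly at random without replacement,
    or all rows if the file has n or fewer.

    Assumes a header row; seed is an arbitrary illustrative choice.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if len(rows) <= n:
        return rows
    # a dedicated Random instance keeps the choice reproducible
    # without touching the global random state
    return random.Random(seed).sample(rows, n)
```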
Note that evaluating protein stability requires FoldX: download it and copy the FoldX executable into the main directory of this repo.
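Before running the stability evaluation, a quick check that a FoldX executable is present and has its execute bit set can save a failed run. This is a sketch: the file name prefix "foldx" is an assumption (FoldX downloads may use a versioned file name, and the evaluation script may expect a specific one), so adjust name_prefix to match; find_foldx is a hypothetical helper:

```python
import os
from pathlib import Path


def find_foldx(repo_root: str = ".", name_prefix: str = "foldx") -> Path:
    """Return the first executable file in `repo_root` whose name starts with
    `name_prefix` (case-insensitive); raise FileNotFoundError otherwise.

    The prefix is an assumption about the FoldX file name.
    """
    for p in sorted(Path(repo_root).iterdir()):
        if (p.is_file()
                and p.name.lower().startswith(name_prefix.lower())
                and os.access(p, os.X_OK)):
            return p
    raise FileNotFoundError(
        f"no executable starting with {name_prefix!r} in {repo_root}; "
        "copy the FoldX executable here and make it executable (chmod +x)")
```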