Virtual Characterization via Knowledge-Enhanced Representation Learning: from Organic Conjugated Molecules to Devices (Accepted in npj Computational Materials.)
[Paper] Guojiang Zhao, Qi Ou, Zifeng Zhao, Shangqian Chen, Haitao Lin, Xiaohong Ji, Zhen Wang, Hongshuai Wang, Hengxing Cai, Lirong Wu, Shuqi Lu, FengTianCi Yang, Zhifeng Gao, Zheng Cheng. DOI: 10.26434/chemrxiv-2025-b6n4m
Characterizing material properties is crucial for revealing structure-property relationships and optimizing device performance. Organic optoelectronic and charge-transporting materials are widely used, but their experimental characterization is challenging owing not only to its high cost but also to the multidisciplinary knowledge it requires. To address this problem, we introduce OCNet, a domain knowledge-enhanced representation learning framework that makes efficient and accurate virtual characterization possible. Built on the SE(3) transformer architecture and a self-constructed large-scale conjugated molecular database with millions of structures and properties, OCNet provides general molecular and bimolecular representations and supports the integration of domain knowledge features. In multiple optoelectronic property prediction tasks, OCNet shows a significant improvement in accuracy over previously reported models. A DFT-accuracy database of transfer integrals for thin-film materials is also constructed, which makes general prediction of these properties possible. With its user-friendly interface, OCNet can serve as an effective virtual characterization tool, facilitating the development of optoelectronic devices and other functional materials research.
- Uni-Core, check its Installation Documentation.
- rdkit==2024.3.1, install via pip install rdkit-pypi==2022.9.3.
- xtb==6.7.1, install via conda install xtb==6.7.1.
- Multiwfn, check its Software Manual.
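Before running any of the scripts, it can help to confirm that the command-line dependencies are actually on PATH. The following is a minimal sketch, not part of OCNet: check_tools is a hypothetical helper, and the executable names in the example call are assumptions about a typical install.

```shell
# Hypothetical helper (not part of OCNet): report which executables are on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call; the executable names are assumptions about your environment.
# check_tools python xtb
```

Python packages such as rdkit are not executables, so they would need a separate import check in the Python environment used for OCNet.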
Download the processed dataset files train.tar.gz and valid.lmdb from Pre-training molecular database and models of OCNet. Then extract train.tar.gz and copy train.lmdb and valid.lmdb to the ./molecular_properties/data/pretrain directory.
Download the processed dataset data.tar.gz from Pre-training bimolecular database and models of OCNet. Then extract it and copy data/train.lmdb and data/valid.lmdb to the ./biomolecular_properties/data/pretrain directory.
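The extract-and-copy steps above can be sketched as a small shell function. This is an illustration under stated assumptions: stage_from_archive is a hypothetical helper, not a script shipped with OCNet, and because the internal layout of the archives is not fully specified, it searches the extracted tree by name instead of assuming a fixed path.

```shell
# Hypothetical helper (not part of OCNet): extract ARCHIVE into a scratch
# directory, then copy each NAME found inside it (file or directory) into DEST.
# Usage: stage_from_archive ARCHIVE DEST NAME...
stage_from_archive() {
  local archive="$1" dest="$2"
  shift 2
  local tmp name found
  tmp="$(mktemp -d)" || return 1
  if ! tar -xzf "$archive" -C "$tmp"; then
    rm -rf "$tmp"
    return 1
  fi
  mkdir -p "$dest"
  for name in "$@"; do
    # Assumption: the entry name is unique inside the archive; take the first match.
    found="$(find "$tmp" -name "$name" | head -n 1)"
    if [ -z "$found" ]; then
      echo "not found in $archive: $name" >&2
      rm -rf "$tmp"
      return 1
    fi
    cp -r "$found" "$dest/"
  done
  rm -rf "$tmp"
}

# Example calls, using the file and directory names given above:
# stage_from_archive train.tar.gz ./molecular_properties/data/pretrain train.lmdb
# cp -r valid.lmdb ./molecular_properties/data/pretrain/
# stage_from_archive data.tar.gz ./biomolecular_properties/data/pretrain train.lmdb valid.lmdb
```

Extracting into a temporary directory keeps the working tree clean if the archive turns out to contain more than the expected entries.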
Download the weights file checkpoint_best.pt from Pre-training molecular database and models of OCNet. Then copy checkpoint_best.pt to molecular_properties/weight/pretrain.
Download the weights file checkpoint_best.pt from Pre-training bimolecular database and models of OCNet. Then copy checkpoint_best.pt to bimolecular_properties/weight/pretrain.
The pre-training scripts and strategies are still being updated.
1. Download the processed dataset from Downstream molecular models and properties of OCNet. Then extract gas_phase_data.tar.gz and copy gas_phase to ./molecular_properties/data.
HOMO-LUMO gap: cd ./molecular_properties/code/gas_phase_and_solution/gap_scripts && bash train.sh
S0-S1 energy: cd ./molecular_properties/code/gas_phase_and_solution/s0s1_scripts && bash train.sh
Electron reorganization energy: cd ./molecular_properties/code/gas_phase_and_solution/er_scripts && bash train.sh
Hole reorganization energy: cd ./molecular_properties/code/gas_phase_and_solution/hr_scripts && bash train.sh
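Since every task follows the same pattern (change into a script directory, then run train.sh), the commands above can be chained with a small loop. The sketch below is only an illustration: run_stage is a hypothetical helper, not part of OCNet, and it assumes each listed directory contains the named script.

```shell
# Hypothetical helper (not part of OCNet): run SCRIPT inside each DIR in order,
# stopping at the first missing script or failed run.
# Usage: run_stage SCRIPT DIR...
run_stage() {
  local script="$1" dir
  shift
  for dir in "$@"; do
    if [ ! -f "$dir/$script" ]; then
      echo "missing: $dir/$script" >&2
      return 1
    fi
    echo "==> $dir/$script"
    # A subshell keeps the caller's working directory unchanged.
    (cd "$dir" && bash "$script") || return 1
  done
}

# Example call for the four gas-phase tasks listed above:
# run_stage train.sh \
#   ./molecular_properties/code/gas_phase_and_solution/gap_scripts \
#   ./molecular_properties/code/gas_phase_and_solution/s0s1_scripts \
#   ./molecular_properties/code/gas_phase_and_solution/er_scripts \
#   ./molecular_properties/code/gas_phase_and_solution/hr_scripts
```

Running the tasks sequentially is a deliberate simplification; whether several fine-tuning jobs can share a GPU depends on your hardware and on the settings inside each train.sh.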
1. Download the processed dataset from Downstream molecular models and properties of OCNet. Then extract properties_in_solution_data.tar.gz and copy properties_in_solution to ./molecular_properties/data.
Emission wavelength: cd ./molecular_properties/code/gas_phase_and_solution/emi_scripts && bash train.sh
Absorption wavelength: cd ./molecular_properties/code/gas_phase_and_solution/abs_scripts && bash train.sh
Full width at half maximum: cd ./molecular_properties/code/gas_phase_and_solution/fwhm_scripts && bash train.sh
Photoluminescence quantum yield: cd ./molecular_properties/code/gas_phase_and_solution/plqy_scripts && bash train.sh
1. Download the processed dataset from Downstream bimolecular models and properties of OCNet. Then extract crystal_hh_data.tar.gz, crystal_ll_data.tar.gz, film_hh_data.tar.gz, and film_ll_data.tar.gz. Finally, copy crystal_hh, crystal_ll, film_hh, and film_ll to ./bimolecular_properties/data.
2. To fine-tune on transfer integrals in crystals or thin films, run the corresponding command:
Hole transfer integrals in crystal: cd ./biomolecular_properties/code/crystal_hh_scripts && bash train.sh
Electron transfer integrals in crystal: cd ./biomolecular_properties/code/crystal_ll_scripts && bash train.sh
Hole transfer integrals in film: cd ./biomolecular_properties/code/film_hh_scripts && bash train.sh
Electron transfer integrals in film: cd ./biomolecular_properties/code/film_ll_scripts && bash train.sh
1. Download the fine-tuned weights from Downstream molecular models and properties of OCNet. Then extract gas_phase_weight.tar.gz and copy gas_phase to ./molecular_properties/weight.
HOMO-LUMO gap: cd ./molecular_properties/code/gas_phase_and_solution/gap_scripts && bash infer.sh
S0-S1 energy: cd ./molecular_properties/code/gas_phase_and_solution/s0s1_scripts && bash infer.sh
Electron reorganization energy: cd ./molecular_properties/code/gas_phase_and_solution/er_scripts && bash infer.sh
Hole reorganization energy: cd ./molecular_properties/code/gas_phase_and_solution/hr_scripts && bash infer.sh
1. Download the fine-tuned weights from Downstream molecular models and properties of OCNet. Then extract properties_in_solutioin_weight.tar.gz and copy properties_in_solution to ./molecular_properties/weight.
Emission wavelength: cd ./molecular_properties/code/gas_phase_and_solution/emi_scripts && bash infer.sh
Absorption wavelength: cd ./molecular_properties/code/gas_phase_and_solution/abs_scripts && bash infer.sh
Full width at half maximum: cd ./molecular_properties/code/gas_phase_and_solution/fwhm_scripts && bash infer.sh
Photoluminescence quantum yield: cd ./molecular_properties/code/gas_phase_and_solution/plqy_scripts && bash infer.sh
1. Download the fine-tuned weights from Downstream bimolecular models and properties of OCNet. Then extract crystal_hh_weight.tar.gz, crystal_ll_weight.tar.gz, film_hh_weight.tar.gz, and film_ll_weight.tar.gz. Finally, copy crystal_hh, crystal_ll, film_hh, and film_ll to ./bimolecular_properties/weight.
Hole transfer integrals in crystal: cd ./biomolecular_properties/code/crystal_hh_scripts && bash infer.sh
Electron transfer integrals in crystal: cd ./biomolecular_properties/code/crystal_ll_scripts && bash infer.sh
Hole transfer integrals in film: cd ./biomolecular_properties/code/film_hh_scripts && bash infer.sh
Electron transfer integrals in film: cd ./biomolecular_properties/code/film_ll_scripts && bash infer.sh
1. Download the processed dataset from Thin film structures and transfer integrations. Then extract film_elec_mobility.zip and copy film_elec_mobility to ./biomolecular_properties/data.
cd ./biomolecular_properties/code/film_ll_scripts_elec && python lmdb_convert.py mol_105511_mob && bash infer.sh
Mobility calculated with OCNet: cd ./biomolecular_properties/code/film_ll_scripts_elec && python mobility_film.py mol_105901_mob OCNet
Mobility calculated with the QM method: cd ./biomolecular_properties/code/film_ll_scripts_elec && python mobility_film.py mol_105901_mob QM
Mobility calculated with the xTB method: cd ./biomolecular_properties/code/film_ll_scripts_elec && python mobility_film.py mol_105901_mob xTB
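To compare the three sources of transfer integrals for the same sample, the mobility commands can be run back to back. The following is a minimal sketch: compare_mobility is a hypothetical wrapper, not part of OCNet, and it relies only on the mobility_film.py invocation shown above (sample name followed by OCNet, QM, or xTB). What each run prints or writes is not specified here.

```shell
# Hypothetical wrapper (not part of OCNet): run mobility_film.py once for each
# transfer-integral source, stopping at the first failure.
# Usage: compare_mobility SCRIPT_DIR SAMPLE
compare_mobility() {
  local script_dir="$1" sample="$2" method
  (
    cd "$script_dir" || exit 1
    for method in OCNet QM xTB; do
      echo "==> $sample ($method)"
      python mobility_film.py "$sample" "$method" || exit 1
    done
  )
}

# Example call, using the directory and sample name from the commands above:
# compare_mobility ./biomolecular_properties/code/film_ll_scripts_elec mol_105901_mob
```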
cd ./molecular_properties/code/gas_phase_and_solution/pce_scripts && bash infer.sh
This project is licensed under the terms of the MIT license.
