Quickly predict ΔG (or K) with the ML model.
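ΔG and K are related through the standard thermodynamic identity ΔG = −RT ln K, which is why a temperature input is requested alongside the prediction. The snippet below is a minimal sketch of that conversion; it assumes ΔG is in kJ·mol⁻¹ and that K is expressed relative to the usual 1 M standard state. The function names and the 298.15 K default are illustrative and are not taken from the OCDB site.

```python
import math

# Gas constant in kJ·mol⁻¹·K⁻¹ (so that ΔG can be given in kJ·mol⁻¹).
R_KJ = 8.314462618e-3


def delta_g_to_k(delta_g_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Return the binding constant K from ΔG, using ΔG = -RT ln K."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (in kelvin)")
    return math.exp(-delta_g_kj_mol / (R_KJ * temperature_k))


def k_to_delta_g(k: float, temperature_k: float = 298.15) -> float:
    """Return ΔG in kJ·mol⁻¹ from the binding constant K."""
    if k <= 0 or temperature_k <= 0:
        raise ValueError("K and temperature must both be positive")
    return -R_KJ * temperature_k * math.log(k)


if __name__ == "__main__":
    # Hypothetical value, chosen only to illustrate the round trip.
    dg = -15.0
    k = delta_g_to_k(dg)
    print(f"K = {k:.1f}, back-converted ΔG = {k_to_delta_g(k):.2f} kJ/mol")
```

A more negative ΔG corresponds to a larger K, so the two outputs carry the same information at a fixed temperature.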

T(K)
pH

Chemical Name: furan | formaldehyde
SMILES (Standard): c1ccoc1 | CC(C)=O
InChI (Standard): InChI=1S/C4H4O/c1-2-4-5-3-1/h1-4H
InChIKey: InChIKey=YLQBMQCUIZJEEH-UHFFFAOYSA-N
  1. Draw the structure of the desired molecule.

About the Machine Learning (ML) Model

We used the complete OCDB dataset to train a machine learning model that predicts the complexation free energy between guest organic molecules and cyclodextrins (CDs). For evaluation, the dataset was partitioned into three subsets: 80% for training, 10% for validation, and the remaining 10% for testing. The scatter plot below shows the predicted vs. observed complexation free energy for the testing set, using the LightGBM model. To assess predictive performance, we used three statistical metrics that quantify the agreement between observed and predicted values: root mean square error (RMSE), mean absolute error (MAE), and the squared correlation coefficient (R²):

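  • RMSE = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2}$
  • MAE = $\frac{1}{n}\sum_{i=1}^{n}\left|y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right|$
  • R² = $1 - \dfrac{\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)^2}$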
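The three metrics above can be computed directly from paired lists of observed and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample values are hypothetical and serve only to illustrate the formulas, not to reproduce the OCDB results.

```python
import math


def regression_metrics(y_obs, y_pred):
    """Return (RMSE, MAE, R²) for paired observed and predicted values."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_obs)
    residuals = [obs - pred for obs, pred in zip(y_obs, y_pred)]

    # Residual sum of squares: Σ (y_obs - y_pred)²
    ss_res = sum(r * r for r in residuals)

    # Total sum of squares around the observed mean: Σ (y_obs - mean(y_obs))²
    mean_obs = sum(y_obs) / n
    ss_tot = sum((obs - mean_obs) ** 2 for obs in y_obs)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    # Hypothetical free energies in kJ/mol, for illustration only.
    observed = [-10.0, -12.0, -14.0, -16.0]
    predicted = [-11.0, -12.0, -13.0, -16.0]
    rmse, mae, r2 = regression_metrics(observed, predicted)
    print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

RMSE and MAE carry the units of the target (here kJ·mol⁻¹), while R² is dimensionless. RMSE weights large errors more heavily than MAE, so RMSE is never smaller than MAE on the same data.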


The RMSE and MAE on the testing set were 2.29 and 1.64 kJ·mol⁻¹, respectively, with a squared correlation coefficient (R²) of 0.77. These results indicate that the LightGBM model performed well on the held-out portion of the OCDB dataset.


Predicted vs. observed complexation free energy for the testing set, using the LightGBM model.
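The 80/10/10 partition described above can be sketched as follows. This is an illustration under assumptions: the text does not state whether the split was random, stratified, or seeded, so the random shuffle, the `seed` parameter, and the function name `split_80_10_10` are all hypothetical. Model training itself is not shown.

```python
import random


def split_80_10_10(records, seed=0):
    """Shuffle records and split them into train/validation/test (80/10/10)."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (assumption)

    n = len(shuffled)
    n_train = n * 8 // 10                  # 80% for training
    n_val = n // 10                        # 10% for validation

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder (about 10%) for testing
    return train, val, test


if __name__ == "__main__":
    # Stand-in record IDs; a real run would use the dataset's entries.
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))
```

The validation subset is typically used for model selection and hyperparameter tuning, while the testing subset is kept aside until the final evaluation so that the reported metrics reflect unseen data.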