We used the complete OCDB dataset to develop a machine learning model that predicts the complexation free energy between guest organic molecules and cyclodextrins (CDs). To evaluate the model, we partitioned the dataset into three subsets: 80% for training, 10% for validation, and the remaining 10% for testing.
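The 80/10/10 partition can be sketched as follows. This is a minimal sketch that assumes a simple random shuffle with a fixed seed. The text does not say whether the split was random, stratified, or structure-based, and `split_80_10_10` is a hypothetical helper rather than part of the original workflow.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Randomly partition records into train/validation/test subsets (80/10/10).

    Assumption: a plain random split; the original study may have used another scheme.
    """
    items = list(records)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)

    n = len(items)
    n_train = (8 * n) // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (~10%) becomes the test set
    return train, val, test
```

For example, `split_80_10_10(list(range(100)))` returns subsets of 80, 10, and 10 items that together contain every record exactly once.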
The scatter plot below shows the predicted versus observed complexation free energies for the test set, obtained with the LightGBM model. To assess the model's predictive performance, we used three statistical metrics that quantify the agreement between observed and predicted values: the root mean square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²):
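- $\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2}$
- $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right|$
- $R^2 = 1 - \dfrac{\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - y_i^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{n}\left(y_i^{\mathrm{obs}} - \bar{y}^{\mathrm{obs}}\right)^2}$

where $n$ is the number of data points, $y_i^{\mathrm{obs}}$ and $y_i^{\mathrm{pred}}$ are the observed and predicted values for the $i$-th data point, and $\bar{y}^{\mathrm{obs}}$ is the mean of the observed values.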
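The three metrics can be computed directly from paired observed and predicted values. The plain-Python sketch below mirrors the formulas term by term. The function names are illustrative only; in practice, equivalent routines such as scikit-learn's `mean_squared_error`, `mean_absolute_error`, and `r2_score` could be used instead.

```python
import math
from typing import Sequence


def _check(y_obs: Sequence[float], y_pred: Sequence[float]) -> int:
    """Ensure the inputs are non-empty and paired; return n."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("y_obs and y_pred must be non-empty and of equal length")
    return len(y_obs)


def rmse(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean squared residual."""
    n = _check(y_obs, y_pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def mae(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute residuals."""
    n = _check(y_obs, y_pred)
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n


def r2(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _check(y_obs, y_pred)
    mean_obs = sum(y_obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot
```

With the made-up values `y_obs = [-3.0, -4.0, -5.0, -6.0]` and `y_pred = [-3.5, -4.0, -4.5, -6.0]`, these give RMSE ≈ 0.354, MAE = 0.25, and R² = 0.9. Note that R² defined this way can be negative when the predictions fit worse than simply predicting the observed mean.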