OCdb - Open Cyclo. data base

Chemical Name	SMILES	(Standard) InChI	(Standard) InChIKey
furan | formaldehyde	c1ccoc1 | CC(C)=O	InChI=1S/C4H4O/c1-2-4-5-3-1/h1-4H	InChIKey=YLQBMQCUIZJEEH-UHFFFAOYSA-N

We used the complete OCDB dataset to develop a machine learning model that predicts the complexation free energy between guest organic molecules and cyclodextrins (CDs). For evaluation, the dataset was partitioned into three subsets: 80% for training, 10% for validation, and the remaining 10% for testing. The scatter plot below shows the predicted vs. observed complexation free energy for the testing set, using the LightGBM model. To assess the model's predictive performance, we employed several statistical metrics, including root mean square error (RMSE), mean absolute error (MAE), and the squared correlation coefficient (R²), which quantify the agreement between the n observed and predicted data points:

RMSE = √[Σ(y_i^obs - y_i^pred)² / n]
MAE = Σ|y_i^obs - y_i^pred| / n
R² = 1 - (Σ(y_i^obs - y_i^pred)²) / (Σ(y_i^obs - y_i^obs,mean)²)
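The three metrics above can be computed directly from paired observed and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the toy numbers are illustrative assumptions, not OCDB data or the authors' code.

```python
import math


def regression_metrics(y_obs, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("y_obs and y_pred must be non-empty and of equal length")
    n = len(y_obs)
    residuals = [o - p for o, p in zip(y_obs, y_pred)]
    # Residual sum of squares: sum of (y_obs - y_pred)^2
    ss_res = sum(r * r for r in residuals)
    # Total sum of squares around the mean of the observed values
    mean_obs = sum(y_obs) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy values for illustration only (not taken from OCDB).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
rmse, mae, r2 = regression_metrics(y_obs, y_pred)
print(rmse, mae, r2)
```

Note that R² as defined here compares the model against a baseline that always predicts the mean observed value, so it can become negative for a model that does worse than that baseline.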

The RMSE and MAE on the testing set were 2.29 and 1.64 kJ·mol⁻¹, respectively, with a squared correlation coefficient (R²) of 0.77. These results indicate good agreement between predicted and observed values on the held-out OCDB testing set for the LightGBM model.
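The 80/10/10 partition used for evaluation can be sketched as follows. The source does not say how the split was drawn, so the random shuffle, the fixed seed, and the helper name `split_indices` are all assumptions made for illustration.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test lists.

    Assumption: a simple random split with a fixed seed; the actual
    procedure used for OCDB is not described in the text.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


# Hypothetical dataset size, chosen only to make the proportions easy to read.
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

In such a setup the validation subset is typically used for tuning and early stopping, while the testing subset is touched only once, for the final metrics.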

Quickly predict ΔG (or K) with the ML model.
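ΔG and K are interchangeable through the standard thermodynamic relation ΔG = -RT ln K. The sketch below converts between them, assuming K is an association (binding) constant expressed relative to a 1 M standard state and a temperature of 298.15 K; neither assumption is stated in the source, and the function names are hypothetical.

```python
import math

# Gas constant in kJ·mol^-1·K^-1
R_KJ = 8.314462618e-3


def delta_g_from_k(k, temperature=298.15):
    """Free energy (kJ/mol) from a binding constant K (assumed dimensionless, 1 M standard state)."""
    if k <= 0:
        raise ValueError("K must be positive")
    return -R_KJ * temperature * math.log(k)


def k_from_delta_g(delta_g, temperature=298.15):
    """Binding constant K from a free energy in kJ/mol (inverse of delta_g_from_k)."""
    return math.exp(-delta_g / (R_KJ * temperature))


# Illustrative value only: a binding constant of 1000.
dg = delta_g_from_k(1000.0)
print(dg, k_from_delta_g(dg))
```

Under this convention a more negative ΔG corresponds to a larger K, that is, to stronger complexation.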

About the Machine Learning (ML) Model

Predicted vs. observed complexation free energy by the LightGBM model.