
Table 2 Some equations and parameters used for the model validation

From: Theoretical activity prediction, structure-based design, molecular docking and pharmacokinetic studies of some maleimides against Leishmania donovani for the treatment of leishmaniasis

| Parameter | Equation | Eq. | Significance | Threshold value |
| --- | --- | --- | --- | --- |
| Internal validation | | | | |
| Friedman lack-of-fit (LOF) | \({\text{LOF}} = \frac{{\text{SEE}}}{{\left( {1 - \frac{c + d \times p}{M}} \right)^{2} }}\), where \({\text{SEE}} = \sqrt {\frac{{\sum \left( {Y_{\exp } - Y_{{\text{pred}}} } \right)^{2} }}{n - p - 1}}\) | 4 | Fitness score used to select the best model; penalizes additional terms to limit overfitting | |
| Squared correlation coefficient (\(R^{2}\)) | \(R^{2} = 1 - \left[ {\frac{{\sum \left( {Y_{\exp } - Y_{{\text{pred}}} } \right)^{2} }}{{\sum \left( {Y_{\exp } - \overline{Y}_{{\text{training}}} } \right)^{2} }}} \right]\) | 5 | Measures the goodness of fit of the regression equation | ≥ 0.6 |
| Adjusted \(R^{2}\) | \(R_{{\text{adj}}}^{2} = \frac{{\left( {n - 1} \right)R^{2} - p}}{n - p - 1}\) | 6 | Assesses the model's stability and reliability | ≥ 0.5 |
| Cross-validation regression coefficient (\(Q_{{\text{cv}}}^{2}\)) | \(Q_{{\text{cv}}}^{2} = 1 - \left[ {\frac{{\sum \left( {Y_{{\text{pred}}} - Y_{\exp } } \right)^{2} }}{{\sum \left( {Y_{\exp } - \overline{Y}_{{\text{training}}} } \right)^{2} }}} \right]\) | 7 | Measures the internal predictive power of the model | ≥ 0.5 |
| Coefficient of determination of Y-randomization (\(cR_{{\text{p}}}^{2}\)) | \(cR_{{\text{p}}}^{2} = R \times \sqrt {R^{2} - \left( {R_{r} } \right)^{2} }\) | 8 | Confirms that the QSAR model is robust and not the result of chance correlation | > 0.50 |
| External validation | | | | |
| Predicted \(R^{2}\) (\(R_{{\text{test}}}^{2}\)) | \(R_{{\text{test}}}^{2} = 1 - \frac{{\sum \left( {Y{\text{pred}}_{{\text{test}}} - Y\exp_{{\text{test}}} } \right)^{2} }}{{\sum \left( {Y\exp_{{\text{test}}} - \overline{Y}_{{\text{training}}} } \right)^{2} }}\) | 9 | Measures the ability of the model to predict the activity values of an external set of compounds | ≥ 0.6 |
| Golbraikh and Tropsha acceptable model criteria | \(\left\lvert {r_{o}^{2} - r_{o}^{\prime 2} } \right\rvert\) | | Assess the robustness and stability of the model | < 0.3 |
| | \(\left\lvert {\frac{{r^{2} - r_{o}^{\prime 2} }}{{r^{2} }}} \right\rvert\) | | | < 0.1 |
| | \(k^{\prime}\) | | | 0.85 ≤ k′ ≤ 1.15 |
SEE: standard error of estimation; c: number of terms in the model; d: user-defined smoothing parameter; p: total number of descriptors in the model; M: number of data points in the training set; \(\overline{Y}_{{\text{training}}}\): mean experimental activity of the training set; \(Y_{\exp }\): experimental activity in the training set; \(Y_{{\text{pred}}}\): predicted activity in the training set; n: number of compounds in the training set; \(cR_{{\text{p}}}^{2}\): Y-randomization coefficient; R: correlation coefficient for Y-randomization; \(R_{r}\): average R of the random models; \(Y{\text{pred}}_{{\text{test}}}\): predicted activity of the test set; \(Y\exp_{{\text{test}}}\): experimental activity of the test set; \(r^{2}\): squared correlation coefficient of the plot of experimental versus predicted activity; \(r_{o}^{2}\): squared correlation coefficient of the plot of experimental versus predicted activity at zero intercept; \(r_{o}^{\prime 2}\): squared correlation coefficient of the plot of predicted versus experimental activity at zero intercept; \(k^{\prime}\): slope of the plot of predicted against experimental activity at zero intercept.
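
As an illustration of how several of the tabulated statistics could be computed, the following is a minimal Python sketch, not the authors' implementation. The function names are hypothetical. The adjusted \(R^{2}\) and \(cR_{\text{p}}^{2}\) expressions are written in their standard forms, \(\left[(n - 1)R^{2} - p\right]/(n - p - 1)\) and \(R \times \sqrt{R^{2} - R_{r}^{2}}\), and the denominator of the test-set statistic is assumed to use the experimental test-set activities about the training-set mean. The same `r_squared` function yields \(R^{2}\), \(Q_{\text{cv}}^{2}\) (when given cross-validated predictions) and \(R_{\text{test}}^{2}\) (when given test-set values).

```python
import math


def r_squared(y_exp, y_pred, y_train_mean):
    """Return 1 - SS_res / SS_tot, with SS_tot taken about the training-set mean.

    Training-set fitted values give R^2, cross-validated predictions give
    Q^2_cv, and test-set values give R^2_test.
    """
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - y_train_mean) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n training compounds and p descriptors."""
    return ((n - 1) * r2 - p) / (n - p - 1)


def c_rp_squared(r, r_random):
    """Y-randomization coefficient, cR_p^2 = R * sqrt(R^2 - R_r^2).

    r is R of the original model and r_random is the average R of the
    randomized models; the sketch assumes r**2 >= r_random**2.
    """
    return r * math.sqrt(r ** 2 - r_random ** 2)


def slope_through_origin(y_exp, y_pred):
    """Slope k' of predicted versus experimental activity at zero intercept."""
    return sum(e * p for e, p in zip(y_exp, y_pred)) / sum(e * e for e in y_exp)


# Made-up activities, for illustration only (not data from the study).
y_exp = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
y_mean = sum(y_exp) / len(y_exp)

r2 = r_squared(y_exp, y_pred, y_mean)
print("R2      :", round(r2, 4))
print("R2 adj  :", round(adjusted_r_squared(r2, n=len(y_exp), p=1), 4))
print("k'      :", round(slope_through_origin(y_exp, y_pred), 4))
```

Each returned value can then be compared with the corresponding threshold in the table, for example \(R^{2} \ge 0.6\) or \(0.85 \le k^{\prime} \le 1.15\).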