# Table 2 Some equations and parameters used for the model validation

| Parameter | Equation | Eq | Significance | Threshold value |
| --- | --- | --- | --- | --- |
| Internal validation | | | | |
| Friedman lack-of-fit (LOF) | $$\text{LOF} = \frac{\text{SEE}}{\left(1 - \frac{c + d \times p}{M}\right)^{2}}$$ | 4 | Fitness score used to select the best model; penalizes additional terms to limit overfitting | |
| | $$\text{SEE} = \sqrt{\frac{\sum \left(Y_{\exp} - Y_{\text{pred}}\right)^{2}}{n - p - 1}}$$ | | | |
| Squared correlation coefficient ($$R^{2}$$) | $$R^{2} = 1 - \left[\frac{\sum \left(Y_{\exp} - Y_{\text{pred}}\right)^{2}}{\sum \left(Y_{\exp} - \overline{Y}_{\text{training}}\right)^{2}}\right]$$ | 5 | Measures the goodness of fit of the regression equation | ≥ 0.6 |
| Adjusted $$R^{2}$$ ($$R_{\text{adj}}^{2}$$) | $$R_{\text{adj}}^{2} = 1 - \frac{\left(1 - R^{2}\right)\left(n - 1\right)}{n - p - 1}$$ | 6 | Ensures the model's stability and reliability | ≥ 0.5 |
| Cross-validation regression coefficient ($$Q_{\text{cv}}^{2}$$) | $$Q_{\text{cv}}^{2} = 1 - \left[\frac{\sum \left(Y_{\text{pred}} - Y_{\exp}\right)^{2}}{\sum \left(Y_{\exp} - \overline{Y}_{\text{training}}\right)^{2}}\right]$$ | 7 | Indicates the internal predictive power of the model | ≥ 0.5 |
| Coefficient of determination of Y-randomization ($$cR_{\text{p}}^{2}$$) | $$cR_{\text{p}}^{2} = R \times \sqrt{R^{2} - \left(R_{r}\right)^{2}}$$ | 8 | Confirms that the QSAR model is robust and not the result of chance correlation | > 0.50 |
| External validation | | | | |
| Predicted $$R^{2}$$ ($$R_{\text{test}}^{2}$$) | $$R_{\text{test}}^{2} = 1 - \frac{\sum \left(Y\text{pred}_{\text{test}} - Y\exp_{\text{test}}\right)^{2}}{\sum \left(Y\exp_{\text{test}} - \overline{Y}_{\text{training}}\right)^{2}}$$ | 9 | Measures the ability of the model to predict the activity values of an external set of compounds | ≥ 0.6 |
| Golbraikh and Tropsha acceptable model criteria | $$\left\lvert r_{o}^{2} - r_{o}^{\prime 2} \right\rvert$$ | | Assess the robustness and stability of the model | < 0.3 |
| | $$\left\lvert \frac{r^{2} - r_{o}^{\prime 2}}{r^{2}} \right\rvert$$ | | | < 0.1 |
| | $$k^{\prime}$$ | | | 0.85 ≤ $$k^{\prime}$$ ≤ 1.15 |

SEE, standard error of estimation; c, number of terms in the model; d, user-defined smoothing parameter; p, total number of descriptors in the model; M, number of data points in the training set; $$\overline{Y}_{\text{training}}$$, mean experimental activity of the training set; $$Y_{\exp}$$, experimental activity in the training set; $$Y_{\text{pred}}$$, predicted activity in the training set; n, number of compounds in the training set; $$cR_{\text{p}}^{2}$$, Y-randomization coefficient; R, correlation coefficient for Y-randomization; $$R_{r}$$, average R of the random models; $$Y\text{pred}_{\text{test}}$$, predicted activity of the test set; $$Y\exp_{\text{test}}$$, experimental activity of the test set; $$r^{2}$$, squared correlation coefficient of the plot of experimental versus predicted activity; $$r_{o}^{2}$$, squared correlation coefficient of the plot of experimental versus predicted activity at zero intercept; $$r_{o}^{\prime 2}$$, squared correlation coefficient of the plot of predicted versus experimental activity at zero intercept; $$k^{\prime}$$, slope of the plot of predicted against experimental activity at zero intercept.
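
As an illustration of how the quantities in Table 2 can be evaluated, the following is a minimal Python sketch using only the standard library. The function names are hypothetical and are not taken from the original work. Three choices are assumptions of this sketch: SEE is computed with the residuals summed over the training set, the adjusted R² uses the standard form 1 − (1 − R²)(n − 1)/(n − p − 1), and cR²p uses the square-root form R × √(R² − Rr²), which requires R² ≥ Rr². Q²cv can be obtained from `r_squared` by passing cross-validated (for example leave-one-out) predictions instead of fitted values.

```python
import math


def r_squared(y_exp, y_pred):
    """R^2 of the training set (Eq. 5); also gives Q^2_cv when
    y_pred holds cross-validated predictions (Eq. 7)."""
    y_bar = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - y_bar) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def r_squared_adj(r2, n, p):
    """Adjusted R^2 (Eq. 6), standard form (assumption of this sketch)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def standard_error_of_estimate(y_exp, y_pred, p):
    """SEE with residuals summed over all n training compounds."""
    n = len(y_exp)
    ss_res = sum((e - q) ** 2 for e, q in zip(y_exp, y_pred))
    return math.sqrt(ss_res / (n - p - 1))


def friedman_lof(see, c, d, p, m):
    """Friedman lack-of-fit (Eq. 4) as written in Table 2."""
    return see / (1.0 - (c + d * p) / m) ** 2


def c_rp_squared(r, r_random_mean):
    """cR^2_p of Y-randomization (Eq. 8); assumes r**2 >= r_random_mean**2."""
    return r * math.sqrt(r ** 2 - r_random_mean ** 2)


def r_squared_test(y_exp_test, y_pred_test, y_bar_training):
    """Predicted R^2 of the external test set (Eq. 9)."""
    ss_res = sum((p - e) ** 2 for e, p in zip(y_exp_test, y_pred_test))
    ss_tot = sum((e - y_bar_training) ** 2 for e in y_exp_test)
    return 1.0 - ss_res / ss_tot


def slope_k_prime(y_exp, y_pred):
    """k': slope of predicted vs experimental activity through the origin."""
    num = sum(e * p for e, p in zip(y_exp, y_pred))
    den = sum(e ** 2 for e in y_exp)
    return num / den


if __name__ == "__main__":
    # Made-up activities, for illustration only (not data from the study).
    y_exp = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    r2 = r_squared(y_exp, y_pred)
    print("R2 passes threshold:", r2 >= 0.6)
    print("R2adj passes threshold:", r_squared_adj(r2, n=4, p=1) >= 0.5)
    print("k' within range:", 0.85 <= slope_k_prime(y_exp, y_pred) <= 1.15)
```

Each function returns a plain float, so a result can be compared directly with the corresponding threshold value listed in Table 2.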