# Table 6 Validation parameters for each model using multilinear regression (MLR)

| S/No. | Validation parameter | Formula | Threshold | Model |
|---|---|---|---|---|
| **Internal validation** | | | | |
| 1 | Friedman lack of fit (LOF) | $$\mathrm{LOF}=\frac{\mathrm{SEE}}{\left(1-\frac{w+q\times j}{N}\right)^2}$$ | Significantly low | 0.1802 |
| 2 | R-squared ($$R^2$$) | $$R^2=1-\frac{\sum\left(Y_{\mathrm{obs}}-Y_{\mathrm{pred}}\right)^2}{\sum\left(Y_{\mathrm{obs}}-\overline{Y}_{\mathrm{training}}\right)^2}$$ | $$R^2>0.6$$ | 0.7759 |
| 3 | Adjusted R-squared ($$R_{\mathrm{adj}}^2$$) | $$R_{\mathrm{adj}}^2=\frac{(N-1)R^2-p}{N-p-1}$$ | $$R_{\mathrm{adj}}^2>0.6$$ | 0.7381 |
| 4 | Cross-validated R-squared ($$Q_{cv}^2$$) | $$Q_{cv}^2=1-\frac{\sum\left(Y_{\mathrm{pred}}-Y_{\mathrm{obs}}\right)^2}{\sum\left(Y_{\mathrm{obs}}-\overline{Y}_{\mathrm{training}}\right)^2}$$ | $$Q_{cv}^2>0.6$$ | 0.6954 |
| 5 | Significant regression | | | Yes |
| 6 | Significance-of-regression (SOR) F value | | | 13.42 |
| 7 | Critical SOR F value (95%) | $$F=\frac{\sum\left(Y_{\mathrm{pred}}-\overline{Y}_{\mathrm{training}}\right)^2/p}{\sum\left(Y_{\mathrm{obs}}-Y_{\mathrm{pred}}\right)^2/(N-p-1)}$$ | $$F_{\mathrm{test}}>2.09$$ | 2.7294 |
| 8 | Replicate points | | | 0 |
| 9 | Computed observed error | | | 0 |
| 10 | Minimum experimental error for non-significant LOF (95%) | | | 0.4120 |
| **Model randomization** | | | | |
| 11 | Average correlation coefficient for randomized data ($$\overline{R}_r$$) | | $$\overline{R}_r<0.5$$ | 0.3642 |
| 12 | Average determination coefficient for randomized data ($$\overline{R}_r^2$$) | | $$\overline{R}_r^2<0.5$$ | 0.1823 |
| 13 | Average leave-one-out cross-validated determination coefficient for randomized data ($$\overline{Q}_r^2$$) | | $$\overline{Q}_r^2<0.5$$ | −0.3915 |
| 14 | Coefficient for Y-randomization ($$cR_p^2$$) | $$cR_p^2=R^2\times\left(1-\sqrt{\left\lvert R^2-\overline{R}_r^2\right\rvert}\right)$$ | $$cR_p^2>0.6$$ | 0.9229 |
| **External validation** | | | | |
| 15 | $$\left\lvert r_0^2-{r'}_0^2\right\rvert$$ | | < 0.3 | 0.1591 |
| 16 | $$\frac{r^2-r_0^2}{r^2}$$ | | < 0.1 | 0.0023 |
| 17 | $$\frac{r^2-{r'}_0^2}{r^2}$$ | | < 0.1 | 0.0136 |
| 18 | $$R_{\mathrm{test}}^2$$ | $$R_{\mathrm{test}}^2=1-\frac{\sum\left(Y_{\mathrm{pred}_{\mathrm{test}}}-Y_{\mathrm{obs}_{\mathrm{test}}}\right)^2}{\sum\left(Y_{\mathrm{obs}_{\mathrm{test}}}-\overline{Y}_{\mathrm{training}}\right)^2}$$ | > 0.6 | 0.6550 |
1. SEE is the standard error of estimation, w is the total number of terms in the built model excluding the constant term, j is the number of descriptors contained in the built model, q is a user-defined smoothing factor, N is the number of compounds in the training set, and p is the number of descriptors (independent variables) in the regression. $$Y_{\mathrm{obs}}$$, $$\overline{Y}_{\mathrm{training}}$$, and $$Y_{\mathrm{pred}}$$ are the observed activity, the mean observed activity of the training compounds, and the predicted activity, respectively. $$r^2$$ is the squared correlation coefficient of the plot of observed activity against predicted activity, $$r_0^2$$ is the corresponding coefficient for the plot of observed activity against predicted activity at zero intercept, and $${r'}_0^2$$ is the corresponding coefficient for the plot of predicted activity against observed activity at zero intercept (Adeniji et al. 2020a; Roy et al. 2011; Adeniji et al. 2020d).
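To make the internal-validation formulas of Table 6 concrete, the following is a minimal sketch in plain Python (standard library only). It assumes a hypothetical single-descriptor training set fitted by ordinary least squares; the data and all function names (`fit_line`, `r_squared`, `adjusted_r_squared`, `f_value`, `loo_predictions`) are illustrative assumptions, not the software or data behind the reported model. It evaluates $$R^2$$, $$R_{\mathrm{adj}}^2$$, the F statistic with $$p$$ descriptors, and leave-one-out $$Q_{cv}^2$$.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for one descriptor; returns (intercept, slope)."""
    xm, ym = fmean(x), fmean(y)
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


def r_squared(y_obs, y_pred, y_mean_train):
    """1 - sum((Yobs - Ypred)^2) / sum((Yobs - mean training Yobs)^2)."""
    ss_res = sum((o - q) ** 2 for o, q in zip(y_obs, y_pred))
    ss_tot = sum((o - y_mean_train) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """((N - 1) R^2 - p) / (N - p - 1), i.e. 1 - (1 - R^2)(N - 1)/(N - p - 1)."""
    return ((n - 1) * r2 - p) / (n - p - 1)


def f_value(y_obs, y_pred, p):
    """Regression mean square divided by residual mean square."""
    n, y_mean = len(y_obs), fmean(y_obs)
    ss_reg = sum((q - y_mean) ** 2 for q in y_pred)
    ss_res = sum((o - q) ** 2 for o, q in zip(y_obs, y_pred))
    return (ss_reg / p) / (ss_res / (n - p - 1))


def loo_predictions(x, y):
    """Refit without compound i, then predict compound i, for every i."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    return preds


# Hypothetical training set: one descriptor (x) and observed activity (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]
n, p = len(y), 1

b0, b1 = fit_line(x, y)
y_fit = [b0 + b1 * xi for xi in x]
y_bar = fmean(y)

r2 = r_squared(y, y_fit, y_bar)
r2_adj = adjusted_r_squared(r2, n, p)
f = f_value(y, y_fit, p)
q2_cv = r_squared(y, loo_predictions(x, y), y_bar)  # same form as R^2, LOO predictions

print(f"R2={r2:.4f}  R2_adj={r2_adj:.4f}  F={f:.2f}  Q2_cv={q2_cv:.4f}")
```

The metric functions only need observed values, predictions, N and p, so they would apply unchanged to a multi-descriptor model; only `fit_line` and `loo_predictions` are specific to this one-descriptor illustration.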
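The model-randomization rows of Table 6 summarize a Y-scrambling test: the activities are shuffled, the model is refitted, and the statistics of the scrambled models are averaged. A minimal sketch under stated assumptions follows: one hypothetical descriptor, ordinary least squares, made-up data, and illustrative names (`r2_and_q2`, `y_randomization`, 100 runs, seed 0). It also assumes that $$\overline{R}_r$$ is the average of the square roots of the randomized $$R^2$$ values.

```python
import random
from math import sqrt
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept and slope for a single descriptor."""
    xm, ym = fmean(x), fmean(y)
    sxx = sum((a - xm) ** 2 for a in x)
    slope = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sxx
    return ym - slope * xm, slope


def r2_and_q2(x, y):
    """Fitted R^2 and leave-one-out Q^2 for the model y = b0 + b1*x."""
    y_mean = fmean(y)
    ss_tot = sum((b - y_mean) ** 2 for b in y)
    b0, b1 = fit_line(x, y)
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for i in range(len(x)):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (c0 + c1 * x[i])) ** 2
    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot


def y_randomization(x, y, n_runs=100, seed=0):
    """Shuffle the activities, refit, and average R, R^2 and Q^2 over all runs."""
    rng = random.Random(seed)
    rs, r2s, q2s = [], [], []
    for _ in range(n_runs):
        y_shuffled = y[:]
        rng.shuffle(y_shuffled)
        r2, q2 = r2_and_q2(x, y_shuffled)
        rs.append(sqrt(max(r2, 0.0)))  # assumed: R taken as sqrt of R^2
        r2s.append(r2)
        q2s.append(q2)
    return fmean(rs), fmean(r2s), fmean(q2s)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical descriptor values
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1]  # hypothetical observed activities

r2, q2 = r2_and_q2(x, y)
mean_r, mean_r2, mean_q2 = y_randomization(x, y)
print(f"R2={r2:.4f}  mean Rr={mean_r:.4f}  mean Rr2={mean_r2:.4f}  mean Qr2={mean_q2:.4f}")
print("all below 0.5:", mean_r < 0.5 and mean_r2 < 0.5 and mean_q2 < 0.5)
```

A real model that passes this test should keep a high $$R^2$$ on the true activities while the scrambled averages stay low, which indicates the original fit is not a chance correlation.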
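The external-validation criteria of Table 6 can likewise be sketched in plain Python. The helper names and the five-compound test set below are hypothetical. As an assumption, $$r_0^2$$ and $${r'}_0^2$$ are computed as coefficients of determination of least-squares lines forced through the origin, with slope $$k=\sum xy/\sum x^2$$; $$R_{\mathrm{test}}^2$$ follows the row 18 formula with the training-set mean in the denominator.

```python
from statistics import fmean


def squared_correlation(a, b):
    """r^2: squared Pearson correlation between two equal-length sequences."""
    am, bm = fmean(a), fmean(b)
    sab = sum((u - am) * (v - bm) for u, v in zip(a, b))
    saa = sum((u - am) ** 2 for u in a)
    sbb = sum((v - bm) ** 2 for v in b)
    return sab * sab / (saa * sbb)


def r2_through_origin(y, x):
    """Coefficient of determination of the zero-intercept fit y = k*x."""
    k = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)
    y_mean = fmean(y)
    ss_res = sum((v - k * u) ** 2 for u, v in zip(x, y))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def r2_test(y_obs_test, y_pred_test, y_mean_train):
    """1 - sum((Ypred_test - Yobs_test)^2) / sum((Yobs_test - mean training Yobs)^2)."""
    ss_res = sum((q - o) ** 2 for o, q in zip(y_obs_test, y_pred_test))
    ss_tot = sum((o - y_mean_train) ** 2 for o in y_obs_test)
    return 1.0 - ss_res / ss_tot


# Hypothetical external test set and training-set mean activity.
y_obs_test = [2.0, 3.1, 4.2, 5.0, 5.9]
y_pred_test = [2.2, 2.9, 4.0, 5.3, 5.7]
y_mean_train = 4.0

r2 = squared_correlation(y_obs_test, y_pred_test)
r0_2 = r2_through_origin(y_obs_test, y_pred_test)   # observed against predicted
r0p_2 = r2_through_origin(y_pred_test, y_obs_test)  # predicted against observed
r2_ext = r2_test(y_obs_test, y_pred_test, y_mean_train)

criteria = {
    "|r0^2 - r'0^2| < 0.3": abs(r0_2 - r0p_2) < 0.3,
    "(r^2 - r0^2)/r^2 < 0.1": (r2 - r0_2) / r2 < 0.1,
    "(r^2 - r'0^2)/r^2 < 0.1": (r2 - r0p_2) / r2 < 0.1,
    "R2_test > 0.6": r2_ext > 0.6,
}
for name, ok in criteria.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

Because a line through the origin is a constrained version of an ordinary least-squares line, $$r_0^2$$ and $${r'}_0^2$$ computed this way can never exceed $$r^2$$, so the ratios in rows 16 and 17 are non-negative.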