
Table 2 Some selected equations and parameters for validation of the QSAR model

From: A combined 2-D and 3-D QSAR modeling, molecular docking study, design, and pharmacokinetic profiling of some arylimidamide-azole hybrids as superior L. donovani inhibitors

Parameter

Equation

Eq

Significance

Threshold value

Internal validation

Friedman lack of fit (LOF)

\(\mathrm{LOF}=\frac{\mathrm{SEE}}{{\left(1-\frac{\mathrm{c}+\mathrm{d}\times \mathrm{p}}{\mathrm{M}}\right)}^{2}}\)

(3)

Allows for the best fitness score to be obtained

–

 

\(SEE=\sqrt{\frac{\sum {({Y}_{exp}-{Y}_{pred})}^{2}}{n-p-1}}\)

   

Correlation coefficient (\(R^{2}\))

\(R^{2} = 1 - \left[ {\frac{{\sum {(Y_{{exp}} - Y_{{pred}} )^{2} } }}{{\sum {(Y_{{exp}} - \bar{Y}_{{training}} )^{2} } }}} \right]\)

(4)

Measures the goodness of fit of the regression equation

 ≥ 0.6

Adjusted \(R^{2}\)

\({R}_{adj}^{2}=1-\frac{(1-{R}^{2})(n-1)}{n-p-1}\)

(5)

Ensures the model’s stability and reliability

 ≥ 0.5

Cross-validation regression coefficient (\(Q_{cv}^{2}\))

\(Q_{{cv}}^{2} = 1 - \left[ {\frac{{\sum {(Y_{{pred}} - Y_{{exp}} )^{2} } }}{{\sum (Y_{{exp}} - \bar{Y}_{{training}} )^{2} }}} \right]\)

(6)

Indicates the internal predictive power of the model

 ≥ 0.5

The coefficient of determination (\({cR}_{p}^{2}\)) of Y-Randomization

\({cR}_{p}^{2}=R \times \sqrt{{R}^{2}-{\left({R}_{r}\right)}^{2}}\)

(7)

Confirms that the QSAR model is robust and did not arise by chance correlation

\({cR}_{p}^{2}\) > 0.50

External validation

Predicted \(R^{2}\) (\(R_{test}^{2}\))

\(R_{{test}}^{2} = 1 - \frac{{\sum {(Ypred_{{test}} - Yexp_{{test}} )^{2} } }}{{\sum {(Yexp_{{test}} - \bar{Y}_{{training}} )^{2} } }}\)

(8)

Measures the ability of the model to predict activity values of an external set of compounds

 ≥ 0.6

Golbraikh and Tropsha’s acceptable model criteria

\(|r_{o}^{2} - r_{o}^{{\prime 2}} |\)

–

Assesses the robustness and stability of the model

 < 0.3

 

\(\left(r^{2}-r_{o}^{2}\right)/r^{2}\)

  

 < 0.1

 

k

  

0.85 ≤ k ≤ 1.15

  1. SEE = standard error of estimation; c = number of terms in the model; d = user-defined smoothing parameter; p = total number of descriptors in the model; M = number of data points in the training set; \(\bar{Y}_{training}\) = mean experimental activity of the training set; \(Y_{exp}\) = experimental activity in the training set; \(Y_{pred}\) = predicted activity in the training set; n = number of compounds in the training set; \({cR}_{p}^{2}\) = Y-randomization coefficient; R = correlation coefficient for Y-randomization; \(R_{r}\) = average R of the random models; \(Ypred_{test}\) = predicted activity of the test set; \(Yexp_{test}\) = experimental activity of the test set; \(r^{2}\) = squared correlation coefficient of the plot of predicted versus experimental activity; \(r_{o}^{2}\) = squared correlation coefficient of the plot of predicted versus experimental activity at zero intercept; \(r_{o}^{\prime 2}\) = squared correlation coefficient of the plot of experimental versus predicted activity at zero intercept; k = slope of the plot of predicted against experimental activity at zero intercept.
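
The validation parameters in Table 2 can be computed from lists of experimental and predicted activities. The following is a minimal sketch in plain Python, not the authors' implementation. All function and argument names are hypothetical. It assumes the standard form of the adjusted R², 1 - (1 - R²)(n - 1)/(n - p - 1); the commonly cited form of the Y-randomization coefficient, R·sqrt(R² - Rr²), with `rr2` taken as the mean R² of the randomized models; a test-set R² whose denominator uses the experimental test activities; and k defined as in the table footnote (predicted against experimental activity, zero intercept).

```python
from math import sqrt


def r_squared(y_exp, y_pred, ref_mean=None):
    """1 - SS_res / SS_tot, the shared form of Eqs. 4, 6 and 8.

    ref_mean defaults to the mean of y_exp (training-set R^2, Eq. 4).
    For Q^2_cv (Eq. 6), pass leave-one-out predictions as y_pred.
    For R^2_test (Eq. 8), pass test-set data and the training-set mean.
    """
    if ref_mean is None:
        ref_mean = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - ref_mean) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


def see(y_exp, y_pred, p):
    """Standard error of estimation for a model with p descriptors."""
    n = len(y_exp)
    ss_res = sum((e - q) ** 2 for e, q in zip(y_exp, y_pred))
    return sqrt(ss_res / (n - p - 1))


def lof(see_value, c, d, p, m):
    """Friedman lack of fit as written in Eq. 3."""
    return see_value / (1.0 - (c + d * p) / m) ** 2


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 (assumed standard form) for n compounds, p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def c_rp_squared(r2, rr2):
    """Y-randomization coefficient: R * sqrt(R^2 - Rr^2).

    rr2 is assumed to be the mean R^2 of the randomized models.
    """
    return sqrt(r2) * sqrt(r2 - rr2)


def slope_through_origin(y_exp, y_pred):
    """k: slope of predicted vs. experimental activity at zero intercept."""
    return sum(e * p for e, p in zip(y_exp, y_pred)) / sum(e * e for e in y_exp)


# Illustrative numbers only, not data from the paper.
y_exp = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_exp, y_pred)
k = slope_through_origin(y_exp, y_pred)
print("R2 meets threshold:", r2 >= 0.6)
print("k within range:", 0.85 <= k <= 1.15)
```

Each returned value can then be compared with the corresponding threshold in the last column of the table. Note that Q²cv requires refitting the model once per left-out compound to produce the leave-one-out predictions; the sketch only evaluates the formula once those predictions exist.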