QSAR studies of BBR analogues against coxsackievirus B1

Coxsackievirus group B (CVBs) are common enteroviruses associated with several diseases from etiologically to inflammatory cardiomyopathies and constitute a severe cause of mortality in newborn resulting in severe meningitis, fulminant infection, myocarditis, and encephalitis. While Berberian (BBR) is an effective antivirus and possesses potentials of suppressing CVB replication, Zeng et al. explored a structural modification of BBR by incorporating a substituted primary amine enhance antiviral potency and safety. Based on data set from Zeng et al., we attempted to propose a QSAR model that can predict the bioactivity of unknown compounds as anti-CVB1. Among many descriptors, four were selected using the Genetic Functional Approximation (GFA). Internal and external validation was carried out on data set using statistical parameters. The QSAR model was seen to meet the minimum requirement with Lack of fit = 0.068744, R2 0.897, Adjusted R2 = 0.8627, cross-validated R2 = 0.76169, R2 predicted = 0.68. The predictive ability of the model was found to be satisfactory and could be used for designing a similar group of compounds.


Background
Coxsackievirus group B (CVB) is a member of the genus Enterovirus of Picornaviridae family responsible for many heart, liver, pleura, and pancreas infections (Harb et al. 2021). CVBs are an important cause of mortality in newborn and cause severe meningitis, fulminant infection, myocarditis, and encephalitis (Kaplan et al. 1983). Group B coxsackievirus has six serotypes (1-6). CVB infection can be inhibited by some CVB inhibitors to some degrees. Although type 3 of CVBs is mostly responsible for chronic myocarditis and dilated cardiomyopathy, there is no effective drug for treating or method for diagnosing CVB3 infections.
Berberian is a quaternary ammonium salt of an isoquinoline alkaloid and could be extracted from many plants (Gaba et al. 2021). Berberian (BBR) is often used in many Chinese herbal medicines for diarrhoea treatment (Gaba et al. 2021). BBR has been reported to an efficient antiviral against several viruses (Tillhon et al. 2012). BBR has been identified to suppress replication of CVB3 by suppressing c-Jun N-terminal kinase (JNK) and p38 MAPK activation (Dai et al. 2017).
In a recent study by Zeng, 28 BBR analogues were prepared by total or semi-synthesized routine while the activity of BBRs against all CVB serotypes explored by introducing various substituted amine hydrochlorides and tert-butyl carbamate on BBR core (Zeng et al. 2020). Structural modifications were performed to determine structural-activity relationship (SAR) of BBR against CVBs (Zeng et al. 2020). A combination of the halfmaximal inhibitory concentration (IC 50 ) and selectivity index (SI) was used to evaluate the potency against CVBs strains. The results showed that introducing the substituted amine on position 3 of BBR core yields higher anti-CVB (Zeng et al. 2020).
QSAR modelling introduced by Corwin Hansch is popularly used in chemistry, toxicology, and pharmaceutical chemistry (Cherkasov et al. 2014). Based on the data set from Zeng et al. (Zeng et al. 2020), we proposed a QSAR model to predict the bioactivity of unknown compounds using Genetic Function Approximation Multi-Linear Regression (GFA-MLR) Technique.

Methods
The structural modification carried out by Zeng et al. suggests that compounds with potent anti-coxsackievirus properties could be synthesized via Ligand-based drug design (LBDD). The overall objective of current study is to propose a QSAR model that can predict the bioactivity of BBR analogues against Coxsackie virus B1. The outline followed in developing QSAR model starts with selecting appropriate descriptors for the given structures of BBR analogues, building models to predict activity and validating proposed model ( Fig. 1).
Twenty-eight BBR analogues were prepared by introducing each amine hydrochlorides and tertbutyl carbamate on four positions (2, 3, 9 or 10) on the BBR cores (Table 1) to improve the bioavailability (Zeng et al. 2020). Out of the 28 compounds prepared, 24 compounds were randomly selected (which excludes four compounds: 4, 19, 26 and 27) and used for QSAR study. Bioactivities of selected compounds were obtained by taking a negative logarithm of the IC 50 (Table 1). The structures of the selected 24 BBR analogues in Table 1 were drawn with ChemDraw 20.0 (Ikwu et al. 2020). Sparta v14 was employed to optimize BBR structures via Density Functional Theory (DFT), while Becke Three Lee Yang Parr (B3LYP) correlation and 6-31G* basis set were adopted (Scalmani and Frisch 2010;Lee et al. 1988). Optimized structures were processed through PaDel descriptor software to generate the 1D, 2D and 3D molecular descriptors.
Optimized molecular descriptors were pre-treated using Drug Theoretical and Cheminformatics Laboratory (DTC Lab) to decrease collinearity and data redundance (Oyeneyin et al. 2021). Kennard and Stone's algorithm (Kennard and Stone 1969) was used to split the data set into 30% and 70% test and training set, respectively. Training data set was then subjected to Genetic Function Approximation (GFA) through Material Studio 2017 software to generate and internally validate four models. Multi-Linear Regression (MLR) was used to predict relationship between the bioactivity of the compounds (dependent variable) and their molecular descriptors (independent variable).
Built QSAR model was statistically analysed to validate its stability, reproducibility, reliability, and robustness. External validation of proposed model with test data set utilized equations provided in Ikwu et al. 2020. QSAR model predictive ability was then examined through variety of statistical approaches. Variance Inflation Factor (VIF) and Mean effect (ME) analysis clarified the dependability of the molecular descriptors on each other and highlight the descriptor with the highest effect. The applicability domain (AD) technique employed in this study is Leverage method.
The technique helps to understand influential and outlier descriptors.

Results
Four Molecular descriptors used in this study include averaged and centred moreau-broto autocorrelation of lag 7 weighted by ionization potential (AATSC7i), Maximum E-State descriptors of strength for potential Hydrogen Bonds of path length 10 (maxHBint10), Charge weighted partial positive surface area * total molecular surface area/1000 (WPSA-3), and Radial Distribution Function-110/ weighted by Sanderson electronegativity (RDF110e). The relationship between the descriptors was analysed using Pearson correlation (Table 2). In addition to a pairwise correlation between four molecular descriptors examined in Table 2, the Variance Inflation Factor (VIF) was deployed to probe the severity of multicollinearity in regression analysis (Table 3).
To examine the relative importance as well as the contribution of each of the four molecular descriptors considered in the model, the value of the mean effect (ME) was calculated for each descriptor. Its sign indicates the direction in the values of the activities resulting from either an increase or decrease in descriptor values.
Deploying Genetic Function Approximation Multi-Linear Regression (GFA-MLR) technique yielded four model equations for BBR activity (pIC 50 ) shown in Eqs. 1-4, while their respective validation parameters are presented in Table 3.
The respective structure for BBR compounds was presented with their experimental bioactivity, predicted bioactivity and the residual in Table 5. William's plot was used to show the relationship between the leverages and standardized residual and visualize the applicability domain (Fig. 3). Warning limit denoted as h* was used to identify compounds that are influential or outliers. Compounds with leverage value higher (1) pIC 50 = 3.647808630 × AATSC7i − 0.715291235 × maxHBint10 + 0.019923483 than the warning limit reduce confidence of the model and may be considered outliers or influential. The plot of standardized residual versus experimental activity (training and test set) is presented in Fig. 3.

Discussion
The four molecular descriptors that contributed to model development comprise of two 2D and two 3D descriptors ( Table 2). The relationship between the descriptors was analysed using Pearson correlation (Table 2). A large-to-medium positive correlation (r > 0.5) exists between AATSC7i, maxHBint10, and WPSA-3 (Table 2). Largest positive correlation (r = ~ 0.82) was recorded between AATSC7i, and maxHBint10 which are 2D descriptors. However, correlation results indicate negative correlation between RDF110e and other molecular descriptors. The weakest negative correlation (r = ~ 0.34) was displayed between the 3D descriptors; RDF110e and WPSA-3. Overall, 3D descriptors could affect efficiency of developed QSAR model used in predicting the BBR analogues independently, while the 2D descriptors mostly depend on each other for effectiveness. VIF above 4 could indicate existence of multicollinearity which may require further investigation (Marquaridt 1970). However, VIF higher than 10 is considered as an indicator for a significant multicollinearity requiring correction (Marquaridt 1970). While most molecular descriptors (AATSC7i, WPSA-3, and RDF110e) showed no multicollinearity (VIF < 4), maxHBint10 displayed some level of collinearity (VIF = 5.83). Overall, across four molecular descriptors considered for modelling, VIF measured was within a range of 1.83 to 5.83 indicating absence of any significant multicollinearity (VIF < 10).
Negative ME values for AATSC7i, maxHBint10, and RDF110e indicate that activity is inverse relationship with activity (Table 3). In contrast to other descriptors, positive ME value for WPSA-3 (ME = ~ 0.78) signifies a direct relationship with activity (Table 3). The significance of the descriptors is indicated by the magnitude of ME values as WPSA-3 > RDF110e > maxH-Bint10 > AATSC7i. Results show that WPSA-3 and RDF110e which are 3D descriptors contribute more significantly to the built model than the other two 2D descriptors-maxHBint10 and AATSC7i (Table 3). Findings reveal that 3D descriptors played the major role in building the model.
The critical significance of regression (SOR) F-value ~ 3.306 was applicable across all four model equations while considering a confidence interval (COI) of 95%. All four models show SOR F-values larger than the critical SOR F-value (Table 4). The best of the models was selected due to its ability to meet the minimum recommended parameter required for a QSAR model (Veerasamy et al. 2011). Across all four model equations generated, Eq. 1 records the largest adjusted R-squared value ~ 0.863 along with the  The correlation coefficient of the experimental bioactivity and the predicted activity R 2 is 0.8474 based on train and test set combined (Fig. 2). This implies that there is a strong correlation between the experimental activity and the predicted activity from the model. Based on test set, model yields a predicted correlation (R 2 pred ) of 0.68 which is greater than the minimum recommended value (R 2 pred > 0.6) for a successful QSAR model (Veerasamy et al. 2011).
Results show that three compounds from test set yielded leverage higher than the warning limit of 0.88 (Fig. 3). The three compounds 12, 15 and 23 (Table 1) found outside the warning limit are considered influential as they might possess fewer of the major chemical descriptors present in other compounds within the overall data set.
Standardized residual versus experimental activity for test and train data set (Fig. 4) reveals that residual values of the bioactivities of the compounds used for the study are evenly distributed between y-axis (not totally positive or negative values) which makes the model free from bias or systematic error. Standardized residual recorded for this model falls within ± 3 which is a common threshold for accepting predictions (Jaworska et al. 2005).

Conclusion
While BBR presents an alternative pathway to suppressing CVB replication, a structural modification of BBR can further enhance the antiviral potency and safety of BBR. In a bid to enhance potency and safety of BBR in CVB treatment, Zeng et al. deployed a strategy that involved an incorporation of a primary amine hydrochlorides on position 3 of BBR core. Promising results from Zeng et al. provide data set from which activity of formulated BBR could be predicted using QSAR.
A QSAR study of 24 BBR analogues was performed based on the theoretical molecular descriptors calculated by the PaDel descriptor software and selected by GFA. Developed model passed through an internal and external validation to indicate a robust and satisfactory model, while four major descriptors were considered as structural features responsible for activity.
Our findings show that the activity of the studied compounds mainly depends mostly on 3D descriptors with level of significance as WPSA-3 > RDF110e > max-HBint10 > AATSC7i. The QSAR model developed in current study can predict the activity of new compounds and help to design new compounds with improved activity.

Appendix
See Table 5.