Computational investigation, virtual docking simulation of 1,2,4-triazole analogues and in silico design of new proposed agents against protein target (3IFZ) binding domain

The re-emergence of strains of Mycobacterium tuberculosis that are resistant to the available drugs has made the development of more effective anti-tubercular agents necessary. This work therefore applied a modeling technique to predict the inhibitory activities of compounds that have been reported to be effective against M. tuberculosis. Multiple linear regression and genetic function approximation were used to build the model. The established model was influenced by the descriptors MATS7s, SM1_DzZ, TDB3v, and RDF70v. In addition, interactions between the compounds and the target, DNA gyrase, were evaluated by docking using the PyRx and Discovery Studio software. Compound 19 had the most favorable binding affinity, − 16.5 kcal/mol. Consequently, compound 19 served as the structural template for designing twelve novel hypothetical agents with improved activities. Among the designed compounds, compound 19h showed high activity and a more favorable binding affinity of − 21.6 kcal/mol. This research therefore recommends in vitro and in vivo screening and pharmacokinetic studies to determine the toxicity of the designed compounds.


Background
The World Health Organization (WHO) continues to regard tuberculosis as a major health issue. Despite a declining trend in prevalence and incidence, new cases are still reported on every continent, particularly in Southeast Asia and Africa. For 2017, the WHO reported 9 million infections and 1.6 million deaths globally (W.H.O 2018).
Anti-tubercular drugs recommended for treating tuberculosis include rifampicin, pyrazinamide, para-aminosalicylic acid, and isoniazid (Adeniji et al. 2020a). However, reports have shown that patients do not respond well to the administered drugs because of resistant strains of Mycobacterium tuberculosis. In addition, most of these drugs have been reported to cause adverse side effects (Adeniji et al. 2020a). Therefore, the search for novel anti-tubercular agents with enhanced activity and minimal side effects against M. tuberculosis remains a challenge to pharmacists and chemists (Adeniji et al. 2020b).
DNA gyrase (3IFZ), a type II topoisomerase, is present in all bacteria. It introduces negative supercoils into the bacterial chromosome, relaxes the supercoils generated by the translocating RNA polymerase, and condenses the chromosome for proper segregation during cell division (James 2009;Huang et al. 2006). The enzyme is a tetramer made up of two A subunits (GyrA), which contain the DNA binding domain, and two B subunits (GyrB), which catalyze the ATP hydrolysis on which the cleavage of the two DNA strands depends. Together, GyrA and GyrB aid DNA replication by breaking and rejoining the DNA strands. On this basis, DNA replication can be blocked by inhibitors that target either GyrA (the DNA binding domain) or GyrB (the ATP binding cavity).
Heterocyclic molecules play vital roles in medicinal applications because of their structural features. Among heterocyclic compounds, triazole and its analogues are of particular pharmacological interest because of their unique structure and properties (Adeniji et al. 2020b;Zhang et al. 2006). Triazole is a five-membered, diunsaturated heterocyclic ring composed of two carbon atoms and three nitrogen atoms. Recent research has shown that the triazole nucleus has gained considerable attention among pharmacists, biochemists, biologists, and chemists, as it is one of the major bioactive scaffolds in pharmaceuticals, particularly in drug design and chemotherapy (Holla et al. 2005). Triazoles have been reported to show a wide range of pharmacological activities, such as analgesic and anti-tubercular (https://patents.justia.com/patent/8865910; Hafez et al. 2008), anti-neoplastic (Guan et al. 2007), and anti-malarial (Gujjar et al. 2009) activities. They have also been reported as some of the most effective molecules for anti-TB activity (Patel et al. 2010).
Advances in computational chemistry have brought new approaches to drug discovery. In silico methods are now widely used in structure-based drug design, where they reduce the cost of evaluating large virtual databases of chemical compounds. Such computational methods include quantitative structure-activity relationships (QSARs) and molecular docking (Adeniji et al. 2020a).
The first stage is to design and synthesize novel hypothetical compounds with enhanced anti-tubercular activity and lower toxicity, using approaches that limit the number of experimental runs and the time required. In the design of novel drug candidates, computer-aided drug design has played a crucial part in the discovery of new molecules in pharmaceutical design, drug metabolism, and medicinal chemistry (Adeniji et al. 2019). This approach has facilitated the optimization of chemical structures for well-defined purposes (Adeniji et al. 2020c). QSAR study and molecular docking are among the computer-aided drug design approaches that have been broadly used in the design, improvement, and synthesis of new drugs (Adeniji et al. 2020a). QSAR investigation has proved to be a convenient technique for predicting the biological (inhibitory) activities and properties of a chemical compound from experimental data and molecular descriptors. The idea rests on the correlation between the structural information of a molecule, encoded by its descriptors, and well-defined experimental data. Molecular docking, in turn, helps to predict the binding location and affinity of the interaction between a molecule (ligand) and the target, thereby guiding the design of a prospective drug with better activity against the target (Adeniji et al. 2020a). Therefore, this study aimed to carry out a computational investigation and virtual docking simulation of 1,2,4-triazole analogues, and the in silico design of new proposed compounds against DNA gyrase.

Collection of dataset
Forty 1,2,4-triazole analogues reported as anti-tubercular agents were taken from the literature for use in this study (https://patents.justia.com/patent/8865910). The general structure of the 1,2,4-triazole analogues is presented in Fig. 1, and the predicted and experimental activities of the compounds are presented in Table 1.

Pretreatment of calculated descriptors and splitting of dataset
All calculated descriptors were screened using the pretreatment 1.2 software to eliminate redundant descriptors and descriptors carrying little information, in order to build an optimum model with high predictability (Adeniji et al. 2020c). The Kennard and Stone algorithm available in the Data-division 1.2 software was then employed to split the data into a modeling dataset (training set) and a validation dataset (test set) in a ratio of 7:3, i.e., 70% to 30%. Model construction was carried out on the training set, while the validity of the model was checked on the test set.
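The Kennard and Stone split can be illustrated with a minimal sketch. It assumes Euclidean distance in descriptor space and uses random placeholder descriptors rather than the study data; the function name `kennard_stone` is hypothetical and this is not the Data-division 1.2 implementation.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone algorithm."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between all compounds in descriptor space.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the training set with the two most distant compounds.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining compound to its nearest selected compound.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the compound that is farthest from the current training set.
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Placeholder data: 40 compounds x 4 descriptors (random, NOT the study data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
train_idx, test_idx = kennard_stone(X_demo, n_train=28)  # 70:30 split
```

Because the selection spreads the training compounds across descriptor space, the test compounds tend to lie inside the region covered by the training set. In practice the descriptors would usually be scaled before the distances are computed.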

Construction of QSAR models and validation test
An optimum model, intended both to predict the reported experimental activities and to guide the design of novel compounds, was developed using the genetic function approximation (GFA) approach. This technique stochastically selects combinations of descriptors that give a good prediction of the dataset. To express the model as a linear equation, multiple linear regression was used to generate the multivariate equation. Model building and internal validation were carried out in the Material Studio software, version 8.0.
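The regression and internal validation step can be sketched as follows. This is an illustration only: it fits an ordinary least-squares model on a fixed set of four placeholder descriptors and does not reproduce the genetic descriptor search or the Material Studio workflow. All function names and the synthetic data are assumptions.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bd*xd; returns [b0, b1, ..., bd]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]

def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, d):
    """Adjust R^2 for n compounds and d descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - d - 1)

def q2_loo(X, y):
    """Leave-one-out cross-validated R^2 (Q^2)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_mlr(X[mask], y[mask])          # refit without compound i
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic placeholder data: 28 training compounds x 4 descriptors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(28, 4))
y_demo = 5.0 + X_demo @ np.array([0.8, -0.5, 0.3, 0.6]) + rng.normal(scale=0.3, size=28)

coef = fit_mlr(X_demo, y_demo)
r2 = r_squared(y_demo, predict_mlr(coef, X_demo))
r2_adj = adjusted_r_squared(r2, n=28, d=4)
q2 = q2_loo(X_demo, y_demo)
```

For a least-squares model, Q² from leave-one-out can never exceed the fitted R², so a large gap between the two is a simple warning sign of overfitting.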

Leverage measure (applicability domain)
Leverage ($h_i$) values were calculated for the compounds in the dataset in order to define the chemical space (applicability domain) of the built model. A plot of the leverage value of each compound against its standardized residual is known as a Williams plot. The leverage is the diagonal element of the hat matrix, calculated for both the training and test sets, while the standardized residual is the scaled difference between the predicted and reported experimental activities for both sets. The leverage was calculated using Eq. 1, and outlier compounds were checked against the applicability domain boundary of ± 3 on the standardized residual (Adeniji et al. 2020b):

$$h_i = N_i \left(N^{T} N\right)^{-1} N_i^{T} \qquad (1)$$

where $N_i$ is the row vector of descriptors for compound $i$, $N$ is the $m \times d$ descriptor matrix of the training data, $N^{T}$ is the transpose of $N$, and $N_i^{T}$ is the transpose of $N_i$. To identify influential molecules, the warning leverage $h^{*}$ defined in Eq. 2 was calculated as the limit boundary:

$$h^{*} = \frac{3(d + 1)}{N} \qquad (2)$$

where $d$ is the number of descriptors and $N$ in Eq. 2 is the number of training compounds (Adeniji et al. 2020b).
Superscript "a" represents the test set. The calculated activity (pA) is generated using the QSAR model built in this study. The residual values are the difference between the observed activity (pA) and the calculated activity (pA). The leverage value of each compound is the diagonal element of the hat matrix, which defines the applicability domain space of each compound.
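The leverage calculation can be sketched in a few lines. The sketch assumes the standard hat-matrix form, in which the leverage of a compound is its descriptor row vector multiplied on both sides by the inverse of the training cross-product matrix, and a warning leverage of 3(d + 1)/N for d descriptors and N training compounds. The descriptor matrices are random placeholders and the function names are hypothetical.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage h_i = x_i (X^T X)^-1 x_i^T for each row x_i of X_query."""
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form, without building the full hat matrix.
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)

def warning_leverage(n_train, n_descriptors):
    """Warning leverage h* = 3(d + 1) / N."""
    return 3.0 * (n_descriptors + 1) / n_train

# Placeholder descriptor matrices (random, NOT the study data).
rng = np.random.default_rng(2)
X_train = rng.normal(size=(28, 4))
X_test = rng.normal(size=(12, 4))

h_train = leverages(X_train, X_train)
h_test = leverages(X_train, X_test)
h_star = warning_leverage(n_train=28, n_descriptors=4)

# Test compounds whose leverage exceeds h* lie outside the applicability domain.
outside_domain = np.where(h_test > h_star)[0]
```

With 28 training compounds and 4 descriptors, the warning leverage evaluates to 15/28, about 0.54. A Williams plot is then a scatter of the standardized residuals against these leverage values, with a vertical line at h* and horizontal lines at ± 3.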

Assessment of Y-randomization test
Another criterion to consider when establishing a model is the Y-randomization test. This validation test was carried out by randomly shuffling the training data (Adeniji et al. 2020a;Roy et al. 2011;Adeniji et al. 2020d). To create each randomized multiple linear regression model, the descriptors (independent variables) were kept untouched while the biological activities (dependent variable) were shuffled. To establish that the built model was not obtained by chance, the R² and Q² values of the randomized models must be relatively low over many trials. In addition, the Y-randomization coefficient ($cR_p^2$) given in Eq. 3 must be ≥ 0.5 to affirm that the model is robust:

$$cR_p^2 = R \times \sqrt{R^2 - R_r^2} \qquad (3)$$

where $R_r$ is the average $R$ of the random models and $R$ is the correlation coefficient of the built model (Adeniji et al. 2020d;Tropsha et al. 2003).
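A minimal sketch of the Y-randomization procedure is given below. It assumes the common form of the coefficient, the correlation coefficient of the built model multiplied by the square root of the difference between its R² and the average R² of the randomized models, and it uses synthetic placeholder data. The function names and the number of trials are illustrative assumptions.

```python
import numpy as np

def r2_of_fit(X, y):
    """R^2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    res = y - A @ coef
    return 1.0 - (res @ res) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_trials=50, seed=0):
    """Return (R^2 of real model, mean R^2 of shuffled models, cRp^2)."""
    rng = np.random.default_rng(seed)
    r2 = r2_of_fit(X, y)
    # Descriptors stay fixed; only the activity vector is shuffled.
    r2_rand = np.array([r2_of_fit(X, rng.permutation(y)) for _ in range(n_trials)])
    rr2 = r2_rand.mean()
    # cRp^2 = R * sqrt(R^2 - Rr^2), clipped at zero if the random models do better.
    crp2 = np.sqrt(r2) * np.sqrt(max(r2 - rr2, 0.0))
    return r2, rr2, crp2

# Synthetic placeholder data: 28 compounds x 4 descriptors.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(28, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 1.5]) + rng.normal(scale=0.3, size=28)

r2, rr2, crp2 = y_randomization(X_demo, y_demo)
```

A model that captures a real structure-activity relationship gives a much higher R² than its shuffled counterparts, which pushes the coefficient above the 0.5 threshold.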

Verification and confirmation of the built model
The constructed model was subjected to various statistical tests to verify its quality. The generally accepted internal and external threshold values for validating a QSAR model are reported in Table 6 (Adeniji et al. 2020a;Roy et al. 2011;Adeniji et al. 2020d;Tropsha et al. 2003;Adeniji et al. 2018). Both the internal and external validation results reported in this work were therefore compared with these threshold values to ascertain the reliability and robustness of the built model.

Molecular docking procedure
Receptor-ligand docking was carried out to determine the binding affinities and to identify the possible ligand binding sites. The docking simulations were performed using AutoDock 4.2 as incorporated in the PyRx software. The crystal structure of the target enzyme (DNA gyrase), PDB code 3IFZ, was retrieved from the Protein Data Bank (Piton et al. 2010;Piton et al. n.d.). With the Discovery Studio Visualizer software, all solvent molecules, ligands, and cofactors imported with the enzyme were removed in order to achieve good binding interactions between the enzyme (protein) and the ligands (molecules). Thereafter, the enzyme was saved in PDB format, which is recognized by the PyRx software, and converted to a macromolecule (Adeniji et al. 2020a;Adeniji et al. 2018). The minimum-energy conformation of each ligand (1,2,4-triazole derivative), needed for efficient binding interaction with the enzyme, was obtained by geometry optimization in the Spartan 14 software using density functional theory [DFT (B3LYP/6-31G*)]. All optimized ligands were then saved in PDB format, which is also recognized by the PyRx software, and converted to ligands (Adeniji et al. 2020a;Adeniji et al. 2018). In the PyRx software, the docking between the target enzyme and the ligands was then computed to evaluate the binding affinities, while the interaction types, such as hydrogen bonding, electrostatic interaction, and hydrophobic interaction, were visualized and analyzed using the Discovery Studio Visualizer 16 software (Adeniji et al. 2018;Ibrahim et al. 2020).

Definitions of the validation parameters: SEE is the standard error of estimation, w is the total number of terms present in the built model except the constant term, j is the number of descriptors in the built model, q is a user-defined factor, and N is the number of compounds in the training set. Y obs, Y training, and Y pred are the observed activity, the mean observed activity of the training compounds, and the predicted activity, respectively. r² is the correlation coefficient of the plot of observed activity against predicted activity, r o² is the correlation coefficient of the plot of observed activity against predicted activity at zero intercept, and r′ o² is the correlation coefficient of the plot of predicted activity against observed activity at zero intercept.
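The receptor clean-up step (removal of solvent molecules, ligands, and cofactors) was done interactively in Discovery Studio Visualizer. For readers who prefer a scripted equivalent, the sketch below shows one simple way to do something similar on the text of a PDB file by keeping only the protein ATOM records. It is an assumption-laden illustration, not the procedure used in the study: the function name is hypothetical, the sample records are made-up placeholders, and modified residues stored as HETATM records would also be dropped.

```python
def strip_heteroatoms(pdb_lines):
    """Keep protein ATOM records plus TER/END; drop HETATM (water, ligands, ions)."""
    keep = {"ATOM", "TER", "END"}
    cleaned = []
    for line in pdb_lines:
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        if record in keep:
            cleaned.append(line)
    return cleaned

# Made-up placeholder records, not taken from the 3IFZ entry.
sample = [
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A   1      12.560  13.300   2.250  1.00 20.00           C",
    "HETATM  900  O   HOH A 301      10.000  10.000  10.000  1.00 30.00           O",
    "HETATM  901 MG    MG A 401      15.000  15.000  15.000  1.00 25.00          MG",
    "TER",
    "END",
]
receptor_only = strip_heteroatoms(sample)
```

The cleaned lines can be written back to a `.pdb` file and loaded into the docking software as the macromolecule.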

Procedure for the in silico design of novel triazole derivatives
Substitution, elimination, and addition techniques were employed to design proposed anti-tubercular agents with enhanced activities by modifying the template structure (compound 19) using a ligand-based design approach (Adeniji et al. 2020a;Adeniji et al. 2020b). The template was selected as the reference compound and backbone for designing new promising compounds because of its prominent activity value, which lies within the applicability domain. The new compounds were designed on the basis of the computed mean effect of the descriptors with the highest influence on the biological activities.

Results
Proposed QSAR model

Discussion
The dataset, comprising 1,2,4-triazole and its analogues active against M. tuberculosis, was split into 28 training compounds (modeling set) and 12 test compounds (validation set) with the Kennard and Stone algorithm (Adeniji et al. 2020d). The 28 training compounds were used to build the model equation by genetic function approximation combined with multiple linear regression.
Analysis of the derived genetic function approximation model relates the physicochemical and structural features of the studied compounds to their respective anti-tubercular activities. The model was built with the geometrical and topological descriptors MATS7s, SM1_DzZ, TDB3v, and RDF70v, which contribute relevant information to the model as presented in Table 2. For validity and reproducibility, the calculated descriptors for all the compounds are reported in Table 3. These descriptors were identified and correlated with the anti-tubercular activity values.
Pearson's correlation statistics and the variance inflation factor (VIF) were used to validate the descriptors in the proposed model (Adeniji et al. 2020b;Adeniji et al. 2020c;Roy et al. 2011;Adeniji et al. 2020d). The Pearson correlation coefficients between each pair of descriptors all lie within ± 0.8, as presented in Table 4, which signifies that no multicollinearity was found between any pair of descriptors. All the calculated VIF values reported in Table 5 are less than 10, which implies that the descriptors are not inter-correlated. A VIF value > 10 would indicate that the proposed model is likely unstable and needs to be rechecked (Adeniji et al. 2020a;Adeniji et al. 2020b;Adeniji et al. 2020c). The VIF results are in full agreement with Pearson's correlation statistics. In addition, one-way analysis of variance (ANOVA) was computed to evaluate the significance of the correlation between the anti-tubercular activities and the descriptors at the 95% confidence level. The probability values reported in Table 5 are p < 0.05 for each of the descriptors. This signifies that the null hypothesis of no correlation between the anti-tubercular activities and the descriptors in the proposed model is rejected. Thus, the alternative hypothesis of a significant correlation between the anti-tubercular activities and the descriptors is accepted (Adeniji et al. 2020d).
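The two multicollinearity checks can be sketched as follows. The VIF of a descriptor is 1/(1 − R²), where R² comes from regressing that descriptor on all the others. The descriptor matrix below is a random placeholder, not the data behind Tables 4 and 5, and the function names are hypothetical.

```python
import numpy as np

def pearson_matrix(X):
    """Pairwise Pearson correlation between descriptor columns."""
    return np.corrcoef(X, rowvar=False)

def vif(X):
    """Variance inflation factor of each descriptor column."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.empty(d)
    for j in range(d):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress descriptor j on the remaining descriptors (with intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        res = target - A @ coef
        r2 = 1.0 - (res @ res) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Placeholder descriptors: 28 compounds x 4 independent columns.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(28, 4))
corr_demo = pearson_matrix(X_demo)
vif_demo = vif(X_demo)

# Appending a near-copy of column 0 produces strong multicollinearity.
X_collinear = np.column_stack([X_demo, X_demo[:, 0] + 0.01 * rng.normal(size=28)])
vif_collinear = vif(X_collinear)
```

In the collinear example the VIF values of the duplicated descriptors rise far above 10, which is the situation the threshold is meant to flag.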
The relative direction, importance, and contribution of each descriptor in the proposed model were computed using the mean effect (ME) approach, as reported in Table 5. The magnitude of the mean effect of a descriptor indicates its contribution to the model, while the sign indicates the direction in which the descriptor influences the anti-tubercular activities (Adeniji et al. 2020b;Adeniji et al. 2020d). The proposed model was validated internally by the squared correlation coefficient (R² = 0.7759), the adjusted squared correlation coefficient (R²adj = 0.7381), and the leave-one-out cross-validation squared correlation coefficient (Q² = 0.6954) (Table 6). The robustness and fitness of the constructed model were ascertained by the Y-randomization coefficient (cR²p = 0.9229), reported in Table 7, which affirms that the proposed model was not derived by chance. Externally, the proposed model was also validated with statistics that all met the threshold requirements for accepting a proposed model, as reported in Table 6. The proposed QSAR model and the findings of this study were compared with a recent model established in the literature (Adeniji et al. 2020b). The validation factors stated in the literature and those reported in this study were all in full agreement with the threshold criteria in Table 6, which affirms that the proposed model is robust and well fitted. In addition, the correlation coefficient (R²) values of 0.7659 and 0.6550 presented in Figs. 2 and 3 for the training and test sets, respectively, support the degree of correlation between the anti-tubercular activities predicted in this work and the experimental anti-tubercular activities reported in the literature. These R² values also meet the minimum threshold value reported in Table 6 for an acceptable QSAR model.
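The mean effect of a descriptor can be illustrated with a short sketch. The text does not state the formula, so the sketch assumes the definition commonly used in QSAR studies: the regression coefficient of descriptor j multiplied by the sum of its values over the training set, divided by the same product summed over all descriptors in the model. The coefficients and descriptor values are made-up placeholders and the function name is hypothetical.

```python
import numpy as np

def mean_effect(coefs, X_train):
    """ME_j = b_j * sum_i(d_ij) / sum_j(b_j * sum_i(d_ij))."""
    coefs = np.asarray(coefs, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    # Contribution of each descriptor: coefficient times its column sum.
    contrib = coefs * X_train.sum(axis=0)
    return contrib / contrib.sum()

# Placeholder example with two descriptors and two compounds.
coefs_demo = [2.0, -1.0]
X_demo = [[1.0, 1.0],
          [2.0, 1.0]]
me_demo = mean_effect(coefs_demo, X_demo)
```

Under this definition the mean effects always sum to 1, a positive value means that increasing the descriptor raises the predicted activity, and a negative value means the opposite.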
The residual plot in Fig. 4 suggests that this model can be used to predict the anti-tubercular activity of new compounds, since all the standardized residuals for the training and test sets fall within the boundary of ± 2 on the vertical (standardized residual) axis. Moreover, the low residual values indicate no systematic error in the model predictions (Adeniji et al. 2020c;Adeniji et al. 2020d).
The applicability domain (AD) ensures that the proposed model is used only to predict compounds that are similar to the training compounds in terms of a distance measure, i.e., the leverage h (Adeniji et al. 2020a;Adeniji et al. 2020b;Adeniji et al. 2020c;Roy et al. 2011;Adeniji et al. 2020d;Tropsha et al. 2003). The AD of the proposed model is presented as a Williams plot in Fig. 5, which allows both influential compounds and outliers to be identified graphically. The standardized residual is used to detect outliers, while the warning leverage h* is used to detect influential compounds. In Fig. 5, all the compounds fall within the defined boundary of ± 3. Hence, no compound is an outlier. However, compounds 38 and 40 are influential molecules since their computed leverage values exceed the warning leverage of h* = 0.54.

Molecular docking analysis
The docking interactions of the studied compounds with the protein target (DNA gyrase) are presented in Table 8. The binding affinities between the protein binding pocket and the ligands range from − 4.09 to − 17.79 kcal/mol. When the binding affinity of the conventional drug isoniazid (− 14.6 kcal/mol) was compared with those of the studied compounds, compound 19 was found to have the most favorable binding affinity among the 1,2,4-triazole analogues (− 17.79 kcal/mol), better than that of the conventional drug and of the other derivatives. Ligand 19 was therefore visualized and evaluated using Discovery Studio Visualizer to ascertain its binding and interaction types. The two-dimensional interaction of ligand 19 with the protein target site is shown in Fig. 6. Five conventional hydrogen bonds (2.29648, 2.28554, 2.43913, 2.99768, and 2.22618 Å) were formed with GLN101, TRP103, SER118, ASP122, and ASP122, respectively. Two hydrogen bonds involve the S=O of the ligand as an H-acceptor, linked with GLN101 and TRP103 of the protein active site, while three hydrogen bonds involve the N-H group as an H-donor, with SER118, ASP122, and ASP122 of the protein active site, as reported in Fig. 8. The larger number of hydrogen bonds in ligand 19, compared with the three conventional hydrogen bonds in isoniazid (2.3001, 2.5301, and 2.2161 Å, with ALA337, ALA337, and SER279) presented in Fig. 8, accounts for the potency of ligand 19 over the recommended drug.

Computational design of novel anti-tubercular agents
Substitution, elimination, and addition techniques were employed to design novel anti-tubercular agents with enhanced activities by modifying the template structure (compound 19), presented in Fig. 7, using a ligand-based design approach (Adeniji et al. 2020a;Adeniji et al. 2020b). The template was selected as the reference compound and backbone for designing new promising compounds because of its prominent activity value, which lies within the applicability domain shown in Fig. 5. The new compounds were designed on the basis of the computed mean effect of the descriptors TDB3v and RDF70v, which have a high influence on the biological activities of the studied compounds, with substitutions and deletions made at the acetylene and 1H-1,2,4-triazole moieties at positions 8 and 12 shown in Fig. 7. With this approach, twelve compounds with improved anti-tubercular activities were designed by substituting and eliminating the alkyl group, H atom, methoxy group, and 1H-1,2,4-triazole at positions 8 and 12 of the reference, as presented in Table 9. To ascertain the reliability of the designed compounds, the leverage value was computed for each of them. All the computed leverage values for the designed compounds fall below the warning leverage (h* = 0.54) in Fig. 5. Among the designed compounds, compound 19h was observed to have high activity. This results from the alkyl group (CH3) substituted at position 12 and the 1H-1,2,4-triazole substituted at position 8 of the reference template, which act as electron-donating substituents via a positive inductive effect (+I), thereby increasing the electron density and making the pharmacophore of compound 19h more basic than those of the other designed compounds.

Validation of designed compound 19h via molecular docking
Designed compound 19h was docked with the protein target (DNA gyrase) in order to confirm its potency at the binding site of the target. The ligand (compound 19h) bound the target with a binding affinity of − 21.6 kcal/mol, as stated in Table 10, which is more favorable than the binding affinity of the template (compound 19, − 17.79 kcal/mol) stated in Table 8. Ligand 19h formed seven conventional hydrogen bonds with the target protein. The triazole N-H group, acting as an H-bond donor, contributes five hydrogen bonds: two H-bonds with GLY120, one H-bond with PRO B:119, one H-bond with VAL278, and one H-bond with TRP103. In addition, the S=O group, acting as an H-bond acceptor, contributes two H-bonds, with TRP103 and SER104, as presented in Fig. 8. The larger number of hydrogen bonds in the receptor-ligand complex gives a reasonable explanation of why the binding affinity of designed compound 19h is more favorable than that of its reference template (compound 19) (Adeniji et al. 2020a;Adeniji et al. 2020b). Finally, the correlation between the QSAR study and molecular docking is presented in Fig. 9. The anti-tubercular activity of each molecule in the dataset correlates with its binding affinity, with R² = 0.7206. This signifies that there is a relationship between the QSAR and molecular docking results at p < 0.05.

Conclusion
A combined in silico and theoretical approach was successfully applied to derive a QSAR model capable of predicting the activities of 1,2,4-triazole and its analogues against M. tuberculosis. This model provides structural insight for designing new hypothetical anti-tubercular compounds against multiple strains of M. tuberculosis. The reliability, significance, fitness, and robustness of the model were established through internal and external assessments and through validation of the molecular descriptors that influence the anti-tubercular activities: MATS7s, SM1_DzZ, TDB3v, and RDF70v. Analysis of the leverage measure also showed that the proposed model can reliably predict the anti-tubercular compounds that fall within its applicability domain. In addition, docking studies showed that compound 19 has a notable binding affinity of − 17.79 kcal/mol. Hence, it served as the structural template for designing twelve novel hypothetical agents with improved activities. Among the designed compounds, compound 19h was observed to have high activity, with a binding affinity of − 21.6 kcal/mol. Therefore, in vitro and in vivo screening and pharmacokinetic studies should be carried out in order to determine the toxicity of the designed compounds.