Theoretical investigation and design of some indole derivatives as potent β-glucuronidase inhibitors

β-glucuronidase enzyme is mostly found in plants and animals. It plays a vital role in detoxification of reactive metabolites that are interrelated to several illnesses and the growth of colon cancer. It speeds up the breaking down of β-glucuronosyl-O-bonds. Lack of β-glucuronidase enzyme leads to Sly syndrome in humans, and overexpression of this enzyme leads to many diseases. Therefore, it becomes necessary to mediate the effect of this enzyme. Theoretical investigation via QSAR modeling on 30 indole derivatives was performed to build a model which could be used to predict the activity of the indole derivatives. QSAR was carried out using multi-linear regression (MLR) method utilizing genetic function approximation (GFA) to develop the QSAR models. A very high predictive QSAR model was reported based on its statistical fitness with good internal and external validation parameters: R2trng = 0.954942, Qcv2 = 0.925462, R2test = 0.855393, and LOF = 0.042924. Molecular docking on the 30 indole derivatives was also performed to screen and identify the lead compound that would be used as template for designing new indole compounds. The docking investigation reveals that ligand 10 binds very tight in the binding pocket of β-glucuronidase enzyme with binding energy of − 9.5 kcal/mol. The ligand (10) was chosen as a template for designing new β-glucuronidase inhibitors. The four design compounds were found to be better than the template and the standard drug (D-saccharic acid 1, 4-lactone) with binding energies of − 9.6, − 9.7, − 9.8, and − 9.9 kcal/mol. A very high predictive QSAR model with good internal and external validation parameters: R2trng = 0.954942, Qcv2 = 0.925462, R2test = 0.855393, and LOF = 0.042924, was built and reported in this study. Molecular docking investigation reveals that the most potent compound among all the data set was compound 10 with binding energy of − 9.5 kcal/mole. It bound to the binding pocket of β-glucuronidase enzyme via hydrophobic, electrostatic, and hydrogen bond, and it was retained as template for designing new indole compounds. The design compound with serial number ID 4 was identified to have the highest binding energy of − 9.9 kcal/mole among the designed compounds. It bound to the binding site of the β-glucuronidase enzyme via halogen, hydrophobic, electrostatic, and hydrogen bond. The design compounds were discovered to be better than the template used in the design and the standard drug.


Background
Indole is a heterocyclic aromatic organic compound consisting of a six-membered benzene ring fused to a five-membered pyrrole ring (Bouchikhi et al. 2008) with numerous physiological activities which include antiinflammatory, anticancer, antiulcer, antimalarial, antioxidant, antibiotic, antimicrobial, and anticonvulsant (Patil and Dandagvhal 2016). Indole is broadly disseminated in the real environment and can be generated by different species of bacteria (Lee and Lee 2010). The synthesis of indole has been performed utilizing different starting materials in the literature of organic chemistry (Chadha and Silakari 2017).
β-Glucuronidase enzyme is mostly found in plant in the Escherichia, anaerobic bacteroides, and Clostridia. β-Glucuronidase enzyme is also found in human fluids, serum, blood cells, gastric juice, bile, urine, spleen, and it can also be found in organs like the kidney, liver, lung, and muscles (Ali et al. 2016;Gloux et al. 2011). It plays a vital role in detoxification of reactive metabolites that are interrelated to several illnesses and the growth of colon cancer (De Moreno de LeBlanc and Perdigón 2005). β-Glucuronidase enzyme speeds up the breaking down of βglucuronosyl-O-bonds. Lack of β-glucuronidase enzyme leads to Sly syndrome in animals (humans) that is related to the high level of glycosaminoglycans in cells, and overexpression of this enzyme leads to many diseases which include urinary tract infections, active pyelonephritis, acute renal necrosis, and some pathological conditions including epilepsy, cancer, AIDS, renal diseases, neoplasm of the breast, larynx, bladder, and testes, and inflammation in joints and hepatic. Therefore, it becomes necessary to mediate the effect of this enzyme (Salar et al. 2015;Sharma et al. 2014;Taha et al. 2015). Ibrahim et al. performed quantitative structure-activity relationship (QSAR) modeling, molecular docking, and pharmacokinetic study on 30 N-Arylidenequinoline-3carbohydrazides analogs utilizing multi-linear regression analysis adopting genetic function algorithm method. Semi empirical method using PM6 basis set was used for complete geometry optimization of the data set. The molecular docking study showed that the most active chemical in the data set was better than the standard βglucuronidase inhibitor both in terms of binding scores and the amino acid residues that interacted with the drug and β-glucuronidase enzyme. The pharmacokinetic studies indicated that none of the chemicals violated any of the condition set by the Lipinski's Rule of five which confirm the bioavailability of these chemicals (Ibrahim et al. 2020a).
Theoretical modeling is a unique area of interest in computer-aided drug design which provides computational software and tools that are utilized in the design and development of novel compounds of medicinal benefit (Jorgensen 2004). Quantitative structure-activity relationship (QSAR) is a computational modeling technique used to study the connection between the experimental activities with different physicochemical properties associated with the structures of a particular molecule (Ibrahim et al. 2020b;Ojha Lokendra et al. 2013). Molecular docking is also a computational modeling technique used to predict the binding energy of intermolecular complexes based on their 3D structures when one bind to the other (Abdullahi et al. 2020;Kitchen et al. 2004). This study is aimed at carrying out QSAR and docking simulation on indole derivatives against β-glucuronidase enzyme and design new β-glucuronidase inhibitors using structure-based drug design.

QSAR modeling methodology
Data retrieving, activity normalization, and structure drawing Thirty set of indole derivatives and their corresponding β-glucuronidase inhibitory activities (IC 50 ) were retrieved from the work of (Baharudin et al. 2017) for the purpose of this study. After the data were collected from the literature, the inhibitory activities (IC 50 ) of these compounds were transformed to their corresponding negative logarithm (pIC 50 ) using Eq. 1. The Chemdraw software Ultra version 12.0 was employed for drawing of the structures of all the data set prior to optimization as shown in Fig. 1 (Ibrahim et al. 2019). Figure 1 shows the 2D general structure of the studied compounds.
Searching for optimum geometry and descriptor computations In order to search the structure of all the data set at global minima on potential energy surface (PES), semiempirical method using PM6 basis set was used for the searching of the most optimum geometry of all the data set on the PES (Amin and Gayen 2016). For the computation of physicochemical parameters (descriptors), Pharmaceutical Data Exploration Laboratory (PaDEL) descriptor tool kit was used to generate all the descriptors used in this research (Yap 2011).

Dataset pretreatment and division
Before dataset division, the data were pre-treated using data pre-treatment software retrieved from drug theoretic and cheminformatics (DTC) laboratory to remove redundant values (Ambure et al. 2015). Data division software was further employed to separate the data into model building set (70%) and validation set (30%) based on the Kennard-Stone algorithm also gotten from DTC laboratory (Kennard and Stone 1969).

Model generation and validation
The model building set was subjected for further analysis (regression analysis) adopting genetic function algorithm (GFA) method in material studio software to generate the models. The equation for the regression analysis is shown in Eq. (2).
where Y is the pIC 50 , "B's" are coefficients for the corresponding "Z's" which are the descriptors, and the last variable "C" is the regression constant (Ibrahim et al. 2020c).
After model development, the model must be subjected to validation assessments, in order to ascertain the model's quality, reliability, and predictive ability. The most widely used methods for the assessments of QSAR models were the internal validations, external validations, and applicability domain. The square correlation coefficients (R 2 ) of both the training and test sets are used alongside with (Tropsha 2015) cross-validation coefficient (Q cv 2 ) as the assessment parameters. Crossvalidation is the statistical theory of separating of data into subsets such that the analysis is first performed on a single subset, while the other subset(s) are retained for subsequent use to confirm and validate the first analysis (Jalali-Heravi and Kyani 2004) and are defined as: where Y prd and Y obs are the predicted and observed activity (pIC 50 ) respectively of the validation set, and Y mntrng is the mean activity value of the model building set.
The multi-collinearity between the descriptors can be detected using their variation inflation factors (VIF), which can be calculated as follows: where R 2 is the correlation coefficient of the model. If VIF is equal to 1, it means there is no inter-correlation between the descriptors; if it is with the range of 1-5, the model is acceptable; and if greater than 10, the model is bad and therefore rejected and must be rechecked (Beheshti et al. 2016).
In order to confirm the predictability, strength, reliability, and accuracy of the QSAR model developed and also identify influential and outlier compounds, the mechanistic space of the model must be exploited (applicability domain) (Tropsha et al. 2003). In exploiting the applicability domain of the studied model, leverage approach was employed and is given as h i : where y i is the model building compound matrix I, Y is n × k descriptor matrix of the model building set compounds, and Y T is the transpose matrix Y used to develop the model. As a prediction tool, the thresh-hold value (h*) which is the limit for Y values and is given as: where q is the number of descriptors in the model, and z is the number of model building compounds. For any QSAR model to be considered as valid and used, it must pass both the internal and external validations as recommended by Veerasamy et al. (Veerasamy et al. 2011) (see Table 1).

Docking methodology
Docking studies was conducted on personal Dell computer system, with Intel ® Core™ i7, 8GB of RAM using the Auto dock vina of Pyrex virtual screening software, UCSF Chimera version 1.10.2, and Discovery Studio Visualizer software. Preparation of ligands and the enzyme β-glucuronidase Before docking commences, the optimum geometry optimized using Spartan'14 was saved as protein data bank file format (ligand preparations) (Abdulfatai et al. 2017). Figure 2a presents the 3D structure of the prepared of the most potent ligand. 3D crystal structure of human β-glucuronidase was retrieved from protein data bank (with PDB ID 1bhg). Discovery Studio Visualizer was used to prepare the human β-glucuronidase enzyme for the docking studies; in the course of the preparation, chain B, heteroatoms, and coligands were deleted from the dimer structure and saved as protein data bank file format (Veerasamy et al. 2011). Figure 2b shows the prepared structure of the human βglucuronidase.

Docking simulation
Vina of Pyrex virtual screening software was used to dock the ligands with human β-glucuronidase (Trott and Olson 2010). Vina of Pyrex virtual screening software has a short-coming of separating the receptor and the ligand after performing the docking. As a result of that, the UCSF Chimera 1.10.2 software was utilized to rebuild the complexes (ligand-receptor) for further investigation. Discovery Studio Visualizer was used to visualize the ligand-receptor to show the interactions.

Design
Structure-based drug design also called direct drug design was used to design new β-glucuronidase inhibitors. Co-efficient of determination of external and test set ≥ 0.50 Fig. 2 The a 3D structure of the prepared ligand and b 3D structure of the prepared β-glucuronidase  Ligand 10 was chosen as the template as it has the highest binding energy of − 9.5 kcal/mol. Structural modification was carried out on the C-4 of the indole moiety by adding halo-substituted phenyl ring. Table 2 shows the structures of the design compounds

Results of docking studies
The results of the docking studies are shown below in Table 6 and Fig. 5.

Results of the design compounds
The result of the newly designed β-glucuronidase inhibitors is given in Table 7 and Fig. 6 a and b.

QSAR modeling results
Three QSAR models were built; out of these built models, the best reported model was based on its statistical fitness with R 2 value of 0.954942, R 2 adj value of 0.93885, Q cv 2 value of 0.925462, and the R 2 pred value of 0.855393 and also lowest Friedman's Lack of fit value of 0.042924. On taking the comparison of the statistical parameters of the reported model with those reported by Veerasamy et al. (Veerasamy et al. 2011) shown in Table  1, it can be seen clearly that the statistical parameters of the reported model were all greater than the minimum recommended values which confirmed the reliability of the model. Table 3 presents the symbols, descriptions, and the classes of descriptors that appear in the reported model. The physicochemical parameters responsible for the inhibitory activities of the β-glucuronidase inhibitors were identified to be AATS4i, ATSC2m, ATSC8v, SpMAD_Dzs and SpMax_Dt. Among these descriptors as seen in the model, SpMAD_Dzs has the highest coefficient of + 0.949838372 which gave the highest contribution in the model followed by ATSC8v. ATSC2m gave the lowest contribution in the model as it has the lowest coefficient of − 0.007819193. Either increase or decrease in these physicochemical parameters, the inhibitory activities of these compounds may increase or decrease.
Predicted activities of both the model building set and that of the validation set were plotted against the actual activities (Fig. 3a) in order to affirm the strength of the reported model. The tool used in this case is R 2 value of both the plot and that of the internal validation. The strength of the reported model was confirmed by the corroboration of R 2 value of the plot (0.9963) and that of the internal validation (0.954942). Every good QSAR model is expected to be free from systematic error. In order to determine whether the selected model is free from systematic error, predicted activity was plotted against standardized residuals (Fig.  3b). The random propagation of the standardized residuals on either side of zero on the plot confirmed that the reported model is free from systematic error.
The observed activities were seen to have good correlations with the predicted activities. Table 4 represents the structures, IC 50 , pIC 50 , predicted pIC 50 , and residual values of the data set. The low residual values observed in the table further confirmed the stability and reliability of the reported model. The correlation analysis of descriptors in the model building set of the reported model in Table 5 shows the importance of descriptors in the model. From the table, no correlation exists between the descriptors that appear in the reported model. The computed VIF values for all descriptors were obtained to be less than 5 (Table 5) indicating that the reported model was statistically significant and the descriptors were orthogonal (no multicollinearity between the descriptors).
To confirm the quality of the reported model, it is subjected to applicability domain as shown in Fig. 4 (the plot of leverage value calculated for all the data set vs. standardized residuals or Williams plot), which permit a graphical identification of both the outliers and influential compounds in the chosen model (Beheshti et al. 2016). It can be seen from the plot that all the compounds of the model building set and the validation set were in the square area (mechanistic space). Also, there were no outliers or influential compounds with standard residuals > 3d for both model building set and the validation set. This confirmed the quality of the reported model which means the model can make good prediction of activities of new compounds.

Results of docking analysis
Docking analysis on the four highly potent indole derivatives and the standard drug was performed to investigate the nature of interactions between them and their target receptor (β-glucuronidase). Table 6 gives the results of the docking analysis. It was seen from the docking results that ligand 10 being the most active compound has the highest binding free energy of − 9.5 kcal/mol. The most active ligand bound to the active site of βglucuronidase enzyme by hydrophobic, electrostatic, and hydrogen bond. Furthermore, it bound via hydrophobic with TYR29, LEU300, PRO323, MET393, CYS396, ARG436, and LEU322 amino acid residues and bound through electrostatic with HIS440 amino acid residue. More so, it bound via hydrogen bond with TYR29 (1.93247 Å), HIS440 (2.3299 Å), and GLU390 (3.68037 Å) amino acid residues. Looking at ligand 2 having the docking scores of − 8.8 kcal/mol, it forms hydrophobic interactions with TYR29, PRO148, LEU322, PRO323, MET393, ARG436, and ARG436; electrostatic interactions with CYS396; and hydrogen bond interactions with (3.55365 Å) amino acid residues. The standard drug with docking score of − 5.7 kcal/mol formed only hydrogen bond with ASN502 (2.42729 Å) and GLN524 (3.44886 Å) amino acid residues of β-glucuronidase enzyme. On comparing the most active compound with the standard drug, it can be seen that the most active compound has higher docking score than the standard drug which may be attributed to the kinds of interaction that occur between the active compound and the binding pocket of βglucuronidase. Figure 5 shows the 2D structure of the visualized ligand-receptor 10.

Result of the design
Four new β-glucuronidase inhibitors were designed and found to bind tight in the binding pose of βglucuronidase enzyme ( Table 7). Addition of halosubstituted phenyl ring increased the binding energy of the designed compounds. The designed compounds were found to have higher binding energy than the template (− 9.5 kcal/mole) and the standard drug (5.7 kcal/ mole). A design compound with serial number ID 4 was identified to have the highest binding energy among the designed compounds. It bound to the binding site of the β-glucuronidase enzyme via halogen bond with TYR505;

Conclusion
A very high predictive QSAR model with good internal and external validation parameters: R 2 trng = 0.954942, Q cv 2 = 0.925462, R 2 test = 0.855393, and LOF = 0.042924 was built and reported in this study. The model was further subjected to applicability domain to confirm the quality of the model in terms of its predictions and found to make good prediction. Docking investigation reveals that the most potent compound among all the data set bound to the binding pocket of β-glucuronidase enzyme via hydrophobic, electrostatic, and hydrogen bond. The design compound with serial number ID 4 was identified to have the highest binding energy of − 9.9 kcal/mole among the designed compounds. It bound in the binding site of the β-glucuronidase enzyme via halogen, hydrophobic, electrostatic, and hydrogen bond. The design compound was discovered to be better than the template used in the designing new compound and the standard drug. The aim of this research as stated was achieved.