Salicylic acid derivatives as potential α-glucosidase inhibitors: drug design, molecular docking and pharmacokinetic studies

Diabetes mellitus (DM) is one of the most defying health risk in the twenty-first century promoting a high rate of morbidity and mortality that could possibly increase if no intervention is in place. However, drugs for curing DM are available but are associated with adverse side effect necessitating the pursuit for a safe antidiabetic drugs. The present study was conducted in order to develop a QSAR model that would be used to predict the activities of salicylic acid derivatives, as well as to determine the binding interactions of the compounds with α-glucosidase using molecular docking studies. Model one was selected and reported as the best model based on its fitness with the following validation keys: R2(trng set) = 0.968, R2(adj) = 0.957, Q2(cv) = 0.932, LOF = 0.085 and R2(test set) = 0.864. Five potent analogues were designed using the ligand-based method with their predicted activities been calculated and found to be higher compared to the lead compound. Furthermore, binding interactions of the designed analogues within the active site of α-glucosidase (pdb id:3L4V) illustrate a good binding affinities than kotalanol and acarbose. However, the ADMET and drug-likeness properties predicted the design analogues to be pharmacologically and orally safe by not having more than one violation of Lipinski’s (Ro5) criteria. The present findings therefore showed that the salicylic acid derivatives could serve as α-glucosidase inhibitors. The compounds can be studied further for a hunts of promising drug candidates against diabetes mellitus.


Background
Diabetes mellitus (DM) is a metabolic disease defined by hyperglycemia as a result of a shortage in insulin action and/or secretion promoting disturbances in metabolic pathways of the body's nutrients. Globally, the disease is considered to be among the most serious threats to humans affecting approximately 173 million people worldwide leading to higher morbidity and mortality rates that could increase severely by 2030 if no intervention is in place (Wu et al. 2014). A high quantity of blood glucose level can render severe health complications such as neuropathy, retinopathy, microangiopathy and cardiovascular diseases (Chenafa et al. 2021).
The approach utilized for blood glucose control in DM relies on retarding the glucose absorption in the intestinal system by inhibiting α-glucosidase enzymes which is a key enzyme in catabolism of complex carbohydrates (Chaidam et al. 2021). Considering the importance of the enzyme in DM, a lot of scientific endeavors were put in place to finding promising agents to inhibit the enzyme activity coupled with the fact that the already available drugs are faced with drawbacks (Barber et al. 2021). A study conducted showed natural compounds from Salacia reticulata to possessed excellent efficacy against N-terminal human intestinal-maltase glucoamylase (ntMGAM), and therefore, inhibiting the enzyme by the compounds could help in alleviating the negative consequences of DM, since the investigated Ki values of the salacinol compounds against the enzyme were lower compared to acarbose suggesting their higher potencies than the standard drug (Sim et al. 2010). Also, another study conducted showed the possibility of salicylic acid derivatives as potential α-glucosidase inhibitors where a series of salicylic acid derivatives generated from phthalimide exhibited a promising antiα-glucosidase activities (Chen et al. 2019). These increase our quest towards searching of structural scaffolds from the salicylic acid derivatives against ntMGAM in complex with kotalanol (pdb id: 3L4V) by exploring quantitative structure-activity relationship (QSAR).
QSAR enables the identification of the structural and physicochemical properties of compounds by modulating their activities as well as helping and guiding the design and adoption of effective and reliable drug candidates (Ferreira et al. 2021). To this effect, we herein, aimed to develop a QSAR model from salicylic acid derivatives and use the model to predict the activities of the derivatives. Based on the derivatives, new structural analogues were designed and their activities, alongside molecular docking and their in silico ADMET and drug-likeness properties were evaluated.

Generation of datasets, structure and geometry calculations
A set of thirty-one (31) salicylic acid derivatives (Chen et al. 2019) with their inhibitory concentrations (IC 50 ) measured in mM against α-glucosidase inhibitors were selected for the study. The IC 50 of all the compounds were normalized to pIC 50 using Eq. 1 (Ibrahim et al. 2020).
Thereafter, Chemdraw version 12.0 was used to draw the 2-dimensional structures of all the compounds and the resultant structures were transformed automatically to 3-dimensional using the Spartan versions 14 software. Energy minimization and stability geometry of the molecules were carried out to reduce constraint in the structures as well as finding the most stable geometry of the studied molecules using density functional theory (DFT) utilizing Bee -3-Lee Yang Par method (B3LYP) and 6-311G* level of theory (Abdullahi et al. 2021). The optimized compounds were saved in a Spatial Document file format (sdf ).

Descriptors calculation, data pretreatment and dataset splitting
To gain more information on the compounds under study, the thermodynamic, electronic, autocorrelation and geometric descriptors of each compound were determined by importing the 3D structures in (sdf ) file format into the Pharmaceutical Data Exploration Laboratory (PaDEL) descriptor tool kit and the calculation was performed (Lawal et al. 2020). Data pretreatment software version 1.2 was used to pretreat the calculated descriptors of all the compounds under study followed by manual pretreated to eliminate unwanted and persistent descriptors. Following pretreatment, data splitting software was used to split the data into training (modelling) set and test (validation) set by the application of the Kennard-Stone algorithm (Idris et al. 2021). In this study, twenty-one (21) compounds were used as training sets and ten (10) compounds were used as test sets.

Models building, validation and assessments of the chosen model
Models were developed using material studio version 8.0 software and also applying the genetic function approximation (GFA) which takes into consideration pIC 50 as dependent variables and molecular descriptors as the independent variables. The models were scored based on Friedman's lack of fit (Abdullahi et al. 2021). Furthermore, the equation length including both (initial and maximum equation length) was set at 5, mutation probability of 0.1, population and maximum generation parameters were set at 1000 each, and the number of top equations returned was set at 4, respectively. The square correlation coefficient of the training set, adjusted (R 2 adj ), lack of fit (LOF) and cross-validation coefficient (Q 2 cv ) were determined from the generated models.
External validation was conducted on the developed QSAR models using the test sets compounds in which the R 2 test value was calculated using Eq.
(2) and the best model was chosen. For an external regression equation, the value of R 2 should be closed to one (Abdullahi et al. 2021).
where Y exp , Y pred and Y train are experimental, predicted and average training set activities of the model compounds.
The selected model was assessed by the following numerical measures: mean effect and Y-scrambling test. The mean effect was used to estimate the contribution of each descriptor to the chosen model using Eq. (3).
(2) However, the signicance of these descriptors coupled with their mean effect values indicates their influence on the activities of the compounds .
where ME j is the mean effect of descriptor j in a model, B j is the coefficient of the descriptor j in that model and d ij is the value of the descriptor in the data matrix for each molecule in the model building set, m is the number of descriptors that appear in the model and n is the number of molecules in the model building set (Ibrahim et al. 2020). Furthermore, the Y-scrambling test was used to justify the robustness of the chosen model. The test was performed by evaluating the coefficient of validation parameter for Y-randomization cR 2 p which is done by rearranging the actual activities and keeping the descriptors unchanged. It was expected that the new QSAR model will have low Q 2 and R 2 values and cR 2 p must be greater than 0.5 (Idris et al. 2021).

Applicability domain
The applicability domain was assessed using William's plot (i.e., plot of standardized residuals versus the leverage values) in other to determine whether the chosen model has a prevailing or peripheral molecules in the specific datasets ). The assessment was carried out by evaluating the leverage approach as well as the warning leverage using Eqs. (4) and (5): where h i is the leverage approach, X is the n × k descriptor matrix of the training sets, X T is the transpose matrix used in generating the model, h * is the warning leverage, x is the number of descriptors of the selected model and q is the number of compounds in the training sets ).

Ligand-based analogues design
The selection criteria for the choice of a compound to be considered as a lead compound for analogues design were focused solely on the information obtained from the model (i.e., compound with low residual value, reasonable/high pIC 50 , found to be within the preferred domain (AD) and do not violates Lipinski's rule of five. However, compound 6 from the salicylic acid derivatives was selected (lead compound) and was modified by insertion and substitution of different groups on XYZ positions (template compound) based on their notable mean effect value.

Molecular docking
A HP Z-Book computer system having the following specifications: An Intel ® CoreTM i7-4800MQ vPro Dual CPU @2.70 GHz 2.70 GHz, 12 GB of RAM and also utilizing Molegro Virtual Docker (MVD) 2013 version 6.0.1 was used to investigate the binding interactions between the active site of α-glucosidase and the design analogues. Before docking analysis, analogues were prepared and optimized using Spartan versions 14 software and saved in sdf file format (Ibrahim et al. 2020). α-Glucosidase with (pdb id: 3L4V) in complex with kotalanol as the cocrystallized ligand was obtained from protein data bank database (https:// www. rcsb. org/) and MVD was used to prepare the enzyme before proceeding to dock study. During the preparation, water molecules and co-crystal ligands were removed from the crystal structure, the surface was generated and a maximum number of cavities was identified and fixed to 5 for detection of possible binding cavities; execution of docking study was carried out by importing the prepared analogues into the MVD software. MolDock score grid as the scoring function, binding site radius of 15A°, grid resolution of 0.30A° were selected while keeping other parameters as default (Abdullahi et al. 2021). Following a successful docking technique, the MolDock score and hydrogen bond energy were computed. The docked complexes were stored in a PDB file format, and discovery studio version 16.1.0 software was utilized to further visualize the interactions of the docked complexes.

ADMET and Drug-likeness of the design analogues
It is viable to assess the absorption, distribution, metabolism, excretion, toxicity and drug-likeness of analogues via silico assays. The drug-likeness and ADMET properties of the design analogues were carried out using SWIS-SADME (http:// www. swiss adme. che/ index. php) and pkCSM (http:// struc ture. bioc. cam. ac. uk/ pkcsm) which happens to be a certified free web tool for predicting and evaluating the ADMET and drug-likeness of small compounds (Kar and Chatterjee 2020). However, one of the main aspects in the preclinical phase of analogues design is to know whether if those analogues are orally bioavailable, in this regard the Lipinski's rule of five criteria was used to access the analogues under study which states that a chemical is weakly absorbed if it disobeys more than two of these conditions (molecular weight ˂ 500, number of hydrogen bond donors ˂ 5, number of hydrogen bond acceptors ˂ 10 and calculated Log p ˂ 5) (Chen et al. 2020) which were applied to the design analogues to predict their drug-likeness and ADMET properties.

QSAR analysis
The experimental activities of the datasets were normalized to pIC 50. Also, the predicted pIC 50 (s) of the compounds was determined (Table 1). The differences between the experimental and predicted activities of the compounds (residual values) were further calculated (Table 1). Subsequently, models were generated using the genetic function approximation. Four models were generated and shown thus; Following models building, validation was conducted ( Table 2). The R 2 trng set , R 2 adj , Q 2 cv , as the internal validation parameters of the models were greater than 0.920 (closer to 1) while only model one passed the external validation with R 2 test set of 0.864 (Table 2). Furthermore, an XY (scatter) plot of residuals against experimental pIC 50 from both the training and test sets compounds is shown in Fig. 1. The anomalous occurrence of these residuals on both side of zero on the plot indicates the absence of feasible mistakes in the model building (Fig. 1).
Based on the initial validations, model one was selected as the best model due to the fact that it has the highest R 2 test set of 0.864 and was subjected for further assessment. From the ME analysis, positive ME values were observed in ATSC2e, MATS1e, GATS2c and Crippen-LogP with CrippenLogP descriptor having the highest ME value. Conversely, negative ME value of -1.01 with SpMin6-Bhm descriptors was noticeable (Table 3).
Moreover, assessment on the robustness and obtainability of the model whether by chance correlation or not was conducted using the Y-randomization test (Table 4). From the result obtained, the R 2 and Q 2 values were 0.19 and − 0.57, respectively, while the cR 2 p of the model was 0.79 (Table 4). Williams plot was used in order to find compounds that undesirably stimulate the performance of the model within the domain of the model (Fig. 2). From the plot, one compound from the test set was found to be outside the favored domain, i.e., warning leverage (h* ˃ 0.86) (Fig. 2). The compound was identified as compound 28.
After establishing the applicability domain, ligands were designed using the ligand-based approach coupled with information obtained from the model, compound 6 was tag as the lead compound (Fig. 3A), and template ( Fig. 3B) was adopted. The design was performed by addition of different groups on the X, Y and Z position on the template with model one utilized in predicting their activities.
Among the designed analogues, five had predicted pIC 50 relatively higher than the lead compound (Table 5). Interestingly, the analogues showed values between 4.32 and 5.11 with analogue A4 having the highest value of 5.11 (Table 5).
Molecular docking studies conducted between the designed analogues and α-glucosidase (pdb id:3L4V) showed an interesting finding. The graphical structure of the protein was shown, and the active site of the enzyme was encircled (Fig. 4).
The MolDock scores of the analogues were in the order Analogue A2 < Analogue A3 < Analogue A1 < Analogue A5 < Analogue A4 (having a value of − 196.1). However, it is important to note that the co-crystallized ligand (kotalanol) and acarbose have MolDock scores of − 148.4 and − 146.6, respectively ( Table 6).
Visualization of the docking results was performed using discovery studio visualizer. Notably, Analogue A1 interacted with Ser288, Ile523, Lys776, Phe535 and Leu286 residues of the protein through conventional hydrogen bond interactions, respectively (Table 6, Fig. 5A and a). Also, Analogue A2 was bound to the protein via     Fig. 5B and b). Two conventional H-bond interaction was observed in Analogue A3 with Arg283 and Leu286, respectively (    Fig. 5F and f, G and g). Noticeably, other interactions such as carbon H-bond, alkyl, pi-alkyl, pi-sulfur, pi-cation, pi-sigma and pi-donor H-bond were observed (Table 6). The ADMET properties of the designed analogues (Table 7) showed an excellent intestinal absorption ranging between 91.087 and 92.961%, The BBB permeability and central nervous system permeability were also found to be within the recommended value which is log BB < − 1 and log PS < − 3 , respectively. The cytochrome P450 (CYP) superfamily which are responsible for so many drug interactions were examined with the designed analogues serving as substrate of 3A4 and inhibitors of 2C19, 2C9 and 3A4, respectively. Moreover, the designed analogues are found to be nontoxic with good total clearance potential within the range of − 0.367 to 0.007 (Table 7).
Subsequently, the drug-likeness properties depicted in (Table 8) of the design analogues showed that only analogue A1 was found to respect Lipinski's (Ro5) without violating any of the criteria set by the rule while analogues A2, A3, A4 and A5 were found to have one violation (i.e., their molecular weight exceeds 500). The bioavailability score was also observed to be 0.55 except for analogue A4 which is 0.17. The synthetic accessibility having a recommended range from 1(very easy to synthesize) to 10(very hard to synthesize) was also analyzed with the designed analogues found to be within the range of 3.14 to 3.43 (Table 8).

Discussions
Globally, the increased occurrence of DM is disturbing (Wu et al. 2014). The adverse side effects of the synthetic antidiabetic agents hassle the search for new therapeutic candidates (Banerjee et al. 2020).
In this study, the fascinating in vitro activities of salicylic acid derivatives against α-glucosidase (Chen et al. 2019) led to the structural activity relationship analysis of the derivatives. Most importantly, designing potent analogues from the derivatives could serve as α-glucosidase inhibitors. The QSAR models were generated using genetic function approximation. This is because the GFA delivers more than one model which gives a degree of freedom to select the best (Abdullahi et al. 2021). Subsequently, the generated models were validated in order to ascertain their fitness and predictive power to improve the compound's biological activities . Model one was selected as the best based on the R 2 test set value of 0.864 which shows that the model was not over-fitted and sensible predictive activities for designed analogues is promising compared to other models.
The Pearson's correlation shows the correlation of the descriptors in the model, and the mean effect values ME of the molecular descriptors represent the physicochemical properties that provides a structural information of each descriptor but in a numerical form, i.e., a unique information of each descriptor that can be employed to improve the activities of the compounds . However, positive coefficient descriptors ATSC2e, MATS1e, GATS2c and CrippenLogP as seen from the result indicate a positive influence of these descriptors to the activities of salicylic acid derivatives, i.e., the greater the value of these types of descriptors the greater the antidiabetic activity and vice versa. Consequently, the descriptor with a negative coefficient SpMin6-Bhm indicates a negative influence to the activities of salicylic acid derivatives, i.e., the lower the value of this type of descriptor, the greater the antidiabetic activities. Thus, this shows the role of steric and electrostatic interactions in inducing the activities of salicylic acid derivatives against α-glucosidase . Also, a cR 2 p of 0.79 obtained indicates the robustness of model one and obtainability was not based on probability. This is because the recommended value for the parameter is greater than 0.5 (Idris et al. 2021).
Moreover, the salicylic acid derivatives were exposed to Williams plot in order to inspect for influential compounds that might undesirably affect the model performance (Ibrahim et al. 2020). Among the derivatives, compound 28 was considered as an outlier. The change in substituent and position of attachment could be the cause for the compound to be found outside the preferred domain.
Among the derivatives within the defined domain, compound 6 with low residual value, good pIC 50 value was tag as lead compound and positions X, Y and Z were added as point of attachments as indicated in the adopted template. The choice of the substituent to be added was based on ATSC2e and CrippenLogP descriptors as it has earlier reported to have highest ME values. Based on the descriptors, five designed analogues were found to   have a higher activities relative to the lead compound. Electron-withdrawing group particularly bromine group have shown to significantly enhance the activities of the designed analogues. However, similar substituents were shown to be used in recent studies conducted which were found to be promising in enhancing the activities of compounds (Abdullahi et al. 2021). The docking studies of the five designed analogues against α-glucosidase (pdb id:3L4V) were conducted. MolDock score is the scoring function which gives information on the binding energy of interaction. It was adopted for evaluating the interactions of the designed analogues against the active site of the α-glucosidase. The result of the docking studies show that they were bind at the active site of the receptor with a good MolDock score compared to kotalanol and acarbose. Observably, the higher number of H-bond energies in acarbose and kotalanol was as a result of higher number of hydroxyl functional group which portrays a greater number of hydrogen bond interactions at the active site of the receptor after docking. Most of the interactions occur with the amino acid residues found within the active sites (Maurya et al. 2020).
The ADMET results showed that the analogues had passed the adsorption, distribution, metabolism, excretion and toxicity parameters with good bioavailability score and orally safe. Most at times determines the success of a drug candidate (Zafar et al. 2020).

Conclusions
The present findings therefore showed that model one generated from genetic function approximation was the best due to its fitness with the following internal and external validation parameters of R 2 (trng set) = 0.968, R 2 (adj) = 0.957, Q 2 cv = 0.932, LOF = 0.085 and R 2 (test set) = 0.864. However, the five potent analogues designed using the ligand-based were found to have higher activities than the lead compound. The compounds interacted reasonably well with the active site of the α-glucosidase. Based on the present findings, the compounds could be used subsequently for the search of new antidiabetic agent.