QSAR, molecular docking studies, ligand-based design and pharmacokinetic analysis on Maternal Embryonic Leucine Zipper Kinase (MELK) inhibitors as potential anti-triple-negative breast cancer (MDA-MB-231 cell line) drug compounds

Cancer of the breast is known to be among the top spreading diseases on the globe. Triple-negative breast cancer is painstaking the most destructive type of mammary tumor because it spreads faster to other parts of the body, with high chances of early relapse and mortality. This research would aim at utilizing computational methods like quantitative structure–activity relationship (QSAR), performing molecular docking studies and again to further design new effective molecules using the QSAR model parameters and to analyze the pharmacokinetics “drug-likeliness” properties of the new compounds before they could proceed to pre-clinical trials. The QSAR model of the derivatives was highly robust as it also conforms to the least minimum requirement for QSAR model from the statistical assessments of (R2) = 0.6715, (R2adj) = 0.61920, (Q2) = 0.5460 and (R2pred) of 0.5304, and the model parameters (AATS6i and VR1_Dze) were used in designing new derivative compounds with higher potency. The molecular docking studies between the derivative compounds and Maternal Embryonic Leucine Zipper Kinase (MELK) protein target revealed that ligand 2, 9 and 17 had the highest binding affinities of − 9.3, − 9.3 and − 8.9 kcal/mol which was found to be higher than the standard drug adriamycin with − 7.8 kcal/mol. The pharmacokinetics analysis carried out on the newly designed compounds revealed that all the compounds passed the drug-likeness test and also the Lipinski rule of five. The results obtained from the QSAR mathematical model of parthenolide derivatives were used in designing new derivatives compounds that were more effective and potent. The molecular docking result of parthenolide derivatives showed that compounds 2, 9 and 17 had higher docking scores than the standard drug adriamycin. The compounds would serve as the most promising inhibitors (MELK). Furthermore, the pharmacokinetics analysis carried out on the newly designed compounds revealed that all the compounds passed the drug-likeness test (ADME and other physicochemical properties) and they also adhered to the Lipinski rule of five. This gives a great breakthrough in medicine in finding the cure to triple-negative breast cancer (MBA-MD-231 cell line).


Background
Cancer has long been a challenging malady to the human race with great occurrence and death rates. Nearly 8 million humans pass away from tumors yearly, and 14.2 million different cancer patients are treated worldwide yearly . Cancer of the breast is the most reoccurring type of disease and the second leading cause of death among women globally. Yearly, approximately 1-1.3 million cases of breast cancer are detected globally (Ge et al. 2019). The high occurrence and death rate of cancer occur from the fact that there are over 200 kinds of cancer and it is indeed hard to diagnose at a premature stage (Lolak et al. 2019).
Triple-negative breast cancer (TNBC) is painstaking the most destructive type of mammary tumor because it spreads faster to other parts of the body, with high chances of early relapse and mortality (Hu et al. 2012). Yearly, an estimation of over a million female beings is detected with mammary tumor and TNBC is responsible for almost 15-20% of the overall breast cancer detected (Jo et al. 2019). TNBC does not express estrogen receptor (ER), progesterone receptor and human epidermal growth factor 2 (HER2) (Hu et al. 2012).
Early detection, understanding of the cause and pathway of this disease and advancing in treatment have played a key role in decreasing breast cancer death rates during the past few years. Chemotherapy remains the main key to complete therapy since it would extend and terminate tumor cells faster within the human system (Kaplan 2013 Table 4 Page 3 of 20 Lawal et al. Bull Natl Res Cent (2021) 45:90 have proven to be faster than traditional methods of drug discovery, and it saves more time, resources and also tends to have higher efficacy with less toxicity. Efficiency and safety of the drug to the system are the two major causes leading to drug failure. Therefore, it is compulsory to find potent molecules with better ADMET properties "drug-likeliness" (Guan et al. 2019). Novel 61 series of parthenolide derivatives were obtained from the literature of (Ge et al. 2019) as inhibitors against the MDA-MB-231 cell line. Edupuganti et al. 2017 suggest that MELK with Protein Data Bank (PDB: 4BKY) has therapeutic value for targeting various cancers, especially the critical survival functions like TNBCs.
The purpose of this research is to utilize computational methods like quantitative structure-activity relationship (QSAR), in calculating the activities of parthenolide derivative compounds as inhibitors against breast cancer cell line MDA-MB-231 based on an established QSAR mathematical model, molecular docking studies in understanding how the ligands (parthenolide compounds) interrelate with a protein receptor Maternal Embryonic Leucine Zipper Kinase (MELK), again, to further design new effective molecules using the QSAR model parameters and to analyze the pharmacokinetics "drug-likeliness" properties of the new compounds.

Data set
In total, 61 derivative compounds of parthenolide were obtained from (Ge et al. 2019) article.

Anti-proliferative activities
Their bioactivities were changed to (pIC 50 ) using the formula below. It was measured in inhibitory concentration (IC 50 ) values in the micromolar concentration of (µM) (Abdulrahman et al. 2020a, b, c). The parent compounds as shown in Fig. 1a-d are the structural derivatives of parthenolide. Tables 1, 2, 3, 4 and 5 show the various substituents measured in (IC 50 ) and (pIC 50 ) that were attached to the parent compound.

Molecular descriptor calculations
In total, 61 derivative compounds of parthenolide were converted to SDF format after optimization. Pharmaceutical Data Exploration Laboratory Software V (2.20) was used in calculating physicochemical descriptors (Yap et al. 2011). The descriptors were pretreated using Data Pre-treatment software GUI 1.2. Abdulrahman et al. 2020a, b, c) to remove irrelevant values.

Training set and test set division
For a valid QSAR model to be obtained, the pretreated set must be divided into train and test set. The division should satisfy the following conditions: (i) it is essential for all the train set descriptive points to be evenly shared within the entire area used by the data set (Noolvi and Patel 2013). (ii) All descriptor values should be near the train set descriptor values. (iii) All characteristic molecular elements in the test set should spread within the pIC 50 = − log 10 IC 50 × 10 −6 . Page 4 of 20 Lawal et al. Bull Natl Res Cent (2021) 45:90 descriptor space and should also be close to the train set (Noolvi and Patel 2013). The algorithm of Kennard-Stone was employed in dividing the set into 30% test and 70% train set (Kennard and Stone 1969).

Model building
Multiple linear regression technique was used in constructing a model in Version 8 of Material studio software. It was used to show the correlation between the dependent variables (pIC 50 ) and the independent variables as the 2D model descriptors (Abdulrahman et al. 2020a, b, c). The model is suited so that the total squared difference between the actual and predicted value of the set of anti-proliferative activity is diminished. In regression analysis, the reliant mean of the dependent variable, (pIC 50 ), relies on the independent descriptors (Abdullahi et al. 2020).

Validating the model (internal)
The prediction of the model built must be verified on a data set that was never used in building the model at first (Tropsha et al. 2003). The models obtained from the internal validation of the train test (sixty-one compounds) were evaluated using Friedman formula (Friedman 1991). d equals user-defined smoothing parameter, C equals the sum of model definitions, M equals the sum of derivatives on the train set, while p is the summation of model definition (Abdulrahman et al. 2020a, b, c). SEE is the standard estimated error. The smaller SEE is, the more enhanced the model would be.
The model from the regression structure becomes.
Y is the response variable, 'k' is the coefficient of regression corresponding to 'x's that are the model parameters which are the predictor variables, and n is the regression constant.
(R 2 ) is the regression coefficient, and it is the degree of fitness of the equation of regression. It signifies the variation section in the data that is described by regression. The nearer R 2 value is 0. 1, the better the regression fitness. R 2 is given as: where Y exp and Y pred are the biological and calculated activities of the train set. Y mintraining indicates the average pIC 50 of the train set molecules. Table 2 shows the accepted least required values in assessing a QSAR model (Ibrahim et al. 2018). (R 2 pred ) is the predicted validation parameter, and it is assessed by calculating the prediction power of the model. An (R 2 pred ) above 0.5 shows a high prediction power and robustness of the model.

Applicability domain of QSAR
Model validation should be within the training domain, and the performance of the compounds needs to be assessed within the domain to determine the fitness of the model. The applicability domain is evaluated using William's plot. The warning leverage is used in sorting the compounds within a particular space on the graph known as the applicability domain, compounds that fall within the space on the plot are the approved predicted compounds (Veerasamy et al. 2011). It is formulated as; where j equals the total model parameters and m is the total molecules of train sets.

Molecular docking studies
Some compounds with high pIC 50 were chosen for molecular docking studies with Maternal Embryonic Leucine Zipper Kinase (MELK) as protein target. The crystal structure was obtained from RCSB PDB (https:// www. rcsb. org) with the ID, 4BKY. The crystal structure was prepared with Discovery Studio Software and converted to PBD format. The binding affinity of the ligandprotein complex was calculated with AutoDock Vina of PyRX software employed in calculating (Abdulrahman et al. 2020a, b, c). Visualizer of Discovery Studio was used to understand the ligand-protein target interactions.

Computational pharmacokinetics (drug-likeness)
The SwissADME, a free web tool used in evaluating the pharmacokinetics, drug-likeness (physicochemical and ADME properties) and medicinal chemistry friendliness of small molecules (Daina et al. 2017), would be used in testing the drug-likeness of the newly designed compounds.  Furthermore, some physicochemical properties and positive controls of the designed compounds were checked using the on-line tool for their adaptability with Lipinski's rule of five (Hou et al. 2019). Lipinski and co-workers proposed the "Rule of Five" in 1997, which was the original and most known rule-based filter for drug-likeness of a molecule, distinguishing whether a molecule can be orally absorbed well or not, following the criteria: molecular weight (MW) ≤ 500, octanol/water partition coefficient (AlogP) ≤ 5, number of hydrogen bond donors (HBD's) ≤ 5 and number of hydrogen bond acceptors (HBAs) ≤ 10.6. According to the Rule of Five, a molecule would not be orally active if it violates two or more of the four rules (Guan et al. 2019).

QSAR of parthenolide derivatives
The goal of QSAR is to establish a model from the obtained descriptor that has a higher performance than the experimental values. In this research, parthenolide derivatives went through a quantitative structure-activity relationship with its actual activities. The Model The model best predicts the biological activities of parthenolide derivatives against the MDA-MB-231 cell line. The model also conforms to the least proposed requirement in QSAR modeling as indicated in Table 2.
Tables 7 and 8 reveal how the model was validated externally by carrying out calculations using the values of the descriptors obtained from the test set. The calculated external validation (R 2 pred ) was 0.5304, which agrees with the least requirements as shown in Table 6. Table 9 shows the experimental, calculated and residual values of the derivative compounds. The low residual values were calculated from the predicted activities. The low residual values show the potency of the built model. Table 10 defines the model parameters (descriptors) used in building the model, and the descriptors were used in verifying the model both internally and externally. The mean effect was calculated statistically, and it shows the collinearity of the model parameters and the activities of the derivative compounds in the built model. Table 11 shows correlation between the descriptors generated from the model (VIF) and the P values of the descriptors. The VIF was calculated using the equation; X 2 is the coefficient of correlation (Myers 1990).
The P values estimate the statistical difference between the model parameters at 95% confidence level.
A straight line graph was obtained from plotting the activities (predicted activity against experimental activity) of both the test set and train set of the derivative compounds as shown in Fig. 2. The anti-proliferative activities showed a good connection as shown on the plot.   A graph of standardized residual versus biological activity of the entire data set was plotted as shown in Fig. 3. The compounds of both the test and train set spread on both sides of the y-axis, defining the potency of the model.
The applicability domain was evaluated using the William's plot. The William's plot is a graph of standardized residual against leverages, as shown in Fig. 4. The warning leverage is used in sorting the compounds within a particular space, and the compounds that fell within the space on the plot were the actual predicted compounds. The warning leverage was found to be (h* = 0.47).

Molecular docking results
Molecular docking interaction on compounds of parthenolide derivatives with the protein target, Maternal Embryonic Leucine Zipper Kinase (MELK), was performed. The molecular docking results are summarized in Table 12. The binding affinity was calculated using PyRX software, while Discovery Studio Software was used in understanding and visualizing the various interaction formed between the ligand (derivative compounds) and the binding pose of the receptor (MELK). Figure 5 shows the prepared receptor and ligand that was used in the docking studies, Figs. 6 and 7 show the 2D interaction of the complexes 9 and 2, while Fig. 8 shows the 3D interaction of the prepared ligand 9 and 2 to form complexes.

Ligand base drug design
Fifteen (15) new compounds new parthenolide derivative compounds were designed using the ligand-based approach. The ligand-based design used the derived mathematical model obtained from the QSAR studies,  and adjustment was made on the lead compounds (1 and 13) based on the definition of the molecular descriptors obtained from the model. Table 13 shows the newly designed compounds with their new calculated activities.

Physicochemical and ADME properties (pharmacokinetics) of designed parthenolide compounds
A compound with poor drug-likeness and poor ADMET properties will not be allowed to progress into pre-clinical research, regardless of the high anti-proliferative activities. ADME properties are one of the main pipelines in drug discovery, for drug-likeliness of a molecules to be assessed, and it must pass through the pipeline of drug discovery. All the newly designed compounds were assessed for their drug-likeliness (pharmacokinetics analysis) as shown in Table 14.

QSAR of parthenolide derivatives
From the statistical parameters, the squared coefficient of correlation (R 2 ) was 0.6715, correlation coefficient adjusted squared (R 2 adj ) was 0.61920, crossvalidation coefficient (Q 2 ) was 0.5460, and external validation (R 2 pred ) was 0.5304. How the external validation of the model was verified is shown in Tables 7 and 8, which also conforms to the least requirement in QSAR modeling as shown in Table 6. The actual, calculated and residual values of parthenolide derivative compounds are shown in Table 9. The low residual value is the outcome between the difference between actual and predicted activities showing the high performance of the model. The model certified both internal and external parameters, thereby confirming the model to have high performance, very stable and robust.
The model's parameters are defined in Table 10 (names, definition and class). The mean effect (Table 10) shows the contribution of each descriptor in the constructed model, and the positive coefficient and values of the descriptor from the mean effect showed AAts6i has more impact followed by GATS5s, ATSC8i and VR1_Dze, therefore increasing the positive effect of the descriptors would increase the biological activities of parthenolide derivatives, while C2SP3 carries a negative effect and gives the least contribution in the model, reducing its negative effect would also contribute in increasing biological activities of parthenolide derivatives as proven in Tables 7 and 8. There was no much inter-correlation between the model parameters from the statistical analysis of Variance Inflation Factor (VIF), making the model highly stable. The null hypothesis shows no significant connection amid the bioactivity and model parameters of the constructed model at p > 0.05. At a 95% confidence level, the P values of the model parameters were below 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted. This indicates that there is a co-linearity between the bioactivity and model parameters of the constructed model, as shown in Table 11. Figure 2 shows a plot of actual activities against the predicted activities of both the test set and the train set of compounds. The plot showed that the predicted activity was in good agreement with its experimental values as shown in Table 6, conforming to the effectiveness and strength of the built model. Figure 3 shows how the entire molecules spread on both negative and positive sides of zero point on the y-axis of the plot, indicating no systematic errors between the standardized residual versus inhibition concentration (experimental activity). Figure 4 shows the standardized residuals against the leverage values also called William's plot. The better part of the derivatives fell within the calculated leverage of (h = 0.47); except 8 compounds we found to be outside the warning leverage which might be due to a slight difference in structure compared to other compounds in the data set.

Molecular docking studies
Molecular docking interaction on compounds of parthenolide derivatives with the protein target, Maternal Embryonic Leucine Zipper Kinase (MELK), was performed. Compounds 2, 9 and 17 were chosen because of their high binging score. Compounds 2, 9 and 17 had the highest docking score of − 9.3, − 9.3 and − 8.9 kcal/mol as shown in Table 12. The visual examination of docked complexes was done by assessing the hydrogen bond     Page 15 of 20 Lawal et al. Bull Natl Res Cent (2021) 45:90 interaction, hydrogen bond length and the hydrophobic interaction as shown in Table 12.
Compound 9 showed backbone conventional hydrogen bonds (2.74925, 2.4619, 2.98056, 3.34925, 3.35524 and 3.59598 A 0 ) with THR19, GLY21, ARG53, CYS89, GLY18 and ILE17 of the protein target site. Furthermore, it formed hydrophobic bonds (5.96743, 5.00307, 5.43404, 4.42983, 3.89836, 5.15308, 4.67658 and 4.44028 A 0 ) with PHE22, ILE17, VAL25, LEU139, ILE17, PHE22, TYR88 and ALA23 of the target site. It formed a halogen and electrostatic bond (3.33001 and 4.20988 A 0 ) with amino acids of ASP150. The carbonyl group of the compound acts as a hydrogen acceptor and formed one hydrogen bond with THR19, while two of the fluorine (Halogens) attached to the phenyl group of the compound, act as a hydrogen donor and formed two hydrogen bonds with GLY21 and ARG53 of the protein crystal.
Compound 2 also showed four hydrogen bonding (1.91438, 3.50504, 3.40604 and 3.35007 A 0 ) with LYS134, GLY18, ASP150 and THR19 of the active site of the Compound 17 also formed hydrogen and hydrophobic bonds with the protein receptor at different bond distances as shown in Table 12. The hydrogen and hydrophobic interactions show that the ligands have high potency against the MELK receptor. The prepared receptor and ligand are shown in Fig. 5. The 3D and 2D representations of the binding pose are shown in Figs. 6 and 7, and how the ligands bind firmly to the active sites  of the receptor and also a detailed interaction to form complexes.

Ligand-based drug design
Compounds 13 and 1 were chosen as lead compounds in the design of fifteen (15) new parthenolide derivative compounds due to their high predicted activity and low residual values as shown in Table 7. From the mean effect of the descriptors, AATS6i had a greater positive impact, while VR1_Dze had the least positive impact on the model. According to AATS6i (Centered Broto-Moreau autocorrelation-lag 6/weighted by first ionization potential) and VR1_Dze (Randic-like eigenvector-based index from Barysz matrix/weighted by Sanderson electronegativities) descriptors, adding more ionization potential and electronegative elements would increase the potency of the chosen templates. The structural modification occurred by adding more ionization potential and electronegative elements to template molecules (compounds 1 and 13). The predicted activities of the newly designed compounds were higher than that of the template molecules as shown in Table 13.

Physicochemical and ADME properties (pharmacokinetics) of designed parthenolide compounds
All the newly designed compounds were assessed for their drug-likeliness (ADME and physicochemical properties). None of the designed compounds violated two rules out of the Lipinski rule of five, a prominent principle used in certifying the drug-likeness Page 18 of 20 Lawal et al. Bull Natl Res Cent (2021) 45:90 of a compound, and this shows that all the designed compounds passed the drug-likeness test as shown in Table 14, making the compounds a breakthrough in finding the cure to triple-negative breast cancer. Figure 9 shows the bioavailability RADAR of molecules 2, 5 and 8. The bioavailability RADAR enables a first glance at the drug-likeness of a molecule. The pink area signifies the ideal range for each properties (lipophilicity: XLOGP3 between − 0.7 and + 5.0, size: MW between 150 and 500 g/mol, polarity: TPSA between 20 and 130 Å 2 , solubility: log S not higher than 6, saturation: fraction of carbons in the sp 3 hybridization not less than 0.25 and flexibility: no more than 9 rotatable bonds) (Daina et al. 2017). The GI-gastrointestinal absorption was also high for all the designed molecules.

Conclusion
The results obtained from the QSAR mathematical model of parthenolide derivatives were used in designing new derivatives compounds that were more effective and potent. The molecular docking result of parthenolide derivatives showed that compounds 2, 9 and 17 had higher docking scores than the standard drug adriamycin. The compounds would serve as the most promising inhibitors (MELK). Furthermore, the pharmacokinetics analysis carried out on the newly designed compounds revealed that all the compounds can move on to the next step of pre-clinical trial because they passed drug-likeness test (ADME and other physicochemical properties) and they also adhered to the Lipinski rule of five: a major principle used in analyzing the drug-likeness of small compounds. This gives a great breakthrough in medicine in finding the cure to triplenegative breast cancer (MBA-MD-231 cell line). Lawal et al. Bull Natl Res Cent (2021)  Page 20 of 20 Lawal et al. Bull Natl Res Cent (2021) 45:90 Abbreviations ADME properties: Absorption, distribution, metabolism and elimination properties; MELK: Maternal Embryonic Leucine Zipper Kinase; QSAR: Quantitative structure-activity relationship; TNBCs: Triple-negative breast cancers.