Simulation of liver function enzymes as determinants of thyroidism: a novel ensemble machine learning approach

Background: Hormone production by the thyroid gland is a prime aspect of maintaining body homeostasis. In this study, single artificial intelligence (AI)-based models, namely multi-layer perceptron (MLP), support vector machine (SVM), and Hammerstein-Wiener (HW) models, were used to simulate thyroidism status. The primary aim of the study is to identify the best performing model for the simulation of thyroidism status using hepatic enzymes and hormones as the independent variables. Three statistical metrics were used to evaluate the performance of the models, namely determination coefficient (R²), correlation coefficient (R), and mean squared error (MSE).
Results: Based on the quantitative and visual presentation of the results, the MLP model performed better than SVM and HW, improving on their performance by up to 3.77% and 12.54%, respectively, in the testing stage. Furthermore, to boost the performance of the single AI-based models, three ensemble approaches were employed: neural network ensemble (NNE), weighted average ensemble (WAE), and simple average ensemble (SAE). The NNE technique improved on the predictive performance of the SAE and WAE approaches by up to 2.85% and 1.22%, respectively, in the testing stage.
Conclusions: Comparison of the ensemble techniques with the single models showed that NNE outperformed all three AI-based models (MLP, SVM, and HW) and boosted their performance accuracy by up to 7.44%, 11.212%, and 19.98%, respectively, in the testing stage.
Keywords: Thyroidism, AI-based models, Ensemble machine learning, Liver function enzymes, Hormones
© The Author(s) 2022.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Usman et al. Bulletin of the National Research Centre (2022) 46:73. Correspondence: kurya360@gmail.com, School of Life and Allied Health Sciences, Department of Biotechnology, Glocal University, Saharanpur, Uttar Pradesh 247121, India. Full list of author information is available at the end of the article.

Background
Hormone production by the thyroid gland is a prime aspect of maintaining body homeostasis. However, the process must be tightly regulated to prevent a hormonal imbalance that negatively affects certain metabolic activities associated with many diseases (Ghali et al. 2020). The thyroid gland is part of the endocrine system and produces the thyroid hormones triiodothyronine (T3) and thyroxine (T4), which are responsible for hormonal regulation and proper functioning (Brent 2012). The thyroid-stimulating hormone (TSH) produced by the pituitary gland is essential in regulating hormone production by the thyroid gland (Jonklaas 2020). Clinically, an elevated level of TSH denotes underproduction of thyroid hormones; the pituitary gland compensates by producing more TSH, a condition referred to as hypothyroidism. Conversely, low TSH levels indicate production of thyroid hormone above normal; the pituitary gland compensates by decreasing TSH production to preserve thyroid function, a condition referred to as hyperthyroidism (Zhang et al. 2020).

Considering that our research is aimed at simulating thyroidism status using liver enzymes and thyroid-stimulating hormone as independent variables, it is worth further elucidating the biological relevance and association of thyroid hormones and vitamin D with disease severity and liver function in patients with non-cholestatic chronic liver disease (Fisher and Fisher 2007). Moreover, thyroid hormones play a substantial role in the biological system, and abnormal levels lead to several medical complications, such as thyroid tumorigenesis, Hashimoto's thyroiditis, goitre, myxoedema, benign tumour of the pituitary gland, and peripheral neuropathy, as observed in many patients (Choi et al. 2021; Huang and Liaw 1995). Abnormalities in serum liver enzymes are observed in hypothyroidism and may be related to impaired lipid metabolism, hepatic steatosis, hypothyroidism-induced myopathy, hyperammonemia, ascites, and hepatitis C virus (Piantanida et al. 2020; Płudowski et al. 2013). We believe that proposing the best performing AI and machine learning model using hepatic enzymes and hormones will serve as a breakthrough in drug design and discovery for several medical complications.
Interestingly, machine learning tools have been widely employed to extract patterns from patient data and predict outcomes for improved clinical management of autoimmune thyroid diseases. Recently, logistic regression and neural network models were compared for the diagnosis of thyroid disorders, and the neural network model performed better than multinomial logistic regression. However, the predictive performance of the two models was found to be disdainful to laboratory variables (Chowdhury and Chakraborty 2017).
Another study, which used a data set from the Garvan Institute in Sydney, Australia, to predict thyroid disease with a machine learning approach, showed that the prediction and classification of any data depend on the data set. The algorithms used, such as SVM, Naïve Bayes, autoencoders, ANNs, and CNNs, yielded a better result (Chaubey et al. 2020). A cross-sectional study of 143 LN patients with hyperthyroidism diagnosed by renal biopsy, using a PCA-logistic regression model, showed that the model performed well in identifying important risk factors for specific clinical outcomes (Aguayo-Orozco et al. 2021).
An association between hyperthyroidism and liver disease has been reported, with the severity of hyperthyroidism identified as an independent risk factor for abnormal liver function enzymes (Piantanida et al. 2020). Furthermore, many subsequent case reports and series have highlighted the prevalence of liver test abnormalities in the setting of hyperthyroidism and several mechanisms of liver dysfunction in hyperthyroidism, including liver abnormalities due to hyperthyroidism alone, liver damage related to heart failure and hyperthyroidism, and concomitant liver disease in hyperthyroidism (Punekar et al. 2018). Moreover, an experimental study applying linear and nonlinear models predicted the TSH level using different macro-elements and vitamins as the input parameters (Muhammad Ghali et al. 2020). In a clinical puzzle study, liver transplant evaluation was carefully analysed in a patient with coagulopathy, encephalopathy, and drug-induced acute liver failure. Uncontrolled thyroid diseases are reported to correlate strongly with liver dysfunction (Anugwom and Leventhal 2021).
Unfortunately, none of these studies addresses these correlations and associations with artificial intelligence and machine learning using two groups of independent variables. The major novelty of our work is the use of machine learning algorithms to better understand such complex clinical data and to propose the best AI algorithm for modelling thyroidism status in different subjects using liver function enzymes and the level of TSH as input variables. The evidence from our work suggests that the best-performing model has considerable potential for future endocrine-based research and for the diagnosis and treatment of several liver diseases.
The principal aim of our work is to make a comparative analysis and propose the best performing model of AI for the simulation of thyroidism status using hepatic enzymes and hormones as the independent variables, considering the gaps and limitations in many published articles.

Clinical methodology
The samples were assayed as described by Muhammad Ghali et al. (2020). Briefly, an automated COBAS E411 analyser was used to analyse liver function enzymes. Thyroid hormones were assessed using the Elecsys COBAS E411 after centrifuging each sample for 10 min at 2000g in a red-capped tube. The samples were separated into control, hyperthyroidism, and hypothyroidism groups.

Proposed data computational intelligence approach
Different data-driven approaches were used separately in this research to propose a model for developing an AI diagnostic approach for TSH. The current study is data-driven and uses the data collected in our previous research (Ghali et al. 2020). The thyroidism status was predicted using two groups of input parameters: liver function parameters, i.e. alanine transaminase (ALT), aspartate transaminase (AST), albumin blood test (ALB), gamma-glutamyl transferase (GGT), alkaline phosphatase (ALP), direct bilirubin (DBIL), and total bilirubin (TBIL); and hormones, i.e. thyroid-stimulating hormone (TSH), triiodothyronine (T3), thyroxine (T4), free triiodothyronine (FT3), and free thyroxine (FT4). The liver enzyme parameters were used to predict thyroidism status with different AI models because the liver is the major organ that synthesizes thyroid-binding globulin, prealbumin, and albumin, which bind thyroid hormone in the peripheral circulation, and because the liver metabolizes thyroid hormone. Since the liver plays a crucial role in thyroid disease conditions, liver function enzyme parameters are likely to give a clear picture of the thyroidism status. Furthermore, three ensemble techniques (SAE, WAE, and NNE) were applied to boost the prediction accuracy of the thyroidism status by combining the outputs of the single models (MLP, SVM, and HW). In practice, it is not feasible to identify in advance a single model that will outperform all other existing models for a specific study. The method proposed in this research therefore determines thyroid hormone status from liver enzymes and hormones using an ensemble of different models.

Hammerstein-Wiener model (HW)
The Hammerstein-Wiener (HW) model is a black-box system identification approach designed to describe nonlinear systems (Gaya et al. 2017). The HW model consists of three blocks: a static input nonlinear block, a linear dynamic block, and a static output nonlinear block, as shown in Fig. 1. The model passes the input through the input nonlinearity to the linear block and then through the output nonlinearity to produce the output. Furthermore, the HW model gives a clearer representation of the connection between the nonlinear and linear parts of the system than standard ANNs. The MATLAB toolbox was used to build the HW model based on this structure. Piecewise linear functions are used as the input and output nonlinearity estimators, with the number of units set to the default of 10; the complexity of the model increases as the number of units grows (Guo 2004).
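The three-block structure described above can be illustrated with a minimal Python sketch. This is not the MATLAB toolbox implementation used in the study: the first-order linear filter, the breakpoint values, and the function names `piecewise_linear` and `hammerstein_wiener` are all assumptions chosen for illustration.

```python
from bisect import bisect_right

def piecewise_linear(x, breakpoints, values):
    """Static nonlinearity: linear interpolation between (breakpoint, value)
    pairs, clamped to the end values outside the breakpoint range."""
    if x <= breakpoints[0]:
        return values[0]
    if x >= breakpoints[-1]:
        return values[-1]
    k = bisect_right(breakpoints, x) - 1
    x0, x1 = breakpoints[k], breakpoints[k + 1]
    y0, y1 = values[k], values[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def hammerstein_wiener(u, in_bp, in_val, a, b, out_bp, out_val):
    """Pass each input through: input nonlinearity -> linear block -> output nonlinearity.

    The linear dynamic block is assumed here to be a first-order filter
    state[t] = a * state[t-1] + b * w[t]; a real HW model may use a higher order.
    """
    state = 0.0
    outputs = []
    for u_t in u:
        w_t = piecewise_linear(u_t, in_bp, in_val)        # static input block
        state = a * state + b * w_t                       # linear dynamic block
        outputs.append(piecewise_linear(state, out_bp, out_val))  # static output block
    return outputs

# Hypothetical parameters, for illustration only.
y = hammerstein_wiener(
    u=[1.0, 1.0, 1.0],
    in_bp=[0.0, 1.0, 2.0], in_val=[0.0, 1.0, 1.5],
    a=0.5, b=0.5,
    out_bp=[0.0, 1.0], out_val=[0.0, 2.0],
)
```

In an identification setting the breakpoints and filter coefficients would be estimated from data; here they are fixed by hand so that the signal path through the three blocks is easy to follow.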

Multi-layer perceptron (MLP) neural network
The multi-layer perceptron (MLP) is among the most common types of ANN; it handles nonlinear systems and works as a universal approximator (Choubin et al. 2016). The structure of the MLP consists of an input layer, an output layer, and one or more hidden layers, similar to ordinary ANNs, as shown in Fig. 2 (Kim and Singh 2014; Pham et al. 2019). The input layer nodes are linked to the hidden and output layers. From the input to the output layer, the signals are processed and transmitted with the aid of weights and biases through sequential mathematical operations. The Levenberg-Marquardt learning algorithm is mainly used to minimize the error between the measured and predicted values. The training iterations are repeated until the required outcome is reached. The input signal of a node is computed as (Committee 2000):

$$y_i = \sum_{j=1}^{N} w_{ji} x_j + w_{i0}$$

where $N$ is the total number of nodes in the layer above node $i$; $w_{ji}$ is the weight between node $i$ and node $j$ in the upper layer; $x_j$ is the output derived from node $j$; $w_{i0}$ is the bias of node $i$; and $y_i$ is the input signal of node $i$, which is passed through the transfer function.
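As a small illustration of the node computation described above (a weighted sum of the upper-layer outputs plus a bias, passed through a transfer function), the following Python sketch performs one forward pass through a single hidden layer. The tanh transfer function, the linear output node, and the weight values are assumptions for illustration; training with Levenberg-Marquardt is not shown.

```python
import math

def node_input(x, w_i, w_i0):
    """Input signal of node i: y_i = sum_j w_ji * x_j + w_i0."""
    return sum(w_ji * x_j for w_ji, x_j in zip(w_i, x)) + w_i0

def layer_forward(x, weights, biases, transfer=math.tanh):
    """Apply the transfer function to the input signal of every node in a layer."""
    return [transfer(node_input(x, w_i, w_i0)) for w_i, w_i0 in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input layer -> one hidden layer (tanh) -> linear output node."""
    hidden = layer_forward(x, hidden_w, hidden_b)
    return node_input(hidden, out_w, out_b)

# Hypothetical weights for two inputs and two hidden nodes.
prediction = mlp_forward(
    x=[1.0, 2.0],
    hidden_w=[[0.5, -0.25], [0.1, 0.2]],
    hidden_b=[0.1, -0.3],
    out_w=[1.0, -1.0],
    out_b=0.05,
)
```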

Support vector machine (SVM)
The support vector machine (SVM) learning concept was proposed by Vapnik in 1995 and provides a tool for problems that include prediction, classification, regression, and pattern recognition. The SVM is a data-driven model. Its two main foundations are statistical learning theory and structural risk minimization. The ability of SVM to improve the overall efficacy of the model while reducing the error, data redundancy, and complexity has been cited as an advantage over ANN (Vapnik 1995). SVM can be categorized into linear support vector regression and nonlinear support vector regression. Support vector regression (SVR) is thus a category of SVM built on two structural layers: the first layer applies kernel function weighting to the input variables, and the second layer is a weighted sum of the kernel outputs, as demonstrated in Fig. 3. A linear regression is first fitted to the data; the outputs are then passed through the nonlinear kernel to capture the nonlinear pattern of the data.
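The two-layer structure of SVR described above (kernel evaluation followed by a weighted sum of kernel outputs) can be sketched in a few lines of Python. The RBF kernel, the `gamma` value, the support vectors, and the coefficients below are assumptions for illustration; in practice they are obtained by solving the SVR optimization problem, which is not shown here.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Prediction f(x) = sum_i coef_i * K(sv_i, x) + bias."""
    # First layer: kernel function applied to the input against each support vector.
    kernel_outputs = [rbf_kernel(x, sv, gamma) for sv in support_vectors]
    # Second layer: weighted sum of the kernel outputs.
    return sum(c * k for c, k in zip(dual_coefs, kernel_outputs)) + bias

# Hypothetical fitted quantities (not taken from the study).
support_vectors = [[1.0, 2.0], [0.0, 1.0]]
dual_coefs = [0.7, -0.2]
estimate = svr_predict([1.0, 2.0], support_vectors, dual_coefs, bias=0.1)
```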

Ensemble techniques
For the same inputs, AI-based models provide different performance levels depending on their robustness or limitations. Hence, ensemble methods are used in different fields of study for tasks such as web ranking, classification, regression, and time series clustering (B and Sadaoui 2019; Baba et al. 2015a; Dehghanian et al. 2015; Loos et al. 2019). Ensemble learning is the collective term for the branch of machine learning that deals with multiple homogeneous or heterogeneous models. An ensemble method combines the outputs of several predictors to boost performance beyond that of a single AI model. Ensemble learning has been shown to be very successful in producing more accurate outcomes than single models applied to the same problem. To improve the predictive performance in this study, three procedures were used to combine the HW, MLP, and SVM predictors: (1) simple averaging ensemble (SAE), (2) neural network ensemble (NNE), and (3) weighted averaging ensemble (WAE) (Baba et al. 2015b).

Simple averaging ensemble (SAE)
For SAE, the SVM, HW, and MLP single models are first trained and tested independently; the average of the MLP, SVM, and HW outputs is then computed and tested against the observed values. The general formula for SAE is:

$$\bar{p}(t) = \frac{1}{N}\sum_{i=1}^{N} p_i(t)$$

where $\bar{p}(t)$ is the SAE output, $N$ is the number of learners (here $N = 3$), and $p_i(t)$ is the output of the $i$th single model (i.e. HW, SVM, or MLP) at a specific time $t$.
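The simple averaging step described above amounts to an element-wise mean of the single-model outputs. A minimal Python sketch follows; the function name and the prediction values are hypothetical and are not results from the study.

```python
def simple_average_ensemble(predictions):
    """Average the outputs of N single models sample by sample.

    predictions: list of per-model output lists, e.g. [hw, svm, mlp],
    all of the same length.
    """
    n_models = len(predictions)
    return [sum(outputs) / n_models for outputs in zip(*predictions)]

# Hypothetical outputs of three single models for two samples.
hw_out = [1.0, 2.0]
svm_out = [2.0, 2.0]
mlp_out = [3.0, 5.0]
sae_out = simple_average_ensemble([hw_out, svm_out, mlp_out])
```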

Weighted average ensemble (WAE)
The weighted average ensemble (WAE) assigns different weights to the outputs of the single models according to their relative importance, in contrast to SAE, in which all outputs are weighted equally. The WAE is expressed as:

$$\bar{p}(t) = \sum_{i=1}^{N} w_i \, p_i(t)$$

where $w_i$ is the weight applied to the output of the $i$th model, which can be determined from the performance of the model as:

$$w_i = \frac{R_i^2}{\sum_{j=1}^{N} R_j^2}$$

where $R_i^2$ is the performance measure of the $i$th single model, taken here to be its determination coefficient.
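The weighting idea described above can be sketched in Python as follows. The normalization of performance scores into weights is an assumption about how the weights are derived, and the scores and predictions are hypothetical values, not results from the study.

```python
def performance_weights(scores):
    """Turn per-model performance scores into weights that sum to 1."""
    total = sum(scores)
    return [score / total for score in scores]

def weighted_average_ensemble(predictions, weights):
    """Weighted sum of the single-model outputs, sample by sample."""
    return [
        sum(w * p for w, p in zip(weights, outputs))
        for outputs in zip(*predictions)
    ]

# Hypothetical performance scores for HW, SVM, and MLP (illustrative only).
weights = performance_weights([0.7, 0.8, 0.9])
wae_out = weighted_average_ensemble(
    [[1.0, 2.0], [2.0, 2.0], [3.0, 5.0]],
    weights,
)
```

A model with a higher score contributes more to the combined output, so the WAE result is pulled towards the better-performing single model rather than sitting at the plain mean.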

Neural network ensemble (NNE)
In the neural network ensemble (NNE) method, nonlinear averaging is performed by training a separate neural network. The input layer of the NNE is fed with the outputs of the single models, each of which is assigned to one input neuron. The backpropagation algorithm is used for network training, and the best structure and number of epochs for the ensemble network are determined by trial and error.
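A minimal Python sketch of the idea described above is given below: a small network takes the three single-model outputs as its inputs and is trained by backpropagation. The network size, learning rate, number of epochs, tanh hidden units, and the synthetic data are all assumptions for illustration and do not reproduce the configuration used in the study.

```python
import math
import random

def _forward(x, w1, b1, w2, b2):
    """One hidden layer with tanh units and a linear output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, output

def train_nne(inputs, targets, n_hidden=4, lr=0.05, epochs=500, seed=0):
    """Train the ensemble network with stochastic gradient descent (backpropagation)."""
    rng = random.Random(seed)
    n_in = len(inputs[0])  # one input neuron per single model
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            hidden, y = _forward(x, w1, b1, w2, b2)
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(n_hidden):
                # Backpropagate through the tanh unit before updating w2[j].
                grad_h = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err
    return lambda x: _forward(x, w1, b1, w2, b2)[1]

def mean_squared_error(model, inputs, targets):
    return sum((model(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)

# Synthetic example: each row holds hypothetical outputs of HW, SVM, and MLP.
targets = [0.0, 0.25, 0.5, 0.75, 1.0]
single_model_outputs = [[t + 0.10, t - 0.10, t + 0.05] for t in targets]
untrained = train_nne(single_model_outputs, targets, epochs=0)
trained = train_nne(single_model_outputs, targets, epochs=500)
```

Comparing `mean_squared_error` for `untrained` and `trained` on the same data shows the effect of training; trying several values of `n_hidden` and `epochs` mirrors the trial-and-error selection mentioned in the text.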

Data pre-processing and model validation
In computational intelligence modelling, the principal goal is to ensure that a model fitted to a given data set also achieves acceptable predictions on unseen data.
The most frequent issue in prediction is overfitting, which shows up as a discrepancy between training and testing performance. Different validation methods can be applied during the validation process, such as k-fold cross-validation, leave-one-out, and holdout. The primary advantage of k-fold cross-validation is that the validation and training sets are independent of each other in every round, which provides a sound basis for model optimization (Abba et al. 2017). In our study, k-fold cross-validation is used to reduce overfitting, as demonstrated in Fig. 4. The original training data set is separated into k equal-sized subsets; in each round, k−1 subsets are used for training and the remaining subset is used for validation. The overall result is taken as the average validation performance over the k subsets. In general, the value of k is chosen according to sample availability.
The data set is also split into two groups, a calibration set and a verification set, to make efficient use of the data in model configuration (Soltani et al. 2015). In our study, the data were partitioned in two phases (75% for calibration and 25% for verification) to avoid overfitting, underfitting, and local minima issues that may lead to qualitative and quantitative changes, as shown in Fig. 4 (Usman et al. 2021).
The performance accuracy is estimated from various criteria based on the difference between predicted and measured values. In our study, the determination coefficient (R²), correlation coefficient (R), and mean square error (MSE) were used to evaluate the models:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(Y_{obs_i} - Y_{com_i}\right)^2}{\sum_{i=1}^{N}\left(Y_{obs_i} - \bar{Y}_{obs}\right)^2}$$

$$R = \frac{\sum_{i=1}^{N}\left(Y_{obs_i} - \bar{Y}_{obs}\right)\left(Y_{com_i} - \bar{Y}_{com}\right)}{\sqrt{\sum_{i=1}^{N}\left(Y_{obs_i} - \bar{Y}_{obs}\right)^2 \sum_{i=1}^{N}\left(Y_{com_i} - \bar{Y}_{com}\right)^2}}$$

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(Y_{obs_i} - Y_{com_i}\right)^2$$

where $N$ is the number of data points, $Y_{obs_i}$ are the observed data, $\bar{Y}$ is the average value of the corresponding series, and $Y_{com_i}$ are the computed values.
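The k-fold procedure described in this section can be sketched in Python as follows. The shuffling, the seed, and the function name are assumptions for illustration; the value of k and the exact partitioning used in the study are not specified here.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists for each of the k rounds.

    In every round one subset is held out for validation and the other
    k - 1 subsets are used for training, so the two sets never overlap.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized subsets
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# Example with 10 samples and k = 5 (both values are arbitrary).
splits = list(k_fold_indices(n_samples=10, k=5))
```

The validation score of each round would then be averaged over the k rounds to give the overall estimate of model performance.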
According to Nourani et al. (2018) and Elkiran et al. (2019), a sound assessment of any data intelligence model should include at least one goodness-of-fit measure (e.g. R²) and at least one absolute error measure (e.g. RMSE). Three performance criteria were employed in this study because multi-criteria evaluation of model performance is common practice in contemporary studies. Another important reason for using multiple criteria is that the properties of the data, such as normality, size, and linearity, affect the performance accuracy of any model, which can also be evaluated using these criteria. In addition, several studies have shown that, even for the same type of data set, performance results may differ from one model to another. For example, R² does not consider any biases that might be present in the data. Therefore, a good model might have a low R² value, or a model that does not fit the data might have a high R² value. Hence, combining the goodness-of-fit measure (R²) with other evaluation metrics, such as the root mean square error (RMSE) and bias measures, can lead to a more reliable simulation. Other performance criteria can also be used, such as the mean absolute error (MAE) (Elkiran et al. 2018; Usman et al. 2021).
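The evaluation criteria named in this section (R², R, MSE, and RMSE) can be computed with a few lines of Python. This is a sketch under the usual textbook definitions, with R² taken as one minus the ratio of residual to total sum of squares and R as the Pearson correlation; the sample values are hypothetical.

```python
import math

def mse(observed, computed):
    """Mean squared error between observed and computed values."""
    return sum((o - c) ** 2 for o, c in zip(observed, computed)) / len(observed)

def rmse(observed, computed):
    """Root mean square error."""
    return math.sqrt(mse(observed, computed))

def determination_coefficient(observed, computed):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - c) ** 2 for o, c in zip(observed, computed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def correlation_coefficient(observed, computed):
    """Pearson correlation coefficient R."""
    mean_obs = sum(observed) / len(observed)
    mean_com = sum(computed) / len(computed)
    cov = sum((o - mean_obs) * (c - mean_com) for o, c in zip(observed, computed))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_com = sum((c - mean_com) ** 2 for c in computed)
    return cov / math.sqrt(var_obs * var_com)

# Hypothetical data: the computed values carry a constant offset of +1.
observed = [1.0, 2.0, 3.0, 4.0]
computed = [2.0, 3.0, 4.0, 5.0]
scores = {
    "R": correlation_coefficient(observed, computed),
    "R2": determination_coefficient(observed, computed),
    "MSE": mse(observed, computed),
    "RMSE": rmse(observed, computed),
}
```

In this example the correlation coefficient is 1 even though every prediction is off by one unit, while MSE and RMSE expose the offset, which is one reason for reporting an error measure alongside a correlation-type measure.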

Results
The efficiency of the AI-based models was checked by comparing the predicted and experimental values of the thyroid status of the subjects. The performance of the techniques was estimated using three statistical metrics (R², MSE, and R). Furthermore, the data were presented visually using scatter plots, radar plots, bar charts, and response plots. The results obtained are shown in the following section.
The performance of the single AI-based models and the ensemble techniques in both the training and testing stages, in terms of the statistical evaluation metrics, is shown in Tables 1 and 2. The tables also give a quantitative comparison of the models in simulating the dependent variable, in the form of thyroidism status, using various liver function enzymes and hormones as the independent variables.
The result for the comparative performance of the models is equally shown in Fig. 5 using a scatter plot to recommend the best performing algorithm in both the training and testing stages.
The errors of each model are compared in Fig. 6 to visualize the performance of the models; this is in line with the quantitative performance results shown in Table 1.
Furthermore, the NNE technique boosted the performance accuracy of the single AI models MLP, SVM, and HW by up to 7.44%, 11.212%, and 19.98%, respectively, in the testing stage.

Discussion
The Spearman correlation analysis shown in Table 3 indicates a weak inverse correlation between the dependent variable and DBIL. The basic descriptive statistics also show the nature of the raw data before normalization and modelling, which helps to simplify the approach employed during the simulation. More information regarding correlation analysis can be found in (Asnake Metekia et al. 2021). Based on the performance of the models, it can be observed that MLP outperformed both the SVM and HW models in both the training and testing stages. Furthermore, MLP improved on the performance of SVM and HW by up to 3.77% and 12.54%, respectively, in the testing stage.
Based on the quantitative performance of the ensemble techniques, it can be noted that all three approaches were able to predict the dependent variable with high performance in both the training and testing stages. Moreover, the NNE method outperformed the other two techniques (SAE and WAE) in the training and testing phases, improving on the predictive performance of the SAE and WAE approaches by up to 2.85% and 1.22%, respectively, in the testing stage.
The comparative performance of the ensemble techniques can be presented visually with a radar plot of their correlation coefficients in both the training and testing stages. Based on this plot, it can be noted that the NNE approach outperformed the SAE and WAE techniques in both the training and testing stages, as shown in Table 2.
Furthermore, other approaches can be used to improve the performance of the single models. These include hybrid data intelligence techniques, which couple linear and nonlinear models to exploit the benefits of both, and emerging metaheuristic approaches such as the Harris Hawks optimization method and the novel neuro-emotional technique.

Conclusions
The results obtained from the data-driven approaches using experimental data indicated that the MLP, SVM, and HW AI models could model the thyroidism status of each subject as normal, hypothyroidism, or hyperthyroidism with R² values higher than 0.7000 in both the training and testing stages. The NNE model outperformed all three AI-based models (MLP, SVM, and HW) and boosted their performance accuracy by up to 7.44%, 11.212%, and 19.98%, respectively, in the testing stage, which makes it the most recommended and promising model for further endocrine-based research. Therefore, our study serves as a breakthrough and calls for the application of the proposed model in drug design and discovery to reduce the morbidity and mortality associated with hepatic and thyroid diseases.
Abbreviations
AI: Artificial intelligence; ALT: Alanine transaminase; AST: Aspartate transaminase; ALB: Albumin blood test; ALP: Alkaline phosphatase; DBIL: Direct bilirubin; ECF: Extracellular fluid; FT3: Free triiodothyronine; FT4: Free thyroxine; GGT: Gamma-glutamyl transferase; HW: Hammerstein-Wiener; MSE: Mean squared error; MLP: Multi-layer perceptron; NNE: Neural network ensemble; PTH: Parathyroid hormone; R: Correlation coefficient; R²: Determination coefficient; SAE: Simple average ensemble; SVM: Support vector machine; TSH: Thyroid-stimulating hormone; TBIL: Total bilirubin; T3: Triiodothyronine; T4: Thyroxine; WAE: Weighted average ensemble.

AGU and UMG contributed to the conceptualization, analysis and writing of the manuscript. MAA, SMM and EH contributed to the proofreading. AUK and QH contributed to writing the manuscript. SI contributed to the proofreading and major revisions of the manuscript, while SIA supervised and drafted the manuscript. All the authors participated in the proofreading and approval of the manuscript.

Funding
Not applicable.