Clinical methodology
The samples were assayed as described by Muhammad Ghali et al. (2020). Briefly, an automated COBAS E411 analyser was used to measure liver function enzymes. Thyroid hormones were assessed on the Elecsys COBAS E411 after the samples, collected in red-capped tubes, were centrifuged for 10 min at 2000g. The samples were divided into control, hyperthyroidism, and hypothyroidism groups.
Proposed data computational intelligence approach
Different data-driven approaches were applied separately in this research to develop an AI diagnostic model for TSH. The study is data-driven, using data collected in our previous research (Ghali et al. 2020). Thyroidism status was predicted from two groups of input parameters: liver function enzymes, i.e. alanine transaminase (ALT), aspartate transaminase (AST), albumin (ALB), gamma-glutamyl transferase (GGT), alkaline phosphatase (ALP), direct bilirubin (DBIL), and total bilirubin (TBIL); and hormones, i.e. thyroid-stimulating hormone (TSH), triiodothyronine (T3), thyroxine (T4), free triiodothyronine (FT3), and free thyroxine (FT4). The liver enzyme parameters were used to predict thyroidism status with different AI models, considering that the liver is the major organ synthesizing thyroid-binding globulin, prealbumin, and albumin, which bind thyroid hormone in the peripheral circulation, and that the liver metabolizes thyroid hormone. Because the liver plays such a crucial role in thyroid disease conditions, liver function enzyme parameters are likely to give a clear picture of thyroidism status and thus support accurate predictions from the best-performing AI model.
Furthermore, three ensemble techniques were applied to boost the prediction accuracy of thyroidism status by combining the outputs of the single models (MLP, SVM, and HW). In practice, it is not feasible to settle on a single model that outperforms all others in predicting the various parameters of a specific study. The method proposed in this research therefore determines thyroid hormone status from the two groups of parameters, liver enzymes and hormones, by combining an ensemble of different models.
Hammerstein–Wiener model (HW)
The Hammerstein–Wiener (HW) model is a black-box system identification approach designed to describe nonlinear systems (Gaya et al. 2017). The HW model's arrangement consists of three blocks: a static input nonlinear block, a static output nonlinear block, and a linear dynamic block, as shown in Fig. 1 (Abba et al. 2019). The model passes the nonlinear input through the linear function block and then through a nonlinear function in the output structure. Furthermore, the HW model captures the connection between the nonlinear and linear parts of a system more precisely than standard ANNs (Abba et al. 2019). The MATLAB toolbox was utilized to develop the HW model based on this structure. Piecewise linear functions served as the input and output nonlinearity estimators, with the number of units set to the default value of 10, although model complexity increases as the number of units grows (Guo 2004).
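To make this block structure concrete, the minimal sketch below simulates a single-input HW model in Python (static input nonlinearity, linear dynamic block, static output nonlinearity). It is an illustration only: the piecewise-linear breakpoints and filter coefficients are hypothetical placeholders, not the model fitted with the MATLAB toolbox in this study.

```python
import numpy as np
from scipy.signal import lfilter

def hw_predict(u, in_breaks, in_vals, b, a, out_breaks, out_vals):
    """Simulate a single-input Hammerstein-Wiener model.

    u                    : input sequence (1-D array)
    in_breaks, in_vals   : breakpoints/values of the input piecewise-linear map
    b, a                 : coefficients of the linear dynamic block
    out_breaks, out_vals : breakpoints/values of the output piecewise-linear map
    """
    w = np.interp(u, in_breaks, in_vals)       # 1) static input nonlinearity
    x = lfilter(b, a, w)                       # 2) linear dynamic block
    return np.interp(x, out_breaks, out_vals)  # 3) static output nonlinearity

# Hypothetical example with 10 breakpoints per nonlinearity (the toolbox default)
u = np.linspace(0.0, 1.0, 50)
breaks = np.linspace(0.0, 1.0, 10)
y = hw_predict(u, breaks, np.tanh(breaks), b=[0.5, 0.3], a=[1.0, -0.2],
               out_breaks=breaks, out_vals=breaks**2)
```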
Multi-layer perceptron (MLP) neural network
The multi-layer perceptron (MLP) is among the most frequently used exemplars of ANN; it handles nonlinear systems and acts as a universal approximator (Choubin et al. 2016). Like ordinary ANNs, the MLP structure consists of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 2 (Kim and Singh 2014; Pham et al. 2019; Committee 2000). The input layer nodes are connected to the hidden and output layers. From the input layer to the output layer, signals are processed and transmitted through sequential mathematical operations with the aid of weights and biases. The Levenberg–Marquardt algorithm is the learning algorithm mainly used to minimize the error between the measured and predicted values, and training is repeated until the required outcome is reached. The net input of each node is computed as:
$${y}_{i}=\sum_{j=1}^{N}{w}_{ji}{x}_{j}+{w}_{i0}$$
(1)
where N is the total number of nodes in the layer above node i; wji is the weight between node i and node j in the upper layer; xj is the output derived from node j; wi0 is the bias of node i; and yi is the net input signal of node i, which is then passed through the transfer function.
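To make Eq. (1) concrete, the short sketch below computes the net input yi for every node in one layer and passes it through a sigmoid transfer function; the weights, biases, and inputs are hypothetical values chosen only for illustration.

```python
import numpy as np

def layer_forward(x, W, b):
    """Eq. (1): y_i = sum_j w_ji * x_j + w_i0, then a sigmoid transfer function.

    x : outputs of the upper (previous) layer, shape (N,)
    W : weights w_ji, shape (n_nodes, N)
    b : biases w_i0, shape (n_nodes,)
    """
    y = W @ x + b                     # net input signal of each node i
    return 1.0 / (1.0 + np.exp(-y))   # sigmoid transfer function

# Hypothetical 3-input layer with 2 nodes
x = np.array([0.2, 0.5, 0.1])
W = np.array([[0.4, -0.1, 0.3],
              [0.2,  0.6, -0.5]])
b = np.array([0.1, -0.2])
print(layer_forward(x, W, b))
```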
Support vector machine (SVM)
The learning concept underlying the support vector machine (SVM) was proposed by Vapnik in 1995 and supplies the desired machinery for problems that include prediction, classification, regression, and pattern recognition. The SVM is a data-driven model built on two pillars: statistical learning theory and the principle of structural risk minimization. Its ability to boost the overall efficacy of a model while decreasing error, overfitting, and complexity makes it superior to conventional ANNs (Vapnik 1995).
SVM can be categorized into linear and nonlinear support vector regression. Support vector regression (SVR) is thus regarded as a category of SVM built from two primary structural layers: the first layer applies the kernel function to the input variables, and the second layer forms a weighted sum of the kernel outputs, as demonstrated in Fig. 3. The inputs pass through the nonlinear kernel, which captures the nonlinear behaviour of the data, and a linear regression is then fitted to the transformed outputs.
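As an illustration of this two-layer structure, the sketch below fits a nonlinear SVR with an RBF kernel using scikit-learn; the feature matrix, target, and hyperparameters are placeholders and do not correspond to the settings used in this study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data: rows = samples, columns = liver enzyme parameters
rng = np.random.default_rng(0)
X = rng.random((100, 7))   # e.g. ALT, AST, ALB, GGT, ALP, DBIL, TBIL
y = rng.random(100)        # e.g. a thyroid hormone level such as TSH

# RBF kernel = nonlinear SVR; standardizing the inputs first is common practice
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X, y)
y_pred = model.predict(X)
```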
Ensemble techniques
AI-based models provided with the same inputs deliver diverse performance levels according to their robustness or limitations. Hence, ensemble methods are used in many fields of study, such as web ranking, classification, regression problems, and time series clustering (B and Sadaoui 2019; Baba et al. 2015a; Dehghanian et al. 2015; Loos et al. 2019). Ensemble learning is the collective term for the branch of machine learning that deals with multiple homogeneous or heterogeneous models. An ensemble is usually constructed by combining the outputs of various predictors to boost the performance obtainable from a single AI model, and ensemble learning has been demonstrated to be exceptionally successful in producing accurate outcomes compared with single models applied to the same problem. To improve the expected performance of the model, three procedures were utilized: (1) a simple averaging ensemble (SAE) combining the HW, MLP, and SVM predictors, (2) a neural network ensemble (NNE), and (3) a weighted averaging ensemble (WAE) (Baba et al. 2015b).
Simple averaging ensemble (SAE)
For SAE, the SVM, HW, and MLP single models are first trained and tested independently; the average of the MLP, SVM, and HW outputs is then computed and compared against the observed values. The general formula for SAE is:
$${P}_{(t)}= \frac{1}{N}\sum_{i=1}^{N}{p}_{i}(t)$$
(2)
where N is the number of learners (here N = 3), and pi(t) represents the output of single model i (i.e. HW, SVM, or MLP) at a specific time t.
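As a sketch, Eq. (2) reduces to a one-line average of the three aligned prediction series; the arrays below are hypothetical placeholders for the single-model outputs.

```python
import numpy as np

# Hypothetical predictions of the three single models over 50 time steps
p_hw, p_svm, p_mlp = np.random.default_rng(0).random((3, 50))

# Eq. (2): simple averaging ensemble with N = 3 learners
P_sae = (p_hw + p_svm + p_mlp) / 3.0
```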
Weighted average ensemble (WAE)
The weighted average ensemble (WAE) assigns a different weight to the output of each single model according to its significance, in contrast to the equal treatment of the models in SAE. The WAE takes the form:
$${P}_{(t)}={\sum }_{i=1}^{N}{w}_{i}{p}_{i}(t)$$
(3)
where \(w_{i}\) represents the weight applied to the ith model's output and is determined from the performance of the model as:
$$w_{i} = \frac{{DC_{i} }}{{\sum\nolimits_{i = 1}^{N} {DC_{i} } }}$$
(4)
DCi is the performance efficiency of the ith single model.
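Eqs. (3) and (4) can be sketched together as follows; the DC values standing in for each single model's performance efficiency are hypothetical.

```python
import numpy as np

p = np.random.default_rng(0).random((3, 50))  # hypothetical HW, SVM, MLP outputs
dc = np.array([0.90, 0.85, 0.80])             # hypothetical efficiencies DC_i

w = dc / dc.sum()                     # Eq. (4): weights proportional to DC_i
P_wae = (w[:, None] * p).sum(axis=0)  # Eq. (3): weighted average ensemble
```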
Neural network ensemble (NNE)
In the neural network ensemble (NNE) method, a nonlinear averaging is carried out by training a further neural network. The input layer of the NNE is fed with the outputs of the single models, each of which is assigned to a single input neuron. The backpropagation algorithm is used for network training, and the optimal structure and number of epochs for the ensemble network are determined by trial and error.
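A minimal sketch of the NNE idea follows, using scikit-learn's MLPRegressor as the combining network; the hidden-layer size, epoch limit, and data are illustrative assumptions, since the study determined the actual structure and epochs by trial and error.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Each column is the output of one single model (HW, SVM, MLP); hypothetical data
rng = np.random.default_rng(0)
X_ens = rng.random((100, 3))   # single-model predictions as ensemble inputs
y_obs = rng.random(100)        # observed target values

# Small feed-forward network trained by backpropagation to combine the outputs
nne = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
nne.fit(X_ens, y_obs)
P_nne = nne.predict(X_ens)
```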
Data pre-processing and model validation
In computational intelligence modelling, the principal aim is to guarantee that any particular model or models fitted on a given data set achieve acceptable predictions on unseen data (Nourani et al. 2018). The most frequent issue in prediction is overfitting, which manifests as a discrepancy between training and testing performance. Different validation methods can be applied during the validation process, such as k-fold cross-validation, leave-one-out, and holdout. The primary advantage of k-fold cross-validation is that, in every round, the validation and training sets are independent of each other (Usman et al. 2020). In our study, k-fold cross-validation was used to mitigate overfitting, as demonstrated in Fig. 4.
Furthermore, the primary training data set is separated into k same-sized subsets; in each round, k−1 of the data subsets are used for training, while the remaining subset is used for validation (Elkiran et al. 2018). The final result is taken as the average validation efficiency over the k subsets. In general, the value of k is chosen according to sample availability, typically 2–10. The general advantage of the k-fold cross-validation process is that the calibration and validation sets in every round are independent of one another, providing a satisfactory foundation for model optimization (Abba et al. 2017). The basic data set is split into two groups, a verification set and a calibration set, to make the best use of the data in model configuration (Soltani et al. 2015). Our study divided the data in two phases (25% for verification and 75% for calibration) to avoid the overfitting, underfitting, and local minima issues that may lead to qualitative and quantitative changes, as shown in Fig. 4 (Usman et al. 2021).
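A sketch of the k-fold procedure described above is given below, using scikit-learn's KFold; k, the model, and the data are illustrative choices rather than the exact configuration of this study.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X, y = rng.random((100, 7)), rng.random(100)   # hypothetical data
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kf.split(X):
    model = SVR().fit(X[train_idx], y[train_idx])  # train on k-1 subsets
    scores.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

print(np.mean(scores))   # average validation performance over the k rounds
```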
Performance accuracy is estimated from various criteria based on the difference between predicted and measured values. In our study, the correlation coefficient (R), determination coefficient (R2), and mean square error (MSE) were used to evaluate the models:
$${R}^{2}=1-\frac{\sum_{i=1}^{N}{\left({Y}_{obsi}-{Y}_{comi}\right)}^{2}}{\sum_{i=1}^{N}{\left({Y}_{obsi}-{\overline{Y}}_{obsi}\right)}^{2}}$$
(5)
$${\text{MSE}}=\frac{1}{N}\sum_{i=1}^{N}{\left({Y}_{obsi}-{Y}_{comi}\right)}^{2}$$
(6)
$$R=\frac{\sum_{i=1}^{N}\left({Y}_{obsi}-{\overline{Y}}_{obsi}\right)\left({Y}_{comi}-{\overline{Y}}_{comi}\right)}{\sqrt{\sum_{i=1}^{N}{\left({Y}_{obsi}-{\overline{Y}}_{obsi}\right)}^{2}\sum_{i=1}^{N}{\left({Y}_{comi}-{\overline{Y}}_{comi}\right)}^{2}}}$$
(7)
where N = number of data points, \({Y}_{obsi}\) = observed data, \(\overline{Y}\) = average value, and \({Y}_{comi}\) = computed values.
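Eqs. (5)–(7) can be computed directly from the observed and computed arrays, as in the sketch below; the two short arrays are illustrative data only.

```python
import numpy as np

def evaluate(y_obs, y_com):
    """Return R2 (Eq. 5), MSE (Eq. 6), and R (Eq. 7)."""
    res = y_obs - y_com
    r2 = 1.0 - np.sum(res**2) / np.sum((y_obs - y_obs.mean())**2)
    mse = np.mean(res**2)
    r = (np.sum((y_obs - y_obs.mean()) * (y_com - y_com.mean()))
         / np.sqrt(np.sum((y_obs - y_obs.mean())**2)
                   * np.sum((y_com - y_com.mean())**2)))
    return r2, mse, r

y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_com = np.array([1.1, 1.9, 3.2, 3.8])
print(evaluate(y_obs, y_com))
```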
According to Nourani et al. (2018) and Elkiran et al. (2019), a sound analysis of any data intelligence model should include at least one goodness-of-fit measure (e.g. R2) and at least one absolute error measure (e.g. RMSE). Three performance criteria were employed in this study because multi-criteria indicators are generally employed for measuring model performance in contemporary studies. Another important reason for using multiple criteria is that the properties of the data, such as normality, size, and linearity, affect the performance accuracy of any model, and these effects can also be evaluated through such criteria. In addition, several studies have shown that even for the same type of data set, performance results may deviate from one model to another. For example, R2 does not account for any biases that might be present in the data; therefore, a good model might have a low R2 value, while a model that does not fit the data might have a high R2 value. Hence, combining the goodness-of-fit (R2) with error measures such as the root mean square error (RMSE) and bias measures can lead to promising and reliable simulation. Other performance efficiency criteria can also be used, such as the mean absolute error (MAE) (Nourani et al. 2018; Elkiran et al. 2018; Usman et al. 2021).