Predictive modelling of thermal conductivity in single-material nanofluids: a novel approach

This research introduces a novel approach for modelling single-material nanofluids, considering the constituents and characteristics of the fluids under investigation. The primary focus of this study was to develop models for predicting the thermal conductivity of nanofluids using a range of machine learning algorithms, including ensembles, trees, neural networks, linear regression, Gaussian process regressors, and support vector machines. To identify the most relevant features for accurate thermal conductivity prediction, the study compared the performance of established feature selection algorithms, such as minimum redundancy maximum relevance (MRMR), F-test, and RReliefF, against a newly proposed feature selection algorithm. The novel algorithm eliminated features lacking direct implications for fluid thermal conductivity. The selected features included temperature as a thermal property of the fluid itself; multiphase features such as volume fraction and particle size; and material features, namely the nanoparticle material and the base fluid material, each of which can be fixed by any two intensive properties. Statistical methods were then employed to select among these features. The results demonstrated that the novel feature selection algorithm outperformed the established approaches in predicting the thermal conductivity of nanofluids. The models were evaluated using fivefold cross-validation, and the best model, based on the proposed feature selection algorithm, exhibited a root-mean-squared error of 1.83 and an R-squared value of 0.94 on the validation set. The model achieved a root-mean-squared error of 1.46 and an R-squared value of 0.97 on the test set. The developed predictive model holds practical significance by enabling the numerical study and optimisation of nanofluids before their creation. This model facilitates the customisation of conventional fluids to attain desired fluid properties, particularly their thermal properties.
Additionally, the model permits the exploration of numerous nanofluid variations based on permutations of their features. Consequently, this research contributes valuable insights to the design and optimisation of nanofluid systems, advancing our understanding and application of thermal conductivity in nanofluids and introducing a novel methodological approach for feature selection in machine learning.

Onyiriuka, Bulletin of the National Research Centre (2023) 47:140

This approach is unusual, as much research depends on statistical tools to select its learning features (MathWorks 2022).

Literature review
The prediction of the thermal conductivity of nanofluids has been studied extensively. The following reviews give the state of the art on this topic as given by various researchers, beginning with some historical studies through to present works. Xie et al. (2002) studied the thermal conductivity measurement of SiC suspensions in water and ethylene glycol and the effect of the size and shape of the added solid phase on the enhancement of thermal conductivity. Experimental data for SiC nanoparticles in water and ethylene glycol were presented. The thermal conductivity of the SiC nanofluid was measured using a transient hot-wire method. The effects of the morphology (size and shape) of the added solid phase on the enhancement of the thermal conductivity of the nanoparticle suspension were studied. This study was one of the first to supply such data and one of the first to report the effects of morphology on thermal conductivity enhancement. It highlighted the deviation from the existing Hamilton-Crosser model under spherical and cylindrical assumptions. However, the study considered just one type of nanoparticle (SiC) and only two particle sizes. It was assumed that heat transfer between the particles and the fluid takes place at the particle-surface interface; heat transfer is expected to be more efficient and rapid for a system with a larger interfacial area, so as particle sizes decrease, the effective thermal conductivity of the suspension improves. Higher thermal conductivities were obtained by adding SiC nanoparticles. Furthermore, it was observed in the study that a linear relationship existed between volume fraction and thermal conductivity enhancement in the low (1-5%) volume fraction range. Murshed et al.
(2005) studied the thermal conductivity of TiO2-water nanofluid in their paper. A more convenient measurement of the thermal conductivity of nanofluids was created: a transient hot-wire apparatus with an integrated correlation model. A relationship was established between particle volume fraction, shape, and thermal conductivity. The study focused on conveniently measuring nanofluids' thermal conductivity and comparing results with theoretical predictions, and it was one of the first to collect and compare such data with theoretical models. However, only one type of nanofluid was used. They pointed out that traditional models fail due to a lack of accounting for the effects of (1) particle size, (2) particle Brownian motion, (3) nano-layering, and (4) nanoparticle clustering; an integrated correlation model allowed for a more precise and convenient measurement of the thermal conductivity of nanofluids. Further efforts to develop a suitable model to predict the thermal conductivity of nanofluids will consider other factors that are important in enhancing the heat transfer performance of nanofluids. Komeilibirjandi et al. (2020) studied the thermal conductivity of CuO nanofluids and predicted it by using correlation and an artificial neural network. The GMDH (group method of data handling) neural network was applied to model the thermal conductivity of the CuO nanofluids. Water and ethylene glycol were the base fluids. Volume fraction, nanoparticle size, temperature, and thermal conductivity of the base fluid were considered. The data used were extracted from experimental studies in the literature.
It is worth noting that most researchers, as outlined above, have attempted this modelling. Furthermore, researchers who attempted the generalised model form have fixed their models to only the nanofluid types collected, implying that no nanofluid type outside their collected data can be accounted for. Ramezanizadeh et al. (2019) mentioned the two types of nanofluids: conventional or single-material nanofluids and hybrid nanofluids. They reviewed the models proposed by various researchers for predicting thermal conductivity. The following conclusions can be drawn from their report (Ramezanizadeh et al. 2019): (a) The reviewed models were not tested with out-of-sample data. (b) Many models were made for a specific nanofluid (meaning they only covered one nanoparticle and base fluid type); these accounted for 23 (89%) of the 26 models reviewed. (c) The remaining three models were designed to cover more than one nanofluid. However, they were limited in the number of nanofluids on which they could make predictions, given the numerous nanofluid types that exist in the literature plus those that can be fabricated. A further study of the modelling approach used by these researchers reveals that a shift in convention in the choice of model features might solve this problem. For example, Ahmadloo and Azizi (2016) considered index numbers that would differentiate each nanoparticle type and base fluid. Although this helped add a distinct factor to the conventional inputs of particle size, volume fraction, and temperature, the resulting model was still limited to 15 nanofluid types and could not be applied beyond those nanofluids. Also, using ordinal numbers instead of encodings (such as one-hot) has been shown in machine learning to bias models by making categories assigned higher numbers appear more important. (d) The final two of the three models in (c) could not distinguish between nanofluid types due to the features they selected, so they were only accurate in
a limited range of parameters and thus were not useful outside those ranges. (e) The models' focus was curve fitting, not prediction.
The other group of models studied by Ramezanizadeh et al. (2019) comprises the correlation types, which have low accuracy and a narrow range of validity; hence, they are usually avoided.
In this study, the predictors used as input were chosen so that they uniquely represent the nanoparticle and base fluid data and can also apply to other nanoparticles and base fluids not present in the collected data. This ensures that the model can make predictions based on the numerical values of the predictors alone and hence be more general. By contrast, other researchers, such as Ahmadloo and Azizi (2016) as mentioned above, used predictors that were uniquely identified only with the nanoparticles and base fluids in their collected data; hence, their models cannot be used on a general basis for predicting the thermal conductivity of single-material nanofluids not included in the collected data. Moreover, our approach in this study is to create a generalised model that accounts for all single-material nanofluids using a novel feature selection algorithm.

Experimental measurements and description of the experimental setup and procedures
Data used in this study were obtained from experimental data reported in the following articles (Patel et al. 2010): The report's experimental set-up for measuring thermal conductivity utilised a transient hot-wire apparatus. The measurement cell consisted of a 15-cm-long platinum wire with a diameter of 100 μm. The platinum wire served both as a heater and a thermometer. It was placed in a glass container filled with the test liquid and formed an arm of a Wheatstone bridge. An analytical solution for the temperature distribution was employed to determine the thermal conductivity of the test liquid. This solution assumes an infinitely long line heat source continuously heating a semi-infinite medium. The platinum wire was electrically insulated to prevent interference. The validity of the measurement technique was established by comparing the obtained thermal conductivity values with literature values for various fluids such as water, ethylene glycol, transformer oil, xylene, and toluene. The results showed that the measurements obtained from the transient hot-wire apparatus were within 1.2% of the literature values, indicating its reliability. However, it should be noted that this equipment is not suitable for measuring the thermal conductivity of fluids with high electrical conductivities. Nonetheless, it proved effective for measuring the thermal conductivity of oxide nanofluids, which was the focus of their study. Overall, the transient hot-wire equipment employed in the study provided a robust and validated method for measuring thermal conductivity, ensuring accurate and reliable data for the analysis of nanofluids.

Machine learning techniques
Machine learning (ML) techniques (Ewim et al. 2020, 2021; Géron 2022; Jiang et al. 2020; Meng et al. 2020; Sharma et al. 2022; Zhu et al. 2021) have revolutionised regression analysis by providing powerful tools for predicting continuous numerical outcomes. This section explores several ML regression techniques commonly used in various domains: neural networks, gradient boosting, random forest, support vector machines (SVM), linear models, decision trees, and naive Bayes regression models. These techniques have been investigated for nanofluid thermal conductivity prediction in this study, along with the application of the novel feature selection algorithm proposed herein.

Neural networks
Neural networks are ML models inspired by the human brain's neural structure. They consist of interconnected layers of artificial neurons that can learn complex patterns and relationships. Neural networks have been successfully applied to regression tasks because they capture nonlinear relationships in the data (Chiniforooshan Esfahani 2023; Genzel et al. 2022; Hornik et al. 1989; Kamsuwan et al. 2023; Kannaiyan et al. 2019; Mijwil 2018; Onyiriuka 2023a, b; Peng et al. 2020).

Gradient boosting
Gradient boosting is an ensemble learning method that combines multiple weak models, typically decision trees, to create a robust predictive model. It trains new models to correct the errors made by previous models, gradually improving the overall prediction accuracy (Friedman 2001).
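The residual-fitting idea can be illustrated with depth-one trees ("stumps") in plain NumPy. This is a minimal toy sketch of the technique, not the library-grade boosting used in practice:

```python
import numpy as np

def fit_stump(X, residual):
    """Fit the best depth-1 regression tree (a 'stump') to the residuals."""
    best = None
    for j in range(X.shape[1]):                    # try every feature
        for t in np.unique(X[:, j])[:-1]:          # and every split point
            left = X[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            pred = np.where(left, lv, rv)
            sse = float(np.sum((residual - pred) ** 2))
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]                                # (feature, threshold, left, right)

def gradient_boost(X, y, n_rounds=50, lr=0.1):
    """Each new stump is trained on the errors of the ensemble so far."""
    pred = np.full(len(y), float(np.mean(y)))
    stumps = []
    for _ in range(n_rounds):
        j, t, lv, rv = fit_stump(X, y - pred)      # fit the current residuals
        pred = pred + lr * np.where(X[:, j] <= t, lv, rv)
        stumps.append((j, t, lv, rv))
    return float(np.mean(y)), lr, stumps

def gb_predict(model, X):
    base, lr, stumps = model
    pred = np.full(len(X), base)
    for j, t, lv, rv in stumps:
        pred = pred + lr * np.where(X[:, j] <= t, lv, rv)
    return pred
```

The learning rate shrinks each correction, so many small stumps gradually drive the training error down.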

Random forest
Random forest is another ensemble learning technique that constructs a collection of decision trees and combines their predictions to make accurate predictions. It reduces overfitting by introducing randomness into tree building (Breiman 2001; Gholizadeh et al. 2020; Tan et al. 2022).

Support vector machine (SVM)
SVM is a popular ML algorithm that can also be used for regression tasks, where it is commonly called support vector regression. For regression, it aims to find a function that deviates from the observed targets by no more than a specified margin while remaining as flat as possible, thereby minimising error on the training instances. SVM can handle linear and nonlinear regression problems through kernel functions (Razavi et al. 2019; Vapnik 1999).

Linear model
Linear regression is a simple and widely used ML technique for regression analysis. It assumes a linear relationship between the input features and the target variable. The goal is to find the best-fit line that minimises the sum of squared differences between the predicted and actual values (Géron 2022).
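The best-fit line can be obtained in closed form. A minimal sketch using NumPy's least-squares solver (function names are ours, for illustration only):

```python
import numpy as np

def ols_fit(X, y):
    """Closed-form least squares: minimise the sum of squared residuals."""
    A = np.c_[np.ones(len(X)), X]            # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                              # [intercept, slope_1, slope_2, ...]

def ols_predict(coef, X):
    return np.c_[np.ones(len(X)), X] @ coef
```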

Decision trees
Decision trees are versatile ML models that make predictions by partitioning the feature space into regions based on simple decision rules. They are interpretable and can capture nonlinear relationships in the data. Decision trees can be used for regression and classification tasks (Breiman et al. 1984).

Naive Bayes regression model
Naive Bayes regression is based on Bayes' theorem and assumes that the input features are conditionally independent given the target variable. Despite its simplicity, naive Bayes can provide reasonable predictions, especially when the independence assumption holds (Rish 2001).
These ML regression techniques offer various options for analysing and predicting continuous variables. The choice of technique depends on the specific problem, dataset characteristics, interpretability requirements, and computational considerations. Researchers and practitioners often compare and combine these techniques to achieve the best performance for their regression tasks (Sharma et al. 2022; Witten et al. 2016; Yashawantha and Vinod 2021; Zhu et al. 2021).

Cross-validation
Cross-validation is a machine learning technique used to estimate how well a model will perform on unseen data. In this process, a small held-out sample is used to assess the model's performance; these data are called "out-of-bag" data. It is a popular strategy since it is simple and produces a less biased and less optimistic estimate of a model's predictive ability than alternatives such as a simple train-test split. Cross-validation also ensures a fair comparison of the models (Brownlee 2020a).
To compare machine learning algorithms fairly, we ensured that each algorithm was evaluated on the same data and in the same way. Cross-validation is a resampling technique for evaluating machine learning models on a limited sample of data. Although there are several cross-validation methods, we chose k-fold cross-validation because it was the most fitting for this study, allowing models to be evaluated without bias. Brownlee (2020b) reported an alternative method, stratified cross-validation, which is suitable for cross-validating imbalanced datasets; however, it applies only to classification problems and therefore does not apply to our study. The k-fold procedure includes a parameter, k, which specifies the number of groups into which a given data sample is sectioned. Fivefold cross-validation refers to the case where the dataset is split into five sections, and it was applied in this study to evaluate every algorithm, ensuring the same evaluation for all models.
The figure below shows the k-fold algorithm.
The general procedure shown in Fig. 1 is as follows:

1. Randomly shuffle the dataset.
2. Divide the dataset into k distinct sections.
3. For each distinct section:
   i. Use the section as a test data set.
   ii. Use the remaining sections as a training data set.
   iii. Fit each model to the training set (ii) and evaluate it on the test set (i).
   iv. Save the evaluation score and note the model.
4. Repeat steps (i) through (iv) for each of the x models.
5. Report the predictive ability of each model by summarising its average evaluation score.
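The steps above can be sketched as a minimal NumPy illustration, in which the model is supplied as `fit`/`predict` callables (these names are ours, not from the study):

```python
import numpy as np

def kfold_cross_validate(X, y, fit, predict, k=5, seed=0):
    """Evaluate a model with k-fold cross-validation (steps 1-5 above)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # step 1: shuffle the dataset
    folds = np.array_split(idx, k)         # step 2: k distinct sections
    scores = []
    for i in range(k):                     # step 3: rotate the test section
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])    # step 3.iii: fit on the remainder
        pred = predict(model, X[test])
        rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
        scores.append(rmse)                # step 3.iv: save the score
    return float(np.mean(scores))          # step 5: average evaluation score
```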

Evaluation metrics
The evaluation metrics are used to measure the performance of the predictive model. Standard metrics for regression tasks include the following. Root-mean-squared error (RMSE) (Hastie et al. 2009) is a popular evaluation metric for regression tasks. It is derived from the mean-squared error (MSE), as seen in Eq. 1, by taking the square root of the average of the squared differences between the predicted and actual values. The rationale behind using RMSE as an evaluation metric is similar to that of MSE, but with some additional considerations. Like MSE, RMSE provides a measure of the average prediction error; however, by taking the square root of MSE, RMSE is expressed in the same units as the target variable, making it more interpretable and easier to relate to the original scale of the problem. For example, if the target variable represents the temperature of a fluid in °C, RMSE will also be expressed in °C. Like MSE, RMSE also emphasises more significant errors by squaring them, but by taking the square root, RMSE balances penalising significant errors with maintaining interpretability. It allows a more intuitive understanding of the typical magnitude of errors in the model's predictions. RMSE enables direct comparison of models or different scenarios, as it provides a standard, scale-dependent metric. When comparing models, lower RMSE values indicate better performance, meaning that the model's predictions are closer to the actual values on average. RMSE is related to the standard deviation of the errors: it measures the typical spread or dispersion of the errors around the actual values. Smaller RMSE values suggest a more concentrated distribution of errors, indicating better accuracy and precision in the model's predictions.
R-squared (Gareth et al. 2013) is a widely used evaluation metric for regression tasks. It measures the proportion of the variance in the dependent variable that is predictable from the independent variables, as seen in Eq. (2). Mean-squared error (MSE) (Hastie et al. 2009) is a commonly used evaluation metric for regression tasks. It measures the average squared difference between predicted and actual values, as shown in Eq. (6). MSE measures the average prediction error by considering the squared differences between the predicted and actual values. It penalises more significant errors due to the squaring operation, providing a way to assess the accuracy of the model's predictions. MSE is mathematically convenient and has desirable properties for optimisation: it is differentiable, allowing the efficient gradient-based optimisation algorithms commonly used in machine learning. This property makes MSE practical for training regression models using gradient-based optimisation techniques. MSE can be decomposed into the sum of variance and bias terms, known as the bias-variance trade-off. This decomposition provides insights into the model's performance by assessing the balance between overfitting (low bias, high variance) and underfitting (high bias, low variance). By minimising MSE, the model aims to find the optimal balance between bias and variance for improved generalisation.
Mean absolute error (MAE) (Gareth et al. 2013) is a commonly used evaluation metric for regression tasks. It measures the average absolute difference between predicted and actual values, as seen in Eq. (7).
MAE is less sensitive to outliers compared to other error metrics such as MSE. Since MAE calculates the absolute difference, it does not heavily penalise significant errors; this makes MAE more robust to outliers, minimising their influence on the overall error measurement. MAE has a straightforward interpretation as the average absolute difference between the predicted and actual values. It represents the average magnitude of the errors, providing an intuitive understanding of the model's performance, and is easily understandable by non-technical readers, making it suitable for communication. MAE is also mathematically simple and computationally efficient: it does not involve squaring the differences, simplifying calculations and reducing the computational complexity of evaluating the error metric.
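For concreteness, the four metrics can be computed as below. The definitions are the standard ones; the equation numbers in the comments refer to the Eqs. (1), (2), (6) and (7) cited in the text:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MSE, MAE and R-squared as described above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                 # Eq. (6): mean-squared error
    rmse = np.sqrt(mse)                     # Eq. (1): same units as the target
    mae = np.mean(np.abs(err))              # Eq. (7): robust to outliers
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot              # Eq. (2): share of explained variance
    return {"RMSE": rmse, "MSE": mse, "MAE": mae, "R2": r2}
```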

Methods
The procedure to evaluate the predictive model involves the following: the trained model was used to make predictions on the validation dataset, and the evaluation metrics above were computed using the predicted values and the corresponding actual values from the validation dataset.
The calculated metrics are presented in Table 2. The validation RMSE is used to sort the table from best to worst.

Data exploration and models
The variables were collected for analysis to determine which features might be necessary for modelling the thermal conductivity of nanofluids.
The nanofluid data set was collected from experimental studies.It consisted of 348 data points.

Conceptualisation and parameter selection
Conceptualisation in this context is the formulation of a novel parameter selection method for predicting the thermal conductivity of all single-material nanofluids. Parameter selection is the selection of the training data characteristics learned during the learning process. In this study, there is a shift from parameter selection based solely on statistics to selection based on physics combined with the reduction of an initially large dimensional space. This describes a feature engineering technique in which all possible variables are first selected, increasing the dimensional space of the dataset by increasing the number of descriptive features. The goal is to find an optimal physical set of features that makes each data point distinct from the others. Even though these extra variables may not have high importance in predicting the target variable, their presence in the model helps to make the predictions unique. This is a different approach from conventional feature engineering, which aims to create new variables from existing ones to improve the performance of a machine learning model by providing more information to the model (Patel 2021).
In this case, the new variables are not derived from the existing variables but are other independent variables that further define the characteristics of what is being predicted.
It is important to remember that feature engineering is an iterative process that necessitates a thorough understanding of the problem and the data at hand.

Proposed algorithm for parameter selection
Here we discuss the procedure for selecting parameters according to the novel method discussed above to predict the thermal conductivity of all single-material nanofluids.
(1) Check the problem being solved.
(2) List all the possible features (start with the largest feature set/dimensional space possible).
(3) Manually drop features that have no meaning or direct implication for the thermal conductivity of a fluid. For example, for single-material nanofluids:
   (a) Fluid features: temperature.
   (b) Multiphase features: volume fraction and particle size.
   (c) Material features:
      (i) Nanoparticle material: any two intensive properties will fix the material of the nanoparticle type (Callister 2007; Cengel et al. 2011; Moran et al. 2010).
      (ii) Base fluid material: any two intensive properties will fix the material of the base fluid type (Callister 2007; Cengel et al. 2011; Moran et al. 2010).
So, these three feature groupings define a nanofluid.
(4) Apply statistical methods to select features, according to step (3), out of all other features.
(5) At the end of steps (3)-(5), you should have a reasonable number of features and optimal accuracy.

Note that this parameter selection focuses on accuracy and enhanced model learning for generalisation. Accuracy is still of utmost importance.
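Steps (2)-(4) can be sketched in Python. The column names and the use of absolute Pearson correlation as the statistical ranking are illustrative assumptions, not the study's exact choices:

```python
import numpy as np

# Hypothetical feature names for illustration; they are not the paper's headers.
PHYSICAL = {"temperature",                       # (a) fluid feature
            "volume_fraction", "particle_size",  # (b) multiphase features
            "np_density", "np_specific_heat",    # (c)(i) fixes the nanoparticle material
            "bf_density", "bf_specific_heat"}    # (c)(ii) fixes the base fluid material

def select_features(names, X, y, keep=5):
    """Steps (2)-(4): start wide, drop non-physical columns, rank the rest."""
    # Step (3): manually drop features with no direct physical implication.
    cols = [i for i, n in enumerate(names) if n in PHYSICAL]
    # Step (4): rank the survivors statistically, here by |Pearson correlation|.
    def relevance(i):
        return abs(float(np.corrcoef(X[:, i], y)[0, 1]))
    ranked = sorted(cols, key=relevance, reverse=True)
    return [names[i] for i in ranked[:keep]]
```

Note that a column carrying no physical meaning is dropped in step (3) even if it correlates strongly with the target, which is the point of the physics-first filter.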

Other feature selection algorithm
This section presents the minimum redundancy maximum relevance (MRMR) algorithm and RReliefF.
• Begin by picking the most relevant feature from the set and add it to the selected set (S).
• Check the remaining features (Sc) for those that carry relevant information but are not redundant with the ones in S.
• If such features exist, add the most relevant of them to S.
• Keep doing this until there are no more non-redundant features left in Sc.
• Find the Sc feature with the highest value of the Mutual Information Quotient (MIQ), which weighs a feature's ability to contribute new information while balancing relevance and redundancy.
• Add this feature to S and repeat the previous step as needed.
• Finally, include the Sc features with zero relevance in S, in random order.
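The greedy procedure above can be sketched as follows. Absolute Pearson correlation stands in for mutual information here, so the quotient is MIQ-like rather than the exact MIQ:

```python
import numpy as np

def mrmr(X, y, k):
    """Greedy MRMR with a quotient criterion: relevance / mean redundancy."""
    n = X.shape[1]
    relevance = np.array([abs(float(np.corrcoef(X[:, j], y)[0, 1]))
                          for j in range(n)])
    S = [int(np.argmax(relevance))]          # start with the most relevant feature
    while len(S) < k:
        best_j, best_q = None, -np.inf
        for j in range(n):
            if j in S:
                continue
            # Mean absolute correlation with the already-selected set S.
            redundancy = np.mean([abs(float(np.corrcoef(X[:, j], X[:, s])[0, 1]))
                                  for s in S])
            q = relevance[j] / max(redundancy, 1e-12)   # MIQ-style quotient
            if q > best_q:
                best_j, best_q = j, q
        S.append(best_j)
    return S
```

A feature that merely duplicates an already-selected one has redundancy near 1, so its quotient collapses and a genuinely new feature is preferred.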
The algorithm chooses features that are informative and distinct, resulting in an optimised subset for analysis.

RReliefF
Initialise the weights W_dy, W_dj, W_dy∧dj, and W_j to 0. The algorithm then follows these steps for a certain number of iterations, denoted by 'updates'. For each iteration i, a randomly chosen observation x_r is taken and its k-nearest observations are found; m is the number of iterations specified by 'updates'.

Update the intermediate weights for each of the k-nearest neighbours x_q as follows: W_dy = W_dy + Δ_y(x_r, x_q) · d_rq; W_dj = W_dj + Δ_j(x_r, x_q) · d_rq; and W_dy∧dj = W_dy∧dj + Δ_y(x_r, x_q) · Δ_j(x_r, x_q) · d_rq, where Δ_j is the difference in predictor j, defined analogously to Δ_y.
The term Δ_y(x_r, x_q) calculates the difference in the continuous response y between observations x_r and x_q, normalised by the range of response values: Δ_y(x_r, x_q) = |y_r − y_q| / (max(y) − min(y)), where y_r is the response value for observation x_r, y_q is the response value for observation x_q, and d_rq is a distance function.
After updating all intermediate weights for each iteration, RReliefF calculates the predictor weights W_j using the formula W_j = W_dy∧dj / W_dy − (W_dj − W_dy∧dj) / (m − W_dy).

Preprocessing of experimental data for training and validation
Preprocessing experimental data for training and validation involves several steps to ensure the data are in a suitable format and of suitable quality for ML regression analysis. The essential components of data preprocessing for training and validation include the removal of any irrelevant or redundant data that did not contribute to the regression task. This includes removing duplicates, handling missing values, and addressing outliers. Missing values can be imputed using techniques such as mean or median imputation, or advanced methods like regression imputation. However, this study did not impute missing values since the data had none.
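A minimal sketch of the deduplication and mean-imputation steps described above (imputation is shown for completeness only, since the study's data had no missing values):

```python
import numpy as np

def preprocess(X):
    """Remove duplicate rows and mean-impute missing values (NaN)."""
    X = np.unique(np.asarray(X, dtype=float), axis=0)  # drop exact duplicate rows
    col_mean = np.nanmean(X, axis=0)                   # column means, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]                     # mean imputation
    return X
```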

Feature selection and modelling
In this study, the modelling process is approached from the standpoint of feature selection.
To start the modelling procedure, we first designed it for reproducibility. This was achieved by using a default and consistent random seed generator. The data were then partitioned in an 80:20 ratio: 80% for training (252 observations) and 20% (69 observations) for later out-of-bag testing, with 10% of the 80% reserved for testing (27 observations). Fivefold cross-validation was carried out to select the model fairly and without bias.
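A sketch of a reproducible shuffled split. Note that simple rounding of 348 observations yields a 278/70 split, slightly different from the 252/69/27 partition reported above, so the counts produced here are illustrative only:

```python
import numpy as np

def split_indices(n, test_frac=0.2, seed=0):
    """Shuffled 80:20 split into training and held-out test indices."""
    rng = np.random.default_rng(seed)       # fixed seed for reproducibility
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]       # (train, test)
```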
The response (percentage enhancement of thermal conductivity, "ENT") was specified, and the rest of the variables were specified as the predictors.

Data analysis
The total number of data rows collected was 348, with 22 columns including the response variable.
The variables are represented by the nomenclature shown in Table 1 for ease of reference. Figure 2 showcases histograms portraying the characteristics of each variable, including the response variable "ENT". Visual scrutiny of these histograms quickly indicates that none of the variables conforms to a normal distribution, prompting the requirement for models attuned to handling such non-normal data distributions. Therefore, a range of modelling methodologies comes into view. Notably, robust linear regression (Maronna et al. 2019) emerges as a promising method, particularly due to its reliable coefficient estimates even in the presence of outliers. Likewise, nonparametric models, including decision trees, random forests, and support vector machines (Kurani et al. 2023), surface as possible modelling options, capable of capturing intricate relationships without making such assumptions about the data distribution. Furthermore, ensemble models, represented by boosting and bagging (Mohammed and Kora 2023), also exhibit the strength to enhance predictive accuracy, which proves invaluable when dealing with non-normal data and intricate nonlinear relationships. Additionally, the potency of neural networks (Cong and Zhou 2023) becomes evident, owing to their remarkable capacity for detecting patterns and relationships even amidst complex data representations. It is noteworthy that the author avoids transforming the response variable, since data transformation holds the risk of inadvertent data leakage, as noted by Osborne (2010), whereby knowledge from the target variable infiltrates the transformation process and affects the model outcomes.
In conclusion, Fig. 2 effectively visualises the histogram plots of the diverse variables, exposing their departure from normality. Consequently, a suite of modelling options is proposed, encompassing robust linear regression, nonparametric models (e.g., decision trees, random forests, support vector machines), ensemble models (e.g., boosting, bagging), and neural networks. These selections are apt for handling data exhibiting non-normal distributions, and the choice among these modelling approaches should be guided by the specific data characteristics. Figure 3b presents the box plot for each variable, offering a visual summary of their distribution characteristics, including skewness, symmetry, and potential outliers. The structure of the box plot is depicted in Fig. 3a. Box plots allow for a concise representation of multiple variables, facilitating the identification of differences in central tendency and variability among them. The box plot provides several important features for each variable: the rectangular box represents the interquartile range (IQR), encompassing the middle 50% of the data, and the line within the box represents the median, indicating the central tendency. We can observe similar data characteristics between the variables regarding skewness, making it possible to create some groupings, while some variables stand alone and do not fall under any similarity grouping. Groups 1 to 12 below show the variables that have similar relationships with one another by visual examination of the box plot in Fig. 3.

Results analysis
The table in the appendix presents the results of the model selection process, from which the neural networks emerged as the best-performing models. Additionally, the data were standardised before training the model. To further optimise the neural network, Bayesian optimisation was employed. The optimisation process was guided by the expected improvement per second plus acquisition function over 30 iterations. The hyperparameter search range included the number of fully connected layers (1-3), the size of the first layer (1-300), the size of the second layer (1-300), the size of the third layer (1-300), and the choice of activation function (ReLU, tanh, sigmoid, or none). The regularisation strength varied between 3.9683 × 10^(-8) and 396.8254. Data standardisation was considered a binary choice (yes/no). The optimised hyperparameters for the neural network were determined as follows: two fully connected layers with sizes of 64 and 10, respectively; the ReLU activation function; a regularisation strength of 392.6291; and standardised data. However, it was observed that the performance of the optimised neural network was not satisfactory. The validation RMSE for the optimised model was 14.472, which was higher than that of the non-optimised version. The mean-squared error (MSE) was 209.443, the R-squared was -2.841, and the mean absolute error (MAE) was 12.482.
For the test set consisting of 27 observations, the MAE was 9.096, the MSE was 125.532, the root-mean-squared error (RMSE) was 11.204, and the R-squared was -1.932. This poor performance on both the validation and test sets is possibly due to overfitting of the model parameters to the training data. In order to perform feature selection, a copy of the most accurate model was used.
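The four metrics quoted throughout (MAE, MSE, RMSE, R-squared) are standard and can be computed from predictions in a few lines. The sketch below uses small illustrative arrays, not the study's data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative stand-ins for true and predicted thermal conductivity
# enhancement values; the study's actual test set is not reproduced here.
y_true = np.array([10.0, 12.5, 9.8, 14.2, 11.1])
y_pred = np.array([10.4, 12.0, 10.1, 13.8, 11.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)            # RMSE is the square root of the MSE
r2 = r2_score(y_true, y_pred)  # 1 minus (residual SS / total SS)
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

A negative R-squared, as reported for the optimised network, simply means the model predicts worse than the mean of the targets.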

Practical significance of the developed predictive model
By developing this model, it is possible to study and optimise nanofluids numerically before creating them. It enhances the ability to customise conventional fluids to attain desired fluid properties, particularly their thermal properties. It should also be noted that, by modifying base fluids through the addition of nanoparticles, numerous nanofluids can be obtained, as there are many permutations of the nanoparticle and base-fluid features, and their characteristics can be adequately modelled.
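The size of this design space follows directly from the permutations of the features. A minimal sketch, using hypothetical material lists and illustrative volume fractions (not the study's catalogue), shows how candidate nanofluids could be enumerated for screening by a predictive model:

```python
from itertools import product

# Hypothetical examples only; the study's material lists differ.
nanoparticles = ["Al2O3", "CuO", "TiO2"]
base_fluids = ["water", "ethylene glycol"]
volume_fractions = [0.01, 0.02, 0.05]  # illustrative values

# Every (nanoparticle, base fluid, volume fraction) combination is a
# candidate nanofluid whose conductivity a trained model could predict
# before the fluid is ever synthesised.
candidates = list(product(nanoparticles, base_fluids, volume_fractions))
print(len(candidates))  # 3 * 2 * 3 = 18 candidate nanofluids
```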

Conclusions
This research presents a novel approach for modelling single-material nanofluids, considering their constituents, the specific fluid characteristics, and the problems being addressed. The developed approach has demonstrated high accuracy in modelling nanofluids.
The significance of this study lies in its potential to advance our understanding of nanofluid behaviour by examining the individual and combined effects of variables on the thermophysical properties of nanofluids, and in providing researchers with a road map for selecting features for nanofluid modelling so that more general and accurate models can be obtained. Furthermore, the methodological process detailed in this study suggests a procedure for researchers to apply when modelling nanofluids' thermophysical properties. By delving into these relationships, researchers can gain valuable insights into the underlying mechanisms governing nanofluid behaviour, leading to improved design and optimisation of nanofluid systems.
The ability to accurately model single-material nanofluids opens up new possibilities for investigating and resolving the anomalous heat transfer enhancement observed in these fluids. Furthermore, it allows for the customisation of nanofluids to meet desired thermal properties, providing greater control over their application in various fields.
Overall, this research contributes to the growing body of knowledge on nanofluids, offering a promising avenue for further exploration and understanding of their thermophysical properties. The developed modelling approach sets the stage for future studies aimed at harnessing the full potential of nanofluids in enhancing heat transfer and achieving desired thermal characteristics. It should be noted that this work is purely computational; researchers can therefore look to validate these claims experimentally. Extending the approach to hybrid nanofluids is another significant avenue for future work.

Fig. 1
Fig. 1 The fivefold algorithm

The whiskers extend from the box, indicating the data range within 1.5 times the IQR; values beyond the whiskers are considered potential outliers and are displayed as individual data points. By examining the box plots in Fig. 3b, we can observe the distributional characteristics of each variable: skewness is apparent from the asymmetry of the box and whiskers, significantly different whisker lengths suggest unequal variability, and outliers are visually identifiable as data points lying beyond the whiskers. The side-by-side presentation of multiple variables in Fig. 3b allows for an easy comparison of their central tendencies and variabilities; differences in box lengths, medians, and whisker lengths among the variables indicate variations in their distributions. The utilisation of box plots aids in understanding the distributional properties of each variable and enables the identification of potential outliers and variations in central tendency and variability across multiple variables.
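The 1.5 × IQR whisker rule described above (Tukey's fences) is straightforward to apply numerically. A minimal sketch with illustrative data, not the study's dataset:

```python
import numpy as np

def iqr_outlier_bounds(values):
    """Tukey fences: points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Illustrative sample: one extreme value beyond the upper fence
data = np.array([4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9, 12.0])
lo, hi = iqr_outlier_bounds(data)
outliers = data[(data < lo) | (data > hi)]  # the 12.0 point is flagged
```

This is the same rule a box plot applies visually when it draws points beyond the whiskers as individual markers.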

Group 1: Nanofluid temperature
Group 2: Particle size diameter, Nanoparticle density, Nanoparticle thermal conductivity
Group 3: Volume fraction, Base fluid surface tension
Group 4: Nanoparticle thermal diffusivity, Base fluid dielectric constant
Group 5: Nanoparticle-specific heat capacity, Nanoparticle electrical conductivity
Group 6: Nanoparticle melting point, Base fluid specific heat capacity
Group 7: Nanoparticle dielectric constant, Base fluid density
Group 8: Nanoparticle refractive index, Base fluid boiling point
Group 9: Nanoparticle magnetic susceptibility
Group 10: Base fluid thermal conductivity, Base fluid thermal diffusivity
Group 11: Base fluid viscosity
Group 12: Base fluid kinematic viscosity, Percentage enhancement of nanofluid thermal conductivity

Groups 1, 9, and 11 are very different from the rest.

Fig. 2
Fig. 2 A histogram plot of each variable

Fig. 3 a
Fig. 3 a Box plot anatomy (MathWorks 2022). b Box plot of the variables

Table 1
Common feature selection algorithms