
Modeling nanofluid viscosity: comparing models and optimizing feature selection—a novel approach



The accurate prediction of viscosity in nanofluids is essential for comprehending their flow behavior and enhancing their effectiveness in different industries. This research delves into modeling the viscosity of nanofluids and assessing various models through cross-validation techniques. The models are compared based on the root mean square error of the cross-validation sets, which served as the selection criterion.


Four feature selection algorithms were evaluated to identify the most influential features for viscosity prediction: minimum redundancy maximum relevance (MRMR), F-test, RReliefF, and a novel selection based on physical meaning. The feature selection based on physical meaning yielded the best results, as outlined in this study. This methodology takes into account the physical relevance of each feature to the nanofluid's viscosity. To assess the predictive performance of the models, a cross-validation process was conducted, which provided a robust evaluation. The root mean squared error of the validation sets was used to compare the models. This rigorous evaluation identified the most accurate and reliable model for predicting nanofluid viscosity.


The results showed that the novel feature selection algorithm outperformed the established approaches in predicting the viscosity of single material nanofluids. The proposed feature selection algorithm had a root mean squared error of 0.022 and an R² of 0.9941 on the validation set; on the test set, the root mean squared error was 0.0146, the mean squared error was 0.0157, and the R² was 0.9924.


This research provides valuable insights into nanofluid viscosity and offers guidance on choosing the most suitable features for viscosity modeling. The study also highlights the importance of using physical meaning to select features and cross-validation to assess model performance. The models developed in this study can be helpful in predicting nanofluid viscosity and optimizing their use in different industrial processes.


Predicting viscosity in nanofluids plays a crucial role in understanding their flow behavior and optimizing their applications in various industries (Bhaumik et al. 2023; Chiniforooshan Esfahani 2023; Esfe and Arani 2018; Gholizadeh et al. 2020; Onyiriuka 2023b; Said et al. 2021; Tan et al. 2022; Yadav et al. 2020). Nanofluids, suspensions of nanoparticles in base fluids, exhibit unique rheological properties that differ from those of conventional fluids (Tan et al. 2022). Accurate prediction of the viscosity of nanofluids is essential for the efficient design and optimization of heat transfer systems, lubrication processes, and other applications.

As a critical step in the modeling process, feature selection aims to identify the most influential features contributing to nanofluids’ viscosity. It involves selecting relevant input variables or features from potential predictors. This study focuses on the feature selection process for predicting the viscosity of single material nanofluids. Single material nanofluids consist of nanoparticles and base fluid that are stably mixed. A nanofluid viscosity model provides a unique system for investigating the impact of various parameters on viscosity. By carefully selecting the appropriate features, we can uncover the underlying relationships between the composition, particle size, temperature, other factors, and the resulting viscosity of nanofluids.

The objective of this study is to investigate various feature selection methods and pinpoint the primary factors that have a significant impact on the viscosity of single material nanofluids. By utilizing physical, sophisticated statistical, and machine learning techniques, the goal is to create precise prediction models that can estimate the viscosity of nanofluids based on a chosen set of input features.

The findings of this study will contribute to a deeper understanding of the factors that govern the viscosity of nanofluids and provide valuable insights for optimizing their performance in practical applications. Moreover, the developed feature selection techniques can be applied to other nanofluid systems, enabling efficient and effective viscosity prediction models for various nanofluid applications.

Various researchers have studied this subject extensively, but mainly focusing on accuracy rather than generality, and relying on conventional feature selection. In 2020, Gholizadeh, Jamei, Ahmadianfar, and Pourrajab conducted a study on predicting the viscosity of nanofluids using the random forest (RF) approach (Gholizadeh et al. 2020). What was unique about their research is that they utilized the RF method to estimate this thermophysical property of nanofluids for the very first time. The study focused on five significant parameters, including temperature, volume fraction, nanoparticle size, nanoparticle density, and base fluid viscosity.

The researchers used various statistical tools to compare different correlations and found that their model was the best, with an R2 of 0.9972. The next best was Nguyen’s model with an R2 of 0.654, followed by the Maiga et al. correlation at an R2 of 0.652 (Gholizadeh et al. 2020).

It is worth noting that no validation data set was mentioned for their case. The researchers also utilized the out-of-bag error rate method to tune the number of trees and predictors of the RF model. Lastly, they applied a performance index to compare different machine learning models accurately. However, the paper did not consider the application of cross-validation in comparing models; Brownlee (2016) states that, from a machine learning viewpoint, this is an essential step in model evaluation and comparison.
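To make the cross-validation point above concrete, the following is a minimal sketch of k-fold cross-validation for model comparison. The data, the ordinary-least-squares stand-in model, and all names are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def kfold_rmse(X, y, fit, predict, k=5, seed=0):
    """Return the RMSE on each of k validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    rmses = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        err = y[val] - predict(model, X[val])
        rmses.append(float(np.sqrt(np.mean(err ** 2))))
    return rmses

# Ordinary least squares (with intercept) as a stand-in model.
fit_ols = lambda X, y: np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]
predict_ols = lambda w, X: np.c_[np.ones(len(X)), X] @ w

rng = np.random.default_rng(1)
X = rng.uniform(size=(120, 3))   # e.g. temperature, volume fraction, particle size
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.standard_normal(120)

scores = kfold_rmse(X, y, fit_ols, predict_ols, k=5)
mean_rmse = float(np.mean(scores))   # the model-comparison criterion
```

Averaging the per-fold RMSE, as done for `mean_rmse` here, gives the cross-validation score that this study uses as the selection criterion between candidate models.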

It was observed from the study that the volume fraction increased viscosity while particle size decreased it. The nanoparticle volume fraction was noticed to have the most significant impact in predicting the viscosity of nanofluids, while the temperature had the least predictive impact (Gholizadeh et al. 2020).

Rudyak and Minakov (2018) stated that a universal formula describing the viscosity coefficient of any nanofluid has yet to be derived. In addition, most measurements of this quantity have led to contradictory results. Einstein and other researchers, including the international nanofluid properties benchmark exercise (Buongiorno et al. 2009; Kim et al. 2009; Venerus et al. 2010), thought that the volume fraction was the sole determining factor of nanofluids' viscosity. It has now been shown that the models' lack of universality arises because the volume fraction of the nanoparticles is not the only factor determining nanofluids' viscosity.

According to a recent study, the size and material of nanoparticles play a significant role in determining the viscosity of nanofluids. As the concentration of particles increases, the viscosity of nanofluids also increases, while an increase in particle size or temperature results in a decrease in viscosity. Additionally, the type of nanoparticle used can lead to a significant difference in viscosity. Nanofluids have been found to have higher viscosity levels than ordinary fluids with coarse dispersion (Rudyak and Minakov 2018).

The viscosity of nanofluids can be estimated using the modified Einstein's quadratic model form for low and moderate concentrations of nanoparticles. However, the coefficients in this equation vary based on the material and size of the particles. Increasing the degree of order in a fluid leads to an increase in effective viscosity, which can be achieved by decreasing the particle size and increasing the particle concentration (Rudyak and Minakov 2018).

Nanofluids are more ordered than base fluids, and the addition of nanoparticles helps to improve momentum transfer. Molecular dynamics suggest that nanoparticle–molecule interaction is the primary reason for increased viscosity in nanofluids. Einstein's equations do not apply to nanofluids due to assumptions like neglecting interactions between molecules and nanoparticles, creeping flows, or very low particle Reynolds numbers. Therefore, further investigation is needed to understand the relationship between the viscosity of nanofluids and nanoparticle materials, as concluded by the study (Rudyak and Minakov 2018).

Machine learning models

In this study, several machine learning models were applied, namely: the Gaussian process regressor, neural networks, support vector machines, decision trees, ensembles, and linear regression.

The Gaussian process regressor uses probability distributions to model relationships between variables. A neural network learns complex patterns in data through layers of interconnected nodes. Support vector machines find a hyperplane that separates data into classes. Decision trees divide data into subsets based on feature thresholds. Ensemble models combine multiple models to improve predictive accuracy and robustness. Linear regression establishes a linear relationship between features and the target (Mahesh 2020; Sarker 2021).
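The six model families above can be sketched side by side. The following is an illustrative comparison, assuming scikit-learn is available; the class names are the library's, but the toy data, settings, and the 80/20 split are assumptions, not the study's actual configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
y = 1.5 * X[:, 0] - 0.5 * X[:, 2] + 0.02 * rng.standard_normal(100)

# One instance of each family discussed in the text.
models = {
    "GPR": GaussianProcessRegressor(),
    "NN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "SVM": SVR(),
    "Tree": DecisionTreeRegressor(random_state=0),
    "Ensemble": RandomForestRegressor(n_estimators=50, random_state=0),
    "Linear": LinearRegression(),
}

# Fit on the first 80 rows and score RMSE on the held-out 20.
rmse = {}
for name, m in models.items():
    m.fit(X[:80], y[:80])
    err = y[80:] - m.predict(X[80:])
    rmse[name] = float(np.sqrt(np.mean(err ** 2)))
```

In the study itself, the analogous comparison was done with cross-validation RMSE rather than a single hold-out split.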

Data collection and analysis

The data were collected from open literature: ZnO—ethylene glycol (Lee et al. 2012), TiO2—water, Mg(OH)2—ethylene glycol (Esfandiary et al. 2016), Al2O3—water (Nguyen et al. 2008), SiO2—water (Tavman et al. 2008), CuO—water (Pastoriza-Gallego et al. 2011), CuO—ethylene glycol (Yadav et al. 2020), Al2O3–water (Pastoriza-Gallego et al. 2009), Al2O3—ethylene glycol (Yadav et al. 2020), CeO2—ethylene glycol (Yadav et al. 2020). The total number of data rows collected was 245, with 20 columns including the response variable. There were no missing data points in the data set; hence, the study did not need to impute missing values or drop incomplete rows of data.

The variables are represented by the following nomenclature for ease of reference, as shown in Table 1.

Table 1 Variables nomenclature for ease of reference (Onyiriuka 2023a)

In Fig. 1a, we can observe the distribution of each variable; in general, none of the variables appears to be normally distributed. Each plot in the figure is a histogram displaying the range of values for a feature.

Fig. 1
figure 1

a A histogram plot of each feature. b The normal probability plot

For instance, the temperature values are plotted on the x-axis, while the frequency of each temperature value is represented on the y-axis. The first plot in Fig. 1a shows the temperature values, where the most frequently occurring temperature value is 50 °C. On the other hand, the least occurring temperature value of 70 °C was also the highest temperature value. The temperature values between 35 and 45 °C were the most frequently occurring groups in the data set. The general trend in the data shows a rise in the beginning and a fall toward the end. A similar analysis can be made for the other features. This property is also illustrated clearly in the normal probability plot in Fig. 1b.

In Fig. 1b, we can see a normal probability plot that compares the distribution of data in each variable to the standard normal distribution. The plot uses plus sign markers (' + ') to represent each data point in each variable. Two reference lines are drawn to show the theoretical normal distribution. The first reference line is a solid line that connects the data's first and third quartiles, while the second is a dashed line that extends the solid line to the ends of the data range. If the data follows a normal distribution, the points align along the reference line.

However, if the data deviate from the normal distribution, it introduces a curvature or deviation in the plot, indicating that the data distribution differs from the expected normal distribution (MathWorks 2022). By visually inspecting the normal probability plot in Fig. 1b, we can observe the departure from normality and the nature of the data distribution.

Figure 2 shows the box plot of each variable.

Fig. 2
figure 2

A box plot of each feature

Box plots are a common method for displaying data distribution using a five-number summary. The temperature box plot, for instance, shows the minimum, first quartile, median, third quartile, and maximum temperature values. Five components make up a box plot: the median, hinges (Q1 and Q3 quartiles), fences (adjacent extremes), whiskers (minimum and maximum values, excluding outliers), and outliers (data points outside the whiskers).

Notched box plots narrow the box around the median to provide an approximate 95% confidence interval for the population median. Notches are particularly useful for evaluating the significance of differences between medians. In Fig. 2, the notches of the temperature values and the particle size overlap, signifying similar median distributions. The height of the notches is proportional to the interquartile range (IQR) of the sample and inversely proportional to the square root of the sample size. By analyzing the plot, it is evident that each variable has distinct values except for the thermal conductivity, thermal diffusivity, specific heat capacity, surface tension, and dielectric constant of the base fluid, which are similar but opposite to the density, viscosity, kinematic viscosity, and boiling point of the base fluid. To model the viscosity of nanofluids, it is recommended to explore decision trees, ensemble models, and neural networks.


This section tests various modeling and feature selection algorithms, including the novel feature selection algorithm (NFSA) outlined below in Sect. "Algorithm for parameter selection applied for viscosity". The other investigated feature selection algorithms include minimum redundancy maximum relevance (MRMR), F-test, and RReliefF. Tables 2 and 3 summarize the results obtained by applying these algorithms.

Table 2 Models performance and comparison
Table 3 Model parameters and the optimized Gaussian process model

The Minimum Redundancy Maximum Relevance (MRMR)

The MRMR algorithm is a technique used in machine learning and data mining to select a subset of features from a larger set. The main objective of this algorithm is to maximize the relevance of the chosen features to the target variable while minimizing redundancy among them. Here is how the MRMR algorithm works (Çalışkan 2023; Sakthivel et al. 2023; TM and Veni 2023):

First, start with an empty set of selected features. Then, calculate the relevance of each feature by using different metrics such as mutual information, correlation coefficient, or information gain, with respect to the target variable. Next, select the feature with the highest relevance and add it to the selected feature set. After that, for every remaining feature, calculate its redundancy with respect to the already selected features. Redundancy is a measure of how much information a feature provides beyond what is already captured by the selected features (TM and Veni 2023).

Calculate the MRMR score for each feature by subtracting its redundancy from its relevance. Choose the feature with the highest MRMR score and add it to the selected feature set. Repeat these steps until the desired number of features is selected or a stopping criterion is met (for example, a predefined threshold on the MRMR score). The final selected features are those in the selected feature set. The MRMR algorithm aims to balance keeping informative features (high relevance) against avoiding redundant information. By using this approach, the algorithm can help improve the efficiency and interpretability of machine learning models by reducing the dimensionality of the input feature space while retaining the most relevant information (TM and Veni 2023).
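The greedy loop described above can be sketched as follows. This is a minimal illustration using absolute Pearson correlation as both the relevance and redundancy measure (one common choice; the study's exact metric is not specified here), with hypothetical synthetic data:

```python
import numpy as np

def mrmr_select(X, y, n_select):
    """Greedy MRMR: relevance and redundancy measured by |Pearson correlation|."""
    n_feat = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]            # start with the most relevant
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy          # MRMR difference criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
x0 = rng.standard_normal(200)
x2 = rng.standard_normal(200)
# Feature 1 is a near-duplicate of feature 0, so it should be penalized.
X = np.c_[x0, x0 + 0.01 * rng.standard_normal(200), x2]
y = x0 + 0.3 * x2 + 0.1 * rng.standard_normal(200)

picked = mrmr_select(X, y, 2)   # expect one of the duplicates plus feature 2
```

The near-duplicate feature has high relevance but near-total redundancy with the first pick, so the weakly relevant yet non-redundant feature wins the second slot — exactly the relevance-versus-redundancy trade-off the text describes.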


The F-test algorithm is a statistical technique that identifies the features with the most relevance or discriminatory power for a given target variable (Mathew 2023; Venkatesan 2023). For each feature in the dataset, the F-statistic is calculated to determine the ratio of between-class variability to within-class variability. The corresponding p value is computed to represent the likelihood of obtaining the observed F-statistic by chance. The features are then sorted based on their F-statistic or p value in ascending or descending order. The top-k features with the highest F-statistic or lowest p value are selected as the final feature subset (Mathew 2023; Venkatesan 2023).

By examining the variability between different classes and within each class, the F-test algorithm assesses the relationship between each feature and the target variable. Features with higher F-statistics or lower p values indicate stronger associations with the target variable. The F-test algorithm aids in identifying the most relevant features for a given classification or regression task by selecting the features with the highest discriminatory power (Mathew 2023; Venkatesan 2023).
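For a regression target, the per-feature F-statistic reduces to a function of the Pearson correlation, F = r²(n − 2)/(1 − r²). The following is a minimal sketch of ranking features this way; the data and names are illustrative assumptions:

```python
import numpy as np

def f_test_scores(X, y):
    """Univariate F-statistic of each feature against the target (regression form)."""
    n = len(y)
    scores = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]
        scores.append((r ** 2 / (1.0 - r ** 2)) * (n - 2))   # F = r^2 (n-2) / (1-r^2)
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 4))
y = 2.0 * X[:, 1] + 0.2 * rng.standard_normal(150)   # only feature 1 matters

scores = f_test_scores(X, y)
ranking = np.argsort(scores)[::-1]   # descending: most relevant feature first
```

Taking the top-k entries of `ranking` gives the final feature subset, as in the procedure described above.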


The RReliefF algorithm is a technique for selecting features that can effectively differentiate between instances of different classes (Aggarwal et al. 2023). It assigns weights to each feature based on its discriminatory power. The weights are updated iteratively and aggregated across all instances to identify the most relevant features for classification tasks. The selected features are those with the highest scores, indicating their importance in separating instances of different classes (Aggarwal et al. 2023).

To begin, the weights for each instance are initialized to zero. For each instance in the dataset, the weight updates are calculated by considering the differences between the feature values of the current instance and its closest instances of the same and different classes. The weights are then updated accordingly, with greater emphasis placed on features that contribute more to distinguishing between instances of different classes. The feature scores are calculated by aggregating the weight updates across all instances. Finally, the top-k features with the highest scores are selected as the final feature subset (Aggarwal et al. 2023).
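The weight-update procedure above can be sketched for regression targets. This is a simplified, uniformly weighted version of RReliefF (after Robnik-Šikonja and Kononenko), not the exact variant used in the study; data and parameter choices are illustrative:

```python
import numpy as np

def rrelieff(X, y, k=10, m=None, seed=0):
    """Simplified RReliefF for regression: accumulate probabilities of a
    changed prediction / changed attribute over k nearest neighbors, with
    uniform neighbor weighting; diffs are range-normalized to [0, 1]."""
    n, a = X.shape
    m = n if m is None else m
    rng = np.random.default_rng(seed)
    x_rng = X.max(0) - X.min(0)
    y_rng = y.max() - y.min()
    n_dc = 0.0                 # accumulated target differences
    n_da = np.zeros(a)         # accumulated attribute differences
    n_dcda = np.zeros(a)       # accumulated joint differences
    for i in rng.choice(n, size=m, replace=False):
        d = np.abs(X - X[i]).sum(1)   # Manhattan distance to every instance
        d[i] = np.inf                 # never pick the instance itself
        for j in np.argsort(d)[:k]:
            diff_y = abs(y[i] - y[j]) / y_rng
            diff_x = np.abs(X[i] - X[j]) / x_rng
            n_dc += diff_y / k
            n_da += diff_x / k
            n_dcda += diff_y * diff_x / k
    # W[A] = N_dC&dA / N_dC  -  (N_dA - N_dC&dA) / (m - N_dC)
    return n_dcda / n_dc - (n_da - n_dcda) / (m - n_dc)

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))
y = 3.0 * X[:, 0] + 0.05 * rng.standard_normal(200)   # feature 0 drives the target

w = rrelieff(X, y)   # feature 0 should receive the largest weight
```

Features whose differences co-occur with target differences accumulate positive weight, so sorting `w` descending and keeping the top-k entries yields the selected subset.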

Algorithm for parameter selection applied for viscosity

Here we discuss the procedure for selecting parameters according to the novel method of Onyiriuka (2023a) for predicting the viscosity of single material nanofluids.

  1. Check the problem being solved.

  2. List all the possible features.

  3. Drop features that have no meaning or direct implication for the viscosity of a fluid. For example, using single material nanofluids:

     (a) Fluid features—Temperature

     (b) Multiphase features—Volume fraction and particle size

     (c) Material features

        (i) Nanoparticle material: Any two intensive properties will fix the material of the nanoparticle type (Callister 2007; Cengel et al. 2011; Moran et al. 2010).

        (ii) Base fluid material: Any two intensive properties will fix the material of the base fluid type (Callister 2007; Cengel et al. 2011; Moran et al. 2010).

     So, these three feature groupings define a nanofluid.

  4. Apply statistical methods to select features according to (3) out of all other features.

  5. At the end of steps (3)–(4), you should have a reasonable number of features and optimal accuracy.

Note that the main focus of this parameter selection is not accuracy but enhanced model learning for generalization; accuracy nevertheless remains of utmost importance.
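The steps above can be sketched in code. The feature names, groupings, and the correlation threshold below are hypothetical illustrations of steps (3) and (4), not the study's actual feature table:

```python
import numpy as np

# Step 3: hypothetical feature groupings mirroring the algorithm above.
feature_groups = {
    "fluid":      ["temperature"],
    "multiphase": ["volume_fraction", "particle_size"],
    "material":   ["particle_density", "particle_thermal_conductivity",   # two intensive
                   "base_fluid_density", "base_fluid_viscosity"],         # properties each
}
all_features = ["temperature", "volume_fraction", "particle_size",
                "particle_density", "particle_thermal_conductivity",
                "base_fluid_density", "base_fluid_viscosity",
                "sample_id", "measurement_date"]   # no physical link to viscosity

# Keep only features with physical meaning for viscosity.
physical = [f for group in feature_groups.values() for f in group]
kept = [f for f in all_features if f in physical]

# Step 4: a statistical filter on the survivors (here |Pearson r| vs viscosity).
def correlation_filter(X, y, names, threshold=0.1):
    r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return [n for n, rj in zip(names, r) if rj >= threshold]
```

Running `correlation_filter` on the columns that survive the physical screen completes the pipeline: physics first narrows the candidate set, then statistics ranks what remains.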

Model evaluation methods

The root mean squared error (RMSE), Eq. (1); the mean squared error (MSE), Eq. (6); the mean absolute error (MAE), Eq. (7); and the R-squared (R²), Eqs. (2)–(5), were applied in this study to measure model performance. The main decision-making performance metric in this study was the root mean squared error, chosen for its intuitive and direct interpretation of the error.

$${\text{RMSE}}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({h}_{i}-{h}_{i}^{\text{pred}}\right)}^{2}}\tag{1}$$
$$\overline{h}=\frac{1}{n}\sum_{i=1}^{n}{h}_{i}\tag{2}$$
$${\text{SS}}_{\text{reg}}=\sum_{i}{\left({h}_{i}^{\text{pred}}-\overline{h}\right)}^{2}\tag{3}$$
$${\text{SS}}_{\text{tot}}=\sum_{i}{\left({h}_{i}-\overline{h}\right)}^{2}\tag{4}$$
$${R}^{2}=\frac{{\text{SS}}_{\text{reg}}}{{\text{SS}}_{\text{tot}}}\tag{5}$$
$${\text{MSE}}=\frac{1}{n}\sum_{i=1}^{n}{\left({h}_{i}-{h}_{i}^{\text{pred}}\right)}^{2}\tag{6}$$
$${\text{MAE}}=\frac{1}{n}\sum_{i=1}^{n}\left|{h}_{i}-{h}_{i}^{\text{pred}}\right|\tag{7}$$
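The four metrics, Eqs. (1)–(7), translate directly into code. A minimal sketch, using the document's own definition of R² as SS_reg/SS_tot:

```python
import math

def rmse(h, h_pred):                     # Eq. (1)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, h_pred)) / len(h))

def mse(h, h_pred):                      # Eq. (6)
    return sum((a - b) ** 2 for a, b in zip(h, h_pred)) / len(h)

def mae(h, h_pred):                      # Eq. (7)
    return sum(abs(a - b) for a, b in zip(h, h_pred)) / len(h)

def r_squared(h, h_pred):                # Eqs. (2)-(5): SS_reg / SS_tot
    h_bar = sum(h) / len(h)              # Eq. (2)
    ss_reg = sum((p - h_bar) ** 2 for p in h_pred)   # Eq. (3)
    ss_tot = sum((a - h_bar) ** 2 for a in h)        # Eq. (4)
    return ss_reg / ss_tot               # Eq. (5)
```

For perfect predictions RMSE, MSE, and MAE are all zero and R² is one, matching the intuition behind the selection criterion.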


See Tables 2 and 3, Fig. 3.


The results from Table 2 indicate that the "Custom Gaussian Process Regression" model with the preset "Custom Gaussian Process Regression" performs the best in predicting nanofluid viscosity. This model achieved the lowest RMSE on both the validation and test datasets, indicating its superior predictive accuracy. The other Gaussian Process Regression models also showed promising results but were not as accurate as the top-performing model. The Neural Network models demonstrated competitive performance but were not able to outperform the Gaussian Process Regression models. It is possible that further tuning of the Neural Network architectures and hyperparameters could potentially improve their performance.

The Linear Regression models and Tree-based models showed relatively higher RMSE values, suggesting that they might not capture the complex relationships present in the nanofluid viscosity data as effectively as the Gaussian Process Regression and Neural Network models.

Also, Table 2 shows that the best model is obtained by applying the algorithm in Sect. "Algorithm for parameter selection applied for viscosity" [novel feature selection algorithm (NFSA)], with a root mean squared error of 0.0220 on the validation data set. Table 3 presents the settings of each model and the applied feature selection algorithm. The model settings in Table 3 were obtained from the optimizable versions of the original models; they are the settings that give the best results when optimized with the Bayesian optimizer class. Table 3 provides insight into the hyperparameters and feature selection algorithms applied to the Gaussian Process Regression models.

The model with the "None" feature selection algorithm performed well, suggesting that essentially all features can be used for good predictions. However, the "F-test" and "MRMR" feature selection algorithms also showed competitive performance, indicating that they effectively identified relevant features for nanofluid viscosity prediction.

The predicted-versus-true response plot of the best model is shown in Fig. 3. The points represent the difference between the predicted response and the true response. A perfect scenario is represented by the line through the origin, indicating that the predicted response and the true response are the same. The vertical distance between the line and any point is the error of the prediction for that point. A good model has small errors, meaning that the predictions are concentrated near the line.

Fig. 3
figure 3

a A plot of predictions of response in the training data by the accepted model. b A plot of predictions of response in the test data by the accepted model

The plots in Fig. 3a and 3b visually illustrate the quality of predictions made by the accepted model on the training and test datasets, respectively. The close alignment between the predicted responses and the true responses indicates the model's ability to generalize well to unseen data and its overall reliability.

Figures 3a and 3b also demonstrate that the accepted model fits both the training and testing data groups. It is essential to note that the models did not have knowledge of the test data during the training process.

It is important to note that the models were not applied to other scenarios because of the marked differences in data logging, nanofluid preparation, and handling methods across researchers, and considering their good performance on the test data. Future work will apply the approach to other nanofluid thermophysical properties, such as those of hybrid nanofluids, which introduce new features that may be important for their prediction.


This study focused on modeling nanofluid viscosity and optimizing feature selection for accurate prediction. Through the comparison of various models using cross-validation techniques, we gained valuable insights into the factors influencing nanofluid viscosity and identified the most influential features. By incorporating physical meaning into the feature selection process, we achieved improved results. The research findings underscore the importance of considering physical relevance when selecting features for nanofluid viscosity prediction.

By prioritizing features that have a direct physical impact on viscosity, we were able to develop more precise and reliable prediction models. This approach not only enhances the accuracy of viscosity estimation but also provides a better understanding of the underlying mechanisms governing nanofluid behavior.

The application of cross-validation techniques further strengthened our evaluation of the models. By assessing the root mean squared error of the cross-validation sets, we obtained robust measures of model performance. This rigorous evaluation allowed us to identify the most accurate and reliable model for predicting nanofluid viscosity.

The insights gained from this research contribute to the broader understanding of nanofluid viscosity and offer guidance for optimizing their use in practical applications. By accurately predicting viscosity, industries can improve the design and efficiency of heat transfer systems, lubrication processes, and other applications involving nanofluids. The optimized feature selection techniques developed in this study can be readily applied to other nanofluid systems, enabling efficient and effective viscosity prediction models across various applications.

It is important to note that the research presented here focused specifically on single material nanofluids.

Further studies could explore the modeling and feature selection techniques for other types of nanofluids, such as multi-material and hybrid nanofluids or those with complex compositions. Additionally, investigating the relationship between nanofluid viscosity and thermal conductivity could provide valuable insights into the overall fluid behavior.

In conclusion, this study contributes to the field of nanofluid viscosity modeling by providing a novel approach to feature selection and model evaluation. The novel feature selection algorithm provides a more comprehensive method for representing the viscosity of nanofluids in a way that preserves the generality of the models. The accurate prediction of nanofluid viscosity opens up new possibilities for optimizing their performance in industrial processes, leading to enhanced efficiency and cost-effectiveness. The models developed in this research serve as valuable tools for predicting nanofluid viscosity and driving advancements in nanofluid-based technologies.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

Base fluid thermal diffusivity (m²/s × 10⁷)


Base fluid boiling point (°C)


Base fluid specific heat capacity (J/(kg K))


Base fluid density (kg/m³)


Base fluid dielectric constant (–)


Base fluid thermal conductivity (W/(m K))


Base fluid kinematic viscosity (m²/s × 10⁷)


Base fluid surface tension (mN/m)


Base fluid viscosity (Pa∙s)


Particle size diameter (nm)


Gaussian process regressor


Mean absolute error


Machine learning


Minimum redundancy maximum relevance


Mean squared error


Novel feature selection algorithm


Nanofluid viscosity (Pa∙s)


Nanoparticle thermal diffusivity (m²/s × 10⁷)


Nanoparticle-specific heat capacity (J/(kg K))


Nanoparticle density (kg/m³)


Nanoparticle dielectric constant (–)


Nanoparticle electrical conductivity (mMS/m)


Nanoparticle thermal conductivity (W/(m K))


Nanoparticle melting point (°C)


Nanoparticle magnetic susceptibility (–)


Nanoparticle refractive index (–)


Rectified linear unit


Root mean squared error


Support vector machine


Nanofluid temperature (°C)


Volume fraction (%)




The author wishes to thank Dr Jongrae Kim and Professor David Barton for their invaluable insights in the preparation of the paper. The author also wishes to thank the Tertiary Education Trust Fund (TET Fund) for funding the studies and the University of Leeds for providing the right environment.


TET Fund sponsored the studies at the University of Leeds.

Author information




EJO was the main author and only author and carried out all the work in the study.

Corresponding author

Correspondence to Ekene Onyiriuka.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The author declares that there are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Onyiriuka, E. Modeling nanofluid viscosity: comparing models and optimizing feature selection—a novel approach. Bull Natl Res Cent 47, 139 (2023).




  • Nanofluids
  • Viscosity prediction
  • Modeling
  • Feature selection
  • Cross-validation
  • Root mean square error