Integrated selection criteria in sugarcane breeding programs using discriminant function analysis

Background: Selection indices help plant breeders discriminate desirable genotypes on the basis of phenotypic performance. The present study was therefore conducted to evaluate thirty sugarcane genotypes (clones) along with two check cultivars over two cropping seasons at Mattana Agricultural Research Station. Results: All studied traits differed significantly among genotypes. The analysis significantly discriminated between low and high sugar yield genotypes using eleven traits: sugar yield (ton/fed), cane yield (ton/fed), number of stalks/m², stalk weight (kg), stalk height (cm), stalk diameter (cm), number of internodes, Brix %, sucrose %, purity %, and sugar recovery %. High sugar yield genotypes were selected by discriminant analysis. The discriminant score (DS) explained 79.2% of the variation in sugar yield and had a significant canonical correlation (0.89**). Discriminant function analysis (DFA) indicated that the most important traits, in order, were stalk weight, stalk height, purity %, Brix %, and cane yield. Conclusions: Genotypes G.2017-43, G.2017-42, G.2017-29, G.2017-33, and G.2017-44 showed the highest discriminant scores and were recognized as the highest-yielding sugarcane genotypes, whereas genotypes G.2017-30, G.2017-10, G.2017-27, G.2017-25, G.2017-70, G.2017-41, G.2017-40, G.2017-35, and G.2017-58 showed the lowest discriminant scores and were recognized as the lowest-yielding sugarcane genotypes.


Background
Sugarcane (Saccharum sp. hybrids) is one of the main crops in the world and the major source of sugar and ethanol (Silva et al., 2016). Sugarcane is the world's most-produced crop (total production) and ranks among the ten most widely grown crops worldwide. The total global production of sugarcane in 2016-2017 was 1.9 billion tons, and it was grown in approximately 100 countries, covering an area of about 26 million hectares (FAOSTAT, 2018). The early stages of sugarcane breeding are commonly associated with low accuracy and selection efficiency because of the large number of genotypes and their interactions with the environment. In general, methods used to predict genotypic and breeding values provide paths for more accurate selection (Piepho et al., 2008). Seema et al. (2020) mentioned that sugarcane breeding programs have sought improved analytical methodologies to optimize the process of obtaining and selecting superior genotypes, in order to develop genetic materials with high yield that express agronomic traits of interest. The use of selection indices is an alternative method recommended by breeders. A selection index is a method of selecting plants for crop improvement based on several characters of importance. This method was proposed by Smith (1937) using the discriminant function of Fisher (1936).
Smith (1937) also suggested that a better way of exploiting genetic correlations among several highly heritable traits is to construct an index, called a selection index, which combines information on all the characters associated with a dependent variable such as yield. Thus, a selection index is a linear combination of characters associated with yield. The best-known selection indices involve discriminant functions based on the relative economic importance of various characters. Discriminant function analysis measures the efficiency of various character combinations in selection. The selection index leads to simultaneous manipulation of several characters for genetic improvement of economic yield. This technique provides information on yield components and thus aids indirect selection for genetic improvement.
Discriminant analysis provides an equation that gives maximum separation of high- and low-yield genotypes (Abdolshahi et al., 2015). Linear discriminant analysis can be used not only to examine multivariate differences between groups but also to determine which variables are the most useful for discriminating between groups, whether one subset of variables works as well as another, and which groups are similar and which are different (Hadavani et al., 2018; Patel and Raval, 2018).
The aim of the present study was to develop a selection index approach that considers the information of several sugarcane traits using the discriminant analysis to better understand the relationship between the traits and sugar yield and find a rank to select superior genotypes in sugarcane breeding.

Experimental design and plant materials
This study was conducted at Mattana Agricultural Research Station (latitude 25°17′ N, longitude 32°33′ E), Luxor Governorate. The climate of Luxor is classified by the Köppen-Geiger system as desert, with rainfall of about 2 mm/year, a summer mean temperature of 32.4°C, a winter mean temperature of 23.2°C, and a relative humidity of 61.6% during 2017-2018 and 2018-2019. Plant materials comprised 30 genotypes of sugarcane, which were tested along with two commercial check cultivars (G.T.54-9 and Ph.8013) (Table 1). Genotypes were grown in a randomized complete block design with three replications. The plot area was 15 m², comprising 3 rows of sugarcane, each 5 m long and spaced 1.0 m apart. Planting was done during March 2017 using fifteen 3-budded cane pieces in each row. The field was irrigated right after planting, and all other agronomic practices were carried out as recommended. Some physical and chemical properties of representative soil samples taken from the experimental site before sowing in the 2017 and 2018 seasons are shown in Table 2. The plant cane was allowed to ratoon after harvest. Both the plant cane and its first ratoon crop were harvested at the age of 12 months. At harvest, a sample of ten stalks from each plot was collected to determine the following traits:
1. Number of stalks per m² (NSm⁻²)
2. Stalk weight (kg) (SW)
3. Stalk height (cm), which was measured from the soil surface to the visible dewlap (SH)
4. Stalk diameter (cm), which was measured at the middle part of the stalk (SD)
5. Number of internodes (NI)
6. Brix (total soluble solids %), which was determined using a hydrometer (Br)
7. Sucrose percentage, which was determined using an automatic saccharimeter, according to A.O.A.C.

Statistical analysis
Regular analysis of variance for the randomized complete block design (RCBD) and combined analyses of variance of the collected data were run as outlined by Gomez and Gomez (1984), who mentioned that the combined analysis can be applied if the coefficient of variation (CV %) for the individual experiments is lower than 20%. Simple correlation coefficients between pairs of the studied characters were computed according to Gomez and Gomez (1984) using RStudio (R version 3.6.1). The selection index developed by Smith (1937) using the discriminant function approach (Fisher, 1936) was used to discriminate the genotypes based on all the characters.
Among the 30 sugarcane genotypes and two check cultivars (Table 1), the 16 genotypes with the highest sugar yield and the remaining 16 genotypes with lower sugar yield were assigned to group one and group two, respectively, based on average sugar yield over the 2 years. Discriminant function analysis (DFA) was then performed using SPSS software version 14 (Table 6).
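The grouping step described above can be sketched in a few lines of Python. This is only an illustration of the ranking-and-splitting logic, not the authors' SPSS workflow: the function name `split_into_groups` and the clone labels are hypothetical, and apart from the two check-cultivar yields reported in this paper (5.65 and 4.51 ton/fed) the numbers are invented for the example.

```python
# Two-season mean sugar yields (ton/fed). Only the two check-cultivar values
# are taken from the paper; the clone names and their yields are hypothetical.
mean_sugar_yield = {
    "G.T.54-9": 5.65,
    "Ph.8013": 4.51,
    "clone_A": 3.90,
    "clone_B": 3.10,
    "clone_C": 2.75,
    "clone_D": 2.40,
}


def split_into_groups(yields):
    """Rank genotypes by yield (descending) and split the ranking in half."""
    ranked = sorted(yields, key=yields.get, reverse=True)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


high_group, low_group = split_into_groups(mean_sugar_yield)
print("group 1 (high yielders):", high_group)
print("group 2 (low yielders): ", low_group)
```

With 32 entries the same function would return two groups of 16, which then serve as the grouping variable for the discriminant analysis.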
Discriminant function analysis (DFA) provides an equation that gives maximum separation or discrimination between two groups of genotypes. All trait values were standardized before running the discriminant analysis. The main terms related to DFA in our study were as follows:
1. Independent variables (eleven measured traits): these are the discriminating variables or "predictors".
2. Dependent variable (two groups of 30 genotypes and two checks): this is the grouping variable, which is the object of classification efforts.
3. Discriminant function: a latent variable created as a linear combination of the discriminating (independent) variables.
For the purpose of this study, the DFA was used to determine which traits (independent variables) discriminate between the two groups of genotypes. In simple words, the discriminant function can be thought of as a multiple regression equation. Accordingly, the latent variable created as a linear combination of the discriminating (independent) variables is:
$$D = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n \qquad (1)$$
where $D$ is the discriminant function or the predicted score (discriminant score), $a$ is an intercept, $b_1$ through $b_n$ are the discriminant coefficients (analogous to regression coefficients), and $X_1$ through $X_n$ are the discriminating variables.
The contribution of each variable to the discrimination between groups is determined by the standardized discriminant coefficients ($b_1$ to $b_n$) for each variable in each discriminant function. A larger standardized coefficient (in absolute value) indicates a greater contribution of the respective variable to the separation of the genotype groups.
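To make the linear discriminant function concrete, the following is a minimal sketch of a two-group discriminant fit on standardized traits, assuming Fisher's linear rule with a pooled within-group covariance matrix. It is not the SPSS procedure used in the study: SPSS scales its standardized canonical coefficients differently, so the numbers would not match its output. The data, the restriction to two traits, and all variable names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical toy data: each row is a genotype, the columns are two traits
# (for example stalk weight in kg and stalk height in cm).
high = [[1.10, 290.0], [1.05, 300.0], [1.15, 285.0], [1.00, 295.0]]
low = [[0.80, 250.0], [0.90, 245.0], [0.85, 260.0], [0.75, 255.0]]


def standardize(rows):
    """Convert every column to z-scores (mean 0, SD 1) over all rows."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def col_means(rows):
    return [mean(c) for c in zip(*rows)]


def scatter(rows, centre):
    """2x2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - centre[0], r[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


z = standardize(high + low)
z_high, z_low = z[:len(high)], z[len(high):]
m1, m2 = col_means(z_high), col_means(z_low)
s1, s2 = scatter(z_high, m1), scatter(z_low, m2)
dof = len(z_high) + len(z_low) - 2
pooled = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]

# Invert the 2x2 pooled covariance matrix and apply it to the mean difference.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
inv = [[pooled[1][1] / det, -pooled[0][1] / det],
       [-pooled[1][0] / det, pooled[0][0] / det]]
diff = [m1[0] - m2[0], m1[1] - m2[1]]
b = [inv[i][0] * diff[0] + inv[i][1] * diff[1] for i in range(2)]

# The intercept places the cut-off midway between the two group centroids.
a = -sum(bi * (u + v) / 2 for bi, u, v in zip(b, m1, m2))

# D = a + b1*X1 + b2*X2 for every genotype, on the standardized scale.
scores = [a + b[0] * r[0] + b[1] * r[1] for r in z]
print("coefficients:", b, "intercept:", a)
print("scores:", scores)
```

Because the traits are standardized and the two groups are the same size, the intercept is zero up to rounding error, so positive scores fall on the high-yield side and negative scores on the low-yield side.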
The first statistic from the DFA is the eigenvalue of the discriminant function. In this investigation there is a single discriminant function because only two groups are used, namely "group 1" (high yielders) and "group 2" (low yielders), so only one eigenvalue is displayed. It reflects the relative importance of the measured traits in classifying cases of the dependent variable (groups); in other words, it reflects the percentage of variance explained in this variable, cumulating to 100% for the function. The canonical correlation is the multiple correlation between the predictors (eleven measured traits) and the discriminant function. It provides an index of overall model fit, and its square (R²) is interpreted as the proportion of variance explained.
The second statistic is Wilks' lambda, which is used to test the significance of the discriminant function as a whole. Wilks' lambda ranges between 0 and 1; a value that is close to 0 and significant means that the DFA fits well and differentiates the genotypes into two groups, and vice versa. It therefore gives the proportion of variance of the dependent variable (two groups of 30 genotypes and two check cultivars) that is not explained by the discriminant function.
The DFA output also includes two important items: the standardized canonical discriminant function coefficients and the structure matrix. The first indicates the relative contribution of each variable to the discriminant function. The second, the structure matrix, gives the correlation of each trait with the discriminant function and is another way of investigating the relationship between the traits and the function that separates the genotype groups. Finally, discriminant scores were obtained as a weighted linear combination (sum) of the discriminating variables. Based on these discriminant scores, the genotypes in our investigation were ranked (selection index).
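For a single discriminant function, the eigenvalue, the canonical correlation, and Wilks' lambda are tied together by standard identities, written out below. The symbols are notation introduced here for illustration and are not taken from the paper: $\lambda$ is the eigenvalue, $R_c$ the canonical correlation, and $\Lambda$ Wilks' lambda.

```latex
\begin{align}
R_c &= \sqrt{\frac{\lambda}{1+\lambda}}, \\
\Lambda &= \frac{1}{1+\lambda} = 1 - R_c^{2}.
\end{align}
```

Read this way, $R_c^{2}$ is the proportion of between-group variation explained by the function and $\Lambda$ is the unexplained remainder, so with only two groups the two statistics carry the same information.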

Results
The results showed that the coefficient of variation (CV %) for the individual experiments was lower than 20%, which permits a combined analysis as suggested by Gomez and Gomez (1984).
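The CV check can be illustrated with a short Python sketch, assuming the usual definition of the coefficient of variation in an analysis of variance (square root of the error mean square divided by the grand mean, times 100). The function names and the numeric inputs are hypothetical and are not values from this study.

```python
import math


def coefficient_of_variation(error_mean_square, grand_mean):
    """CV % = 100 * sqrt(error mean square) / grand mean."""
    return 100.0 * math.sqrt(error_mean_square) / grand_mean


def can_combine(cv_values, threshold=20.0):
    """Apply the rule quoted above: every single-season CV must be below 20%."""
    return all(cv < threshold for cv in cv_values)


# Hypothetical error mean squares and grand means for one trait in two seasons.
cv_plant_cane = coefficient_of_variation(0.16, 4.0)
cv_first_ratoon = coefficient_of_variation(0.25, 4.0)
print(cv_plant_cane, cv_first_ratoon)
print("combined analysis allowed:", can_combine([cv_plant_cane, cv_first_ratoon]))
```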

Mean performance
The mean values of sugar yield and its related characters for the thirty sugarcane genotypes along with two check cultivars in the plant cane, the first ratoon crop, and across the two seasons are given in Tables 3, 4, and 5. Results revealed significant differences among genotypes (clones), seasons, and their interaction for all studied characters, except that the seasonal effect was insignificant for the number of internodes per plant and sugar yield (tons per fed) and the season × genotype interaction was insignificant for stalk diameter, stalk weight, cane yield, and sugar yield. An insignificant genotype × season interaction means that the sugarcane genotypes behaved similarly in the two seasons. Therefore, it is enough to discuss the combined averages across the two seasons. The coefficient of variation (CV %) for all studied characters fell within the statistically acceptable range (less than 20%). The first season had higher mean values than the second season for all studied traits except the number of stalks m⁻² and cane yield, reflecting seasonal differences.
Means listed in Tables 3, 4, and 5 indicated that the number of stalks per m² is an important character for cane yield. The number of stalks per m² (as an average of the two seasons) ranged from 14.33 for G.2017-16 to 27.83 stalks m⁻² for G.2017-59. Clones G.2017-25 and G.2017-59 gave the maximum number of stalks/m², recording 27 and 27.83, respectively, without significant differences from the check cultivars. The maximum number of stalks/m² was produced by genotypes G.2017-25, G.2017-42, and G.2017-59.
The mean stalk diameter of the tested genotypes did not surpass that of the check cultivars in the 1st and 2nd seasons or across the two seasons. The highest stalk diameter was obtained by the check cultivar G.T.54-9, recording 2.87, 2.77, and 2.82 cm, respectively, with a non-significant interaction effect between seasons and genotypes.
Regarding the number of internodes per stalk (Tables 3, 4, and 5), there were significant differences among sugarcane genotypes. It ranged from the lowest values, recorded by G.2017-59 (13.33, 13, and 13.17), to the highest values (22.67, 23.33, and 23), recorded by G.2017-10, in the 1st and 2nd seasons and across both seasons, with significant differences from the other sugarcane genotypes.
Concerning sucrose % (Tables 3, 4, and 5), the results showed that genotypes G.2017-29 and G.2017-33 and the check cultivar G.T.54-9 had the
Cane and sugar yields are the final expressions of most physiological processes, which have interacted with the weather and environment during growth. Variation in stalk weight, cane yield, and sugar yield among the studied sugarcane genotypes was relatively high, as shown in Tables 3, 4, and 5. The check cultivars G.T.54-9 and Ph.8013 gave the maximum values of stalk weight and of cane and sugar yields (ton fed⁻¹) in the 1st and 2nd seasons and across the two seasons, with significant differences from the other genotypes. G.T.54-9 gave the maximum cane and sugar yields (54.30 and 5.65 ton fed⁻¹, respectively) across the two seasons, differing significantly from Ph.8013, which recorded 46.44 and 4.51 ton fed⁻¹, respectively, while no significant difference was found between G.T.54-9 and Ph.8013 in stalk weight, which was 1.12 and 1.13 kg, respectively, across both seasons.

Simple correlation coefficient
The coefficients of correlation between all pairs of the studied traits were computed and are illustrated graphically in Fig. 1. The distribution of each variable is shown on the matrix diagonal. The bivariate scatter plots with a fitted line for each pair of traits are displayed below the diagonal, while the value of the correlation and its significance level (as stars) are shown above the diagonal. Data in Fig. 1 showed that sugar yield (SY) was positively and highly significantly associated with the number of stalks/m² (r = 0.77**), stalk weight (r = 0.84**), stalk height (r = 0.53**), purity % (r = 0.47**), sugar recovery % (r = 0.58**), and cane yield (r = 0.92**). Cane yield was positively and highly significantly correlated with the number of stalks per m² (0.84**), stalk weight (0.77**), and stalk height (0.53**). There were positive and highly significant correlations between sugar recovery % and each of stalk weight (0.53**), stalk diameter (0.57**), sucrose % (0.94**), and juice purity % (0.48**). Sucrose % was positively and highly significantly associated with stalk weight (0.48**), stalk diameter (0.59**), and Brix (0.66**).
There was also a significant negative correlation between the number of stalks per m² and stalk weight (r = −0.48**), while the number of stalks per m² was positively correlated with stalk height (0.54**). A highly significant positive correlation was obtained between stalk weight and stalk diameter (0.51**).
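The correlation coefficients reported above are simple (Pearson) correlations. As a minimal sketch of how one such coefficient and its t statistic are computed, the following Python example uses invented data and hypothetical function names; the study itself used R, and the values below are not taken from Fig. 1.

```python
import math


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical genotype means: number of stalks per m2 and cane yield (ton/fed).
stalks_per_m2 = [16.0, 18.0, 20.0, 22.0, 24.0]
cane_yield = [40.0, 38.0, 44.0, 42.0, 46.0]

r = pearson_r(stalks_per_m2, cane_yield)
t = t_statistic(r, len(stalks_per_m2))
print("r =", r, "t =", t)
```

The t value would then be compared with the tabulated critical value for n − 2 degrees of freedom to decide whether one or two stars are attached to r.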

Discriminant function analysis
Based on average yield over the 2 years, the sugarcane genotypes were ranked in descending order of sugar yield together with their corresponding traits. The 16 highest-yielding genotypes were selected as group one (high yielder genotypes) and the remaining 16 genotypes as group two (low yielder genotypes). The results of the discriminant analysis are presented under the following headings.
The group statistics and tests of equality of group means
Table 6 shows the mean values and standard deviations of the studied traits for the two groups and the test of the differences between the two groups using Wilks' lambda; proceeding further with the analysis would not be meaningful if there were no significant group differences. The examination of the group means and standard deviations can be helpful in obtaining a rough idea of the variables that may be important. Wilks' lambda is of great analytic importance: the smaller the Wilks' lambda, the more important the independent variable (measured trait) is to the discriminant function. Wilks' lambda ranges between 0 and 1. Values close to 0 indicate that the group means differ, while values close to 1 indicate that the two group means do not differ (a value of 1 indicates that all means are the same).
Using Wilks' lambda, there were significant differences between the two groups for all studied traits except the number of stalks per square meter, stalk diameter, number of internodes, Brix %, sucrose %, and sugar recovery %, suggesting that the remaining traits (stalk weight, stalk height, purity %, cane yield, and sugar yield) may be good discriminators between the two genotype groups.
Standardized canonical discriminant function coefficient and structure matrix
Firstly, the standardized canonical discriminant function coefficients (b) in Table 7 were used to build a highly significant discriminant function model (Eq. 2), in which DS is the discriminant score, CY cane yield (ton/fed), SY sugar yield (ton/fed), Pr purity %, Br Brix %, SW stalk weight, SH stalk height, NI number of internodes, NSm⁻² number of stalks/m², Sc sucrose %, and SD stalk diameter. Discriminant analysis not only describes numerically the general distance between the clones through the discriminant score (DS), but also shows the characters that serve to distinguish the genotypes. These characters, weighted by their canonical coefficients, can be used to classify the studied genotypes. If the absolute value of a coefficient exceeds 0.5, that character is defined as a distinguishing factor (Tatsuoka 1971).
The second item is the structure matrix; as with factor loadings, 0.30 is taken as the cut-off between important and less important variables.
Table 6 Comparison between the two supposed groups (high and low yielder genotypes) using a test of equality of group means
The canonical correlation measures the degree of association between the predicted values (scores) fitted by the discriminant function and the independent variables (eleven measured traits). As shown in Table 7, a canonical correlation of 0.89 suggests that the discriminant function model explains 79.2% (0.89² × 100) of the variation among the 32 genotypes. Results in Table 7 also showed a highly significant, small value of Wilks' lambda (0.22).
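For a single trait, the Wilks' lambda used in a test of equality of group means is the ratio of the within-group sum of squares to the total sum of squares. The sketch below illustrates this with invented values; the function names are hypothetical, and the F conversion shown is the standard one for two groups (1 and N − 2 degrees of freedom).

```python
from statistics import mean


def wilks_lambda_single_trait(group1, group2):
    """Within-group sum of squares divided by total sum of squares."""
    all_values = group1 + group2
    grand = mean(all_values)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_within = sum((x - mean(g)) ** 2 for g in (group1, group2) for x in g)
    return ss_within / ss_total


def f_from_lambda(lam, n_total):
    """F statistic with 1 and n_total - 2 degrees of freedom (two groups)."""
    return (1 - lam) / lam * (n_total - 2)


# Hypothetical trait values for a well-separated pair of groups.
separated = wilks_lambda_single_trait([10.0, 11.0, 12.0], [1.0, 2.0, 3.0])
# Identical groups: the trait carries no information about group membership.
identical = wilks_lambda_single_trait([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

print("lambda (separated):", separated, "F:", f_from_lambda(separated, 6))
print("lambda (identical):", identical)
```

A lambda near 0, as in the first case, marks a trait whose group means differ strongly, whereas a lambda of 1 marks a trait that does not separate the groups at all.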

Discussion
According to data in Tables 3, 4, and 5, the number of stalks is very important in sugarcane as it is directly related to the final millable cane population at harvest. The variation in the number of stalks/m² may be attributed to variation in the genetic behavior of sugarcane genotypes in addition to their interaction with the environmental conditions. Abu-Ellail (2015) indicated the varying response of different sugarcane genotypes for the number of stalks per m², and Singh and Dey (2002) reported that reduced plant population, caused by poor establishment of the plant crop or by infection with pests and diseases, was responsible for poor yield. Genotypes G.2017-43, G.2017-42, G.2017-29, G.2017-33, and G.2017-44 showed the highest values of cane yield and sugar yield, due to their performance and genetic makeup; moreover, they recorded the highest stalk weight, stalk diameter, and stalk number as well as the highest Brix and sucrose percentages. Abu-Ellail et al. (2018) found that crop cycles had a negative effect on cane and sugar yields; it is therefore important to study the characteristics of the best clones in order to use them as selection criteria in the breeding program. These results are in agreement with those obtained by Masri et al. (2014) and Abu-Ellail et al. (2019), who found significant differences among the tested sugarcane genotypes for cane and sugar yields and other agronomic and physiological characters. In contrast, genotypes G.2017-30, G.2017-10, G.2017-27, G.2017-25, G.2017-70, G.2017-41, G.2017-40, G.2017-35, and G.2017-58 were recognized as the lowest-yielding sugarcane genotypes, due to significant differences among genotypes for stalk length and diameter and the interaction with seasons. Ahmed and Obeid (2012) and Milligan et al. (1996) suggested that stalk diameter is indicative of better cultivars.
They also reported that the number of millable canes and stalk weight are the most useful and reasonable selection criteria for high cane yield. There are significant differences among genotypes for technological quality traits such as Brix and sucrose %, which are commonly used for the selection of genotypes. The selected best clones should display high performance in a series of yield and quality-related traits such as sucrose and sugar recovery percentages. Highly significant genotype effects indicate the existence of differences that can be utilized during selection; apparent sucrose content in the juice and Brix % were the traits that emerged as most suitable for selection, while purity % was unsuitable (Zhou et al., 2012; Azeredo et al., 2017; Silva et al., 2017). Data in Fig. 1 indicated that sugar yield displayed a strong and significant correlation with cane yield, followed by stalk weight, number of stalks per m², and stalk height, in addition to purity % and sugar recovery %. Kwajaffa and Olaoye (2014), Masri et al. (2014), and Feven et al. (2018) found a significant and positive correlation between sugar yield and these traits. Due to the importance of sugarcane production to Egypt's economy, improved production is essential. The high correlation of these traits with cane yield suggests that they are better criteria for improving yield.
Discriminant function analysis (DFA) is a better technique than multiple regression for improving yield (Farshadfar 2012). It is used to predict group membership, so an examination of whether there are significant differences between the two groups on each of the independent variables (studied traits) was run. The examination of the group means and standard deviations can be helpful in obtaining a rough idea of the variables that may be important. Wilks' lambda was not small (less than or equal to 0.30) except for stalk weight, stalk height, purity %, and cane and sugar yields, indicating their effectiveness as discriminators between the high and low sugar yield genotype groups. Mohammed et al. (2019) observed that sugar yields have generally been improved through cane yield, stalk weight, and increased total biomass rather than directly by increasing sugar concentration in stalks.
Wilks' lambda is a measure of how well the function separates the 32 genotypes into two yielding groups. It ranges between 0 and 1. Results in Table 7 showed a highly significant, small value of Wilks' lambda (0.22), which indicates the great distinguishing ability of the function. Wilks' lambda gives the proportion of total variability that is not explained by the discriminant function model.
Discriminant function technique involves the development of selection criteria on a combination of various characters and aids the breeder in indirect selection for genetic improvement in yield. In plant breeding, the selection index refers to a linear combination of characters associated with yield. The results in this study are in accordance with the result of Muhammad et al. (2014) who reported that the important traits to be considered in increasing sugar yield are cane productivity and stalk crop.
The discriminant score is the predicted value obtained by fitting the discriminant function model. Based on the discriminant scores, all genotypes were ranked and classified into two classes, the high and low yielding groups (Table 8 and Fig. 2). Because the discriminant scores in Eq. (2) are calculated from standardized data, the cut-off value equals zero: genotypes with a discriminant score above zero belong to group 1 (high-yield genotypes) and those with a score below zero belong to group 2 (low-yield genotypes). In this study, discriminant analysis was used as a powerful multivariate method to find an integrated selection criterion using all studied traits rather than yield alone. These results confirmed the efficiency of the proposed integrated selection criteria.
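The ranking and zero cut-off rule described above can be sketched as follows. The scores and clone labels are hypothetical placeholders (the actual scores are those reported in Table 8), and `rank_and_classify` is an illustrative helper name, not part of any statistical package.

```python
# Hypothetical discriminant scores; the real values are those in Table 8.
scores = {"clone_A": 2.1, "clone_B": 0.4, "clone_C": -0.3, "clone_D": -1.8}

CUT_OFF = 0.0  # scores come from standardized data, so the cut-off is zero


def rank_and_classify(scores, cut_off=CUT_OFF):
    """Sort genotypes by score (descending) and label each by the cut-off."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, score, "group 1 (high)" if score > cut_off else "group 2 (low)")
        for name, score in ranked
    ]


result = rank_and_classify(scores)
for rank, (name, score, group) in enumerate(result, start=1):
    print(rank, name, score, group)
```

The position in the sorted list serves as the selection index rank, while the sign of the score gives the predicted yield group.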

Conclusion
In the current investigation, the coefficient of variation (CV %) for the individual experiments was lower than 20%, which permits a combined analysis. Results revealed significant differences among genotypes (clones), seasons, and their interaction for all studied characters, except that the seasonal effect was insignificant for the number of internodes and sugar yield and the season × genotype interaction was insignificant for stalk diameter, stalk weight, cane yield, and sugar yield.
Results showed that the first season had higher mean values for all studied traits compared to the second season, except for stalk diameter and sugar yield representing seasonal differences.
Furthermore, the correlation coefficients showed that cane yield was the major contributor to sugar yield, followed by stalk weight, the number of stalks per m², and stalk height, in addition to purity % and sugar recovery %. The discriminant function model explains 79.2% of the variation among the 32 genotypes. Also, a highly significant, small value of Wilks' lambda (0.22) indicates the great distinguishing ability of the function. Using the effective traits concurrently, discriminant scores (DS) were calculated for all genotypes. This indicator could successfully discriminate between high and low sugar yield genotypes.