Phylogeny and estimated genetic divergence times of banana cultivars (Musa spp.) from Java Island by maturase K (matK) genes

The identification of banana cultivars genome is needed to provide a valid identity from the accession of bananas which are used as basic data in the management of in situ and ex situ banana conservation as well as for further breeding of banana cultivars. The size of PCR-amplified matK ranged from 844 to 860 bp and showed a high variability. The haplotype diversity was 0.9048 with nine haplotypes. Haplotype distribution map revealed the lineage patterns of banana cultivars from Java. Reconstruction of genetic relationships using the maximum likelihood, maximum parsimony and Bayesian inference algorithms produces tree topologies and classifications that are grouped according to their genomic groups, into three main clades, i.e., AA/AAA, AAB and ABB. Based on the previously derived age constraints and fossil data, we estimate (Musaceae) that genetic divergence times of all samples occurred during the Eocene (95% HPD: 51.9 Mya), Musa acuminata group (AA, AAA, AAB) with Musa balbisiana group (BB and ABB) occurred during the Oligocene (95% HPD: 26 Mya), and the separation on each banana cultivars occurred during the Middle Miocene to Pliocene (95% HPD: 16.5–2.5 Mya). From this study, we conclude that all studied cultivars are closely related according to its genomic groups with high variation. Genetic variation among those cultivars creates nine haplotypes. The development of variety which leads to the formation of different banana cultivars had suggested to be occurred long ago along with human migration and domestication.


Background
Banana cultivars (Musa spp., Family Musaceae) are horticultural commodities which are the fourth most important fruit crop in developing countries (Perrier et al. 2011;Kharadi et al. 2014). Banana cultivars are thought to have originated in Indo-Malesia especially Southeast Asia, which then spread to all tropical and subtropical countries (De Langhe et al. 2009;Häkkinen 2013;Sulistiyaningsih et al. 2014). Therefore, Southeast Asia was considered a central origin of banana (Valmayor et al. 2000;Perrier et al. 2011) since banana cultivars are widely distributed in Southeast Asia, including Indonesia (Ministry of Agriculture 2016). In Indonesia, more than 200 cultivars are well recognized and widely cultivated throughout main islands including Sulawesi, Sumatra, Java and Madura (BPS-Statistic Indonesia 2019). Some regions which contributed to banana production in Indonesia include East Java (29.08%), West Java (16.76%) and Lampung (16.61%) (BPS-Statistic Indonesia 2019).
Two types of banana have been characterized by Linneaus in 1953 and1959, namely plantain (Musa paradisiaca Linn.) and dessert banana (Musa sapientum Linn.) (Valmayor et al. 2000). However, the application of both taxonomical names in Southeast Asia as the center origin of banana generated confusion (Valmayor et al. 2000;Singh et al. 2001). Another problem confronting taxonomists and horticulturists in Southeast Asia is the presence of many banana cultivar names and synonymies cultivars in different languages, dialects and regions (Valmayor et al. 2000;Wahyudi et al. 2020). Therefore, new classification scheme proposed by Simmonds and Shepherd (1955) that approved by a consensus in 1999 may be a breakthrough of this problem (Valmayor et al. 2000). The new taxonomical scheme of bananas or called three-tier system consists of the species name, followed by letter combinations indicating the ploidy and genome groups contributed by their ancestral and followed by local cultivars name (Simmonds and Shepherd 1955;Valmayor et al. 2000).
The three-tier system was adopted from the fact that banana cultivars in Southeast Asia were originated from hybridization between the two ancestors Musa acuminata Colla (contributors of A genome, x = 11) and Musa balbisiana Colla (contributors of B genome, x = 11) (Singh et al. 2001;De Langhe et al. 2009;Li et al. , 2013Häkkinen 2013). Hybridization followed by chromosome restitution has arisen of banana diversity with various ploidy level and genomic combinations such as AAA (Ambon, Berlin, Mas), AAB (Pisang Raja) and ABB (kapok, Ebung) (Sumardi and Wulandari 2010;Hapsari et al. 2015). The determination of the genomic groups of banana cultivars was assessed based on morphological characters (Hapsari et al. 2015). However, morphological approaches are sometimes inaccurate because it was very subjective; the process is time-consuming and can be influenced by environmental factors (Probojati et al. 2019).
The latest approach to identify genome composition and grouping of banana cultivars is by the molecular marker. The molecular approach based on the sequence of DNA is proven to have a higher level of accuracy and efficiency compared to morphological techniques and PCR-based analysis such as PCR-RFLP, random amplified polymorphism DNA (RAPD), amplified fragment length polymorphism (AFLP) and inter simple sequence repeat (ISSR) or microsatellite (Poerba and Ahmad 2010;de Jesus et al. 2013;Lamare and Rao 2015;Sundari et al. 2017;Poerba et al. 2019;Probojati et al. 2019;Wahyudi et al. 2020). Although the PCR-based technique can classify banana cultivars genome, most cultivars' maternal and parental donors are still uncertain (Li et al. 2013). Therefore, identification of banana cultivar genome based on DNA sequence from chloroplast genome such as rbcL, intron trnK, trnl-F and matK (maturase K) (Liu et al. 2010;Bieniek et al. 2015;Wahyudi et al. 2013;Janssens et al. 2016;Nikmah et al. 2016;Udensi et al. 2017) may cope the problem. Chloroplast DNA is structurally stable, non-recombinant, and it is inherited from the maternal (Costion et al. 2011;Yuan et al. 2015;Shekhar et al. 2019).
matK gene is widely used for identification because of its effectiveness, slow mutation rate in plants and a more specific accuracy level than other genes. In addition, the consortium barcode of life (CBOL) has also recommended matK gene as a marker for general plant identification (CBOL Plant Working Group et al. 2009). matK gene was even succeeded to reconstruct phylogenetic trees and estimated genetic divergence time (Liu et al. 2009;Udensi et al. 2017). To date, no report has been found about the genetic relationship and the estimated time of genetic divergence of bananas cultivars in Java based on the matK gene sequence.
This present study aims to analyze the genetic diversity, relationship and divergence time among local banana cultivars from Java Island based on maturase K (matK) gene sequence. The results of this study are expected to be useful not only ini agriculture but also in conservation and banana breeding activities. Furthermore, the genetic divergences time provides information about the relation to its distribution range and geological history of local banana cultivars in Java Island.

Plant materials
A total of 14 local banana cultivars scattered from Java Island were collected from banana germplasm garden of Yogyakarta, Indonesia (Table 1). Banana cultivars were originated from areas covering ten districts of Java Island ( Fig. 1) and represented four genomic groups, i.e., AA, AAA, AAB and ABB (Table 1). Further, young leaves of banana cultivars were collected and dried with silica gel before be analyzed. In addition, two species from Gen-Bank NCBI were also used as outgroup, i.e., Ensete glaucum and Musella lasiocarpa.

Molecular analysis
Whole genomic DNA isolation was carried out using Promega Wizard ® Genomic DNA Purification Kit, following its manufacturer's instructions. The total genomic DNA was confirmed both quantitatively and qualitatively. Quantitative examination of total DNA concentration was done using AE-Nano 200 Nucleic Acid Analyze version 2.0. The qualitative examination was carried out using electrophoresis separation on 1% agarose gels stained with 2 μg/ml ethidium bromide (Etbr) Page 3 of 13 Probojati et al. Bull Natl Res Cent (2021) 45:33 for 30 min at a voltage of 80 V and then photographed on GelDOC UV-Transilluminator (BioRAD). The estimated length of total genomic DNA was measured using a 1-Kb DNA ladder marker (Thermo Scientific, California, USA). Amplification of matK gene was accomplished using primer pairs of matK-1RKIM-f 5′-ACC CAG TCC ATC TGG AAA TCT TGG TTC-3′ and matK-3FKIM-r 5′-CGT ACA GTA CTT TTG TGT TTA CGA G-3′ designed by K. J. Kim from School of Life Sciences and Biotechnology, Korea University, Seoul, Korea (Kuzmina et al. 2012

Data analysis Genetic diversity and haplotype analysis
The matK gene DNA sequence data were evaluated using ABI sequences Scanner v.10. Then, sequence alignment process was analyzed using Clustal W menu with MEGA X software (Kumar et al. 2018). The aligned sequences were further rechecked with the trimmed using Gblocks 0.91b (Castresana 2000) based on sequence alignment of the taxa Outgroup and default parameters of the program. Genetic diversity and haplotype diversity were analyzed with DnaSP version 5.10.01. Reconstruction of the haplotype distribution map was generated with Haplotype Network 4.6.1.2 (Paradis 2018).

Phylogenetic analysis
Phylogenetic analyses were reconstructed using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) approaches. The maximum parsimony (MP) and maximum likelihood (ML) analyses were implemented in PAUP*v4.0b1 software (Swofford 2002), using trees bisection-reconnection (TBR) branch swapping on 1000 bootstrap replicates. The evolutionary models for the ML and BI analyses were estimated using jModeltest 2.0 (Darriba et al. 2012), and a GTR (General Time Reversible) + G (Gamma) + I (Invariant Sites) substitution model was used. The Bayesian inference analysis was performed using MrBayes 3.0b4 software (Huelsenbeck and Ronquist 2001). Parameters for the best model were then applied in an MCMC (Monte Carlo Markov Chain), were run for 10,000,000 generations, and every 1000 generations were sampled. All sample points that occur red before stationarity of negative log likelihood (-lnL) scores were discarded as burn-in period and were retained in the 50% (burn-in = 500) majority rule consensus tree. Furthermore, each output file produced was visualized through FigTree v1.4.3 (Drummond and Rambaut 2007). A phylogeny tree has a support node, nodes with bootstrap value ≥ 70 for MLBS and MPBS, and ≥ 95 for BPP value (Huelsenbeck and Hilis 1993).

Divergence time estimation
Divergence time estimation analyses were conducted if the tree topology was considered credible in explaining the separation between taxa. Based on the matK genes, the divergence times of banana cultivars were estimated using the Monte Carlo Markov Chain (MCMC) method, and each analysis was run for 10 million generations with parameters sampled every 1,000 generations, which was implemented in BEAST (Bayesian Evolutionary Analysis Sampling Trees) v2.4.1 package software (Drummond and Rambaut 2007), under an uncorrelated lognormalrelaxed clock model of rate variation among lineages.

Amplification and DNA sequences of matK genes
Amplification using primers of matK-1RKIM-f and matK-3FKIM-r was successfully carried out on the 14 banana cultivars examined, resulted in 800-900 bp DNA fragments (Fig. 2). Sequencing of this amplicons produced 844-860 bp length fragments (Table 2). Based on the BLAST analysis, all of those fragments were homologues to the Musa acuminata (MF68180.4), Musa balbisiana (KC904686.1) and matK sequences (similarity 73-100%). Nucleotides composition of matK gene in 14 banana cultivars examined was dominated by T(U) and A bases content than C and G ( Table 2). The average composition value of each base was 36.5% T (U) bases, 29.3% in A bases, 15.8% in G bases and 18.3% in C ( Table 2). The A/T bases content value of 14 banana cultivars was 58.4-67.1%, while the value of GC bases content was 32.6-41.6%. These matK sequences produced 962 characters of nucleotide sites. As much as 360 (37.42%) out of 962 characters were identified as conserved region sites (invariable or monomorphic), 468 (48.65%) were variable sites (polymorphic), and 134 (13.93%) were alignment gaps or missing data. Around 210 (21.82%) was singleton variable sites, and 258 (26.81%) has the potential to parsimony informative sites.

Haplotype diversity
Analysis of haplotype diversity using Ensete genus as out-group resulted in 9 haplotypes with high haplotype diversity values (Hd = 1000 < 0.9048 < 0.5000) (Fig. 3). This study revealed that the studied cultivars lineaged from several regions in Java Island, which were Page 5 of 13 Probojati et al. Bull Natl Res Cent (2021) 45:33 interconnected and classified according to their genomics into 3 groups. The haplotype network was formed by outgroup (H9), namely Ensete glaucum, then flows into the haplotype 8 network (H_8) that consists of 4 cultivars, which was genome ABB and BBw. Then, it flows into the haplotype networks H_3, H_6 and H_7, which all consist of bananas AAB-type. Furthermore, H_1, H_2, H_4 and H_5, where all haplotypes consist of diploid AA banana and AAA triploid.
Banana cultivars of AA and AAA genomes were grouped into one haplotype which previously thought to be result hybrid between the gene flow from haplotype 1 (H_1) network and haplotype 2 (H_2), so it becomes haplotype 4 and haplotype 5 which were AAA genomes. This result could support the finding in the previous study by Simmonds and Shepherd (1955) that triploids originated  2,3,4,5,6,7,8,9,10,11,12,13,14  Page 6 of 13 Probojati et al. Bull Natl Res Cent (2021) 45:33 from pollination of haploids from diploid female cells. For example, triploid genome AAA was originally from the combination of diploid AA female cells with haploid A male cells, so that it will produce AAA triploid (M. acuminata).

Phylogeny of banana cultivars from Java Island
In order to analyze the relationships among 14 banana cultivars in this study, phylogenetic analyses were performed by the 3 methods, which has a different bootstrap support value of each branch point/node. Nodes were considered well supported if value maximum likelihood bootstrap support (MLBS) ≥ 70%, maximum parsimony bootstrap support (MPBS) ≥ 70% and Bayesian inference posterior probabilities ≥ 95% (Huelsenbeck and Ronquist 2001). Although some nodes of 3 methods phylogenetic topology exhibited low bootstrap support, the phylogenetic topology this study consisted of three major clades, according to the genome which well supported (MLBS = 100%, MPBS = 100%, BPP = 100%) (Fig. 4).
Clade I consists of AA genome groups (Rejang, Mas, Berlin) and AAA genome groups (Morosebo, Ambon Hong). However, AAB groups (Raja Seribu) were nested in the clade 1 together with AA and AAA groups, but the clades itself had weak categorized bootstrap support (MLBS = 61%, BPP = 55%) and moderate categorized for (MPBS = 74%) (Fig. 4). Although this result of Raja Seribu (AAB) is nested, the AA/AAA group's relationship with the AAB group had tended to separate into two groups based on the haplotype distribution map (Fig. 3). Clade II mainly consists of AAB groups (Triolin, Brentel Warangan, Saba Awu) with polytomy topology and weak support (MLBS = 66%, BPP = 93%) (Fig. 4a, c), and moderate support for MPBS = 77% (Fig. 4b). However, there are also Kojo Santen (AAA group) and Saba Awu (ABB group) nested in this clade. Their relationship had tended into low bootstrap support in its topology separation. In clade II, the triploid genome group (AAA, AAB, ABB) is interconnected and becomes a clade within it. This result was supported by previous studies which likely that the triploid genome has the same paternal and maternal origins. Clade III consists of ABB and BBw group with polytomy topology and strong supported bootstrap for (MLBS = 90%, MPBS = 94%) (Fig. 4a, b) and relatively weak support for BPP = 93% (Fig. 4c). These results suggest that ABB groups (Ebung, Raja Bandung) and BBw (Kluthuk Ijo, Kluthuk Wulung) are closely related, so it is thought to be sister species group (100% identical). ABB and BBw genome groups show the same relationship patterns and belonging to one group with genetic distance Numbers associated with branches are bootstrap percentages of ML and MP higher than 50% and Bayesian posterior probabilities greater than 0.90, respectively. Each cultivar name was followed genome composition of the cultivar which the previously recognized. Ensete glaucum used as outgroup Probojati et al. Bull Natl Res Cent (2021) 45:33 value of 0,000. These findings suggest multiple inter-specific hybridization origins for the B genome in different cultivars of domesticated bananas. Based on this finding, we suggest that M. balbisiana might donate B genome.

Divergence time estimations
The divergence time estimation of 14 studied cultivars based on matK using BEAST maximum clade credibility tree is depicted in Fig. 5. The mean ages and 95% HPD are shown in

Genetic diversity of banana cultivars
To evaluate the germplasm of domesticated bananas, several studies have investigated the genetic diversity of cultivated and wild Musa accessions using various molecular marker, i.e., AFLP, RAPD, PCR-RFLP, ISSR and microsatellite (Babu et al. 2018;Das et al. 2018;Poerba et al. 2019;Probojati et al. 2019;Wahyudi et al. 2020).
Currently, molecular systematics studies in plants based on matK gene have been widely conducted as a potential candidate to determine genetic diversity and perform species identification or plant DNA barcoding (Costion et al. 2011;Hilu et al. 2014;Yuan et al. 2015;Shekhar et al. 2019). This study also unveiled that banana cultivars matK sequences were longer than of Poaceae (Das et al. 2013) and Dipterocarpaceae families matK (Harnelly  Our present study has evaluated the nucleotide diversity of 14 banana cultivars cultivated in various regions of Java Island, Indonesia, by assessing the variable sites (polymorphic) of matK gene. In this study, we have successfully obtained the value of characters nucleotide sites which suggest that the variable sites (polymorphic) are exist. High variations among matK gene sequences in all of the bananas we studied show its high genetic diversity. This high nucleotide diversity in banana cultivars implies that it may have had a historically large population size and did not undergo any severe genetic bottleneck during the domestication process (Li et al. 2013). matK gene in studied banana also showed high parsimony informative sites, where there were changes in at least two types of nucleotides, and two of them appeared with a minimum frequency of two (Hapsari et al 2018). This gene also demonstrated a high average of GC bases content in our banana samples. It is consistent with what had been reported in plant studies including in Musaceae (Christelová et al. 2011) and Poaceae (Bieniek et al. 2015).

The relationship among banana cultivars in Java is evaluated from the haplotype distribution and phylogenetic
Haplotype network construction analysis was used to visualize the genealogical relationships among DNA sequences within a population or at the intraspecific level or to make inference about biogeography and history of populations. Haplotype network considered more informative than conventional phylogenetic trees to display intraspecific DNA sequence variation (Paradis 2018;Hapsari et al. 2020).
Our present study has successfully obtained the haplotype distribution map which showed that the banana cultivars AA-type nested with the AAA genome group. Furthermore, the banana cultivars of AA and AAA groups are directly connected to the banana cultivars AAB group, and AAB group is directly connected to the banana cultivars ABB group. Therefore, it is possible for AAB group bananas to function as intermediate bananas connecting the two genomes B and genome A. These finding further confirmed a previous study by De Langhe et al. (2009) that hybridization between cultiwild populations, accompanying human migration and still partly fertile clones from different origins led to the generation of more sterile diploids AA genome and the more vigorous and nearly sterile triploids (AAA first, then AAB and later on ABB).
In this study, to analyze the phylogenetics relationship we used 3 models, i.e., ML, MP and Bayesian inference, which are largely congruent in the formed tree topologies. The slight difference in the 3 methods is mainly in support at the species level (Fig. 4). This phylogenetic relationship analysis of banana cultivars differentiated each cultivar into 3 clades according to the genome characters. These findings suggest that matK gene sequences can be used as a recommendation for banana cultivars grouping based on its genomes.
There are 2 clades that form a polytomy topology as of also found in the population of Musa troglodytarum based on rbcL (Hiariej et al. 2015). Polytomy topology is the separation of branches that cannot distinguish or separate one species from another. It is probably caused by evolution that occurs simultaneously, thus causing uncertainty in phylogenetic topology (Kuhn et al. 2011). That is supported by previous genetic evolution studies in Poales (Hochbach et al. 2018), which revealed that the matK gene has an evolutionary pattern and tempo that distinguishes it from other genes yet less accurate to distinguish species at low taxa level. Another reason is that it is possible to require specific primers due to the location of matK in the trnK intron and its proximity to psbA which may provide a nearly universal primer for amplification. An effective sorting strategy is needed for matK in the Musaceae group. It is required to conduct further evaluation and analysis for this study.
There is the importance of genetic relationship analysis for genetic conservation of banana cultivars. Protection of banana genetic variation is needed to be done for both in situ and ex situ conservation. If conservation resources are limited, each cultivar which has a similarity and close relationship must be chosen, one of them as a representative. In our study, based on matK, we need to preserve both Saba Awu Banana (ABB) and Kaja Santen (AAA) since these two cultivars possess different types of genomes yet genetically are closely related (Fig. 4).

Divergence times and biogeographical events of banana cultivars in Java Island
The divergence time in this study is very helpful to interpret the temporal evolution of banana cultivars. Our present study provides divergence time estimation for banana cultivars which originated from Java Island. The estimated ages among Musa, Ensete and Musella formed in the early Eocene (Fig. 5); it is resembled to which reported in the previous study (Janssens et al. 2016). Based on the unearthed fossils, the earliest diversification of Musa occurred in northern Indo-Burma during the Late Eocene. It then spread from the north of Indo-Burma toward all area of Southeast Asia (including Malesia) followed by local diversification (Janssens et al. 2016). Based on these reports and the diverse genus distribution, it seems to be reasonable to assume that the genus Page 11 of 13 Probojati et al. Bull Natl Res Cent (2021) 45:33 Musa evolved and diversified in tropical Asia, especially in Southeast Asia region. Geographic distributions of genus Musa containing species of M. acuminata and M. balbisiana groups (Nodes 3, Fig. 5) started during in the Oligocene. Based on the study by Janssens et al. (2016), the diversification of Ingentimusa, Callimusa and Australimusa started to spread from the northern Indo-Burmese region during the Oligocene, whereas the first dispersal of each M. acuminata and M. balbisiana groups in Java Island (Nodes 4) is distributed from the West region now known as Jakarta (including Raja seribu), toward Yogyakarta, Central Java (including Ambon Hong, Morosebo, Mas and Rejang). It occurred in the middle Miocene to late Miocene. At this time, there is a diversification of Eumusa and Rhodochlamys, which was followed by speciation events (Janssens et al. 2016).
Further spreading into and diversification in East Java (including Berlin, Kojo Santen, Brentel Warangan, Saba Awu, Kluthuk Wulung, Ebung and Kluthuk Ijo) also occurred during the Late Miocene. It is different from that originated from Bantul, Central Java, which occurred in Pliocene (Nodes 8 and 9, Fig. 5). Based on this finding, the spreading of those banana cultivars is supposed to be collateral with human migration both in and out, and became isolated to the regions. Domestication due to human selection and migration is one aspect of evolution (Kantar et al. 2017), especially in banana cultivars. However, we realize that it is still challenging to determine the exact cause of the distribution of existing banana cultivars due to the complexity between phylogenetic patterns and geographical distribution.

Conclusion
Analysis of genetic relationships produced trees which grouped the studied cultivars according to their genomic groups with nine haplotypes. This will allow local banana genetic diversity data to be retained and also to provide intellectual property protection. The estimated divergence time of the formation of all samples (Musaceae) supposed to be occurred during the Eocene (95% HPD: 51.9 Mya), Musa acuminata group (AA, AAA, AAB) with Musa balbisiana group (BB and ABB) occurred during the Oligocene (95% HPD: 26 Mya), and the separation on each banana cultivars occurred during the Middle Miocene to Pliocene (95% HPD: 16.5-2.5 Mya). This study recommends that an analysis of estimated divergence times requires more calibration data by comparing several taxa to determine more specific divergence times. Furthermore, genetic conservation for all 14 studied cultivars from Java is strongly needed to preserve its genetic resources.