
DNA barcoding of different Triticum species

Abstract

Background

The genus Triticum L. includes diploid, tetraploid, and hexaploid species. DNA barcoding is a relatively new method that identifies plant taxa quickly from short DNA sequences. In this investigation, we carried out a phylogenetic analysis of 20 different Triticum species using a partial sequence of the chloroplast maturase-encoding gene (matK).

Materials and methods

Twenty accessions of diploid, tetraploid, and hexaploid Triticum species were obtained from different countries. Genomic DNA was isolated from young leaves of the studied samples and used as a template for PCR. PCR products were checked by electrophoresis, purified, sequenced, and submitted to the GenBank nucleotide sequence database; the nucleotide sequences were also translated into amino acid sequences. Both the nucleotide and the amino acid sequences were aligned with the Clustal W multiple sequence alignment program, and phylogenetic trees were constructed from each using two analyses, bootstrapping and pairwise distance.

Results

The phylogenetic trees obtained from both the nucleotide and the amino acid sequences divided the 20 Triticum species into two groups, A and B. Group A represented the diploid Triticum species. Group B was divided into two subgroups, I and II. Subgroup I represented the hexaploid Triticum species and subgroup II represented the tetraploid species.

Conclusion

The matK gene sequence has a critical role in discriminating the closely related Triticum species. So these sequences could be used as a DNA barcode for detecting the evolutionary history of Triticum species.

Introduction

Poaceae is a large family that includes the tribe Triticeae, which has around 400–500 species. The genus Triticum L. comprises diploid (2n = 2x = 14), tetraploid (2n = 4x = 28), and hexaploid (2n = 6x = 42) species, many of which are economically important food crops (Doebley et al. 2006). The genomes that contribute to the genome constitution of Triticum species are designated A, B, D, and G. Several types of analysis have provided critical knowledge about the ancestry of the individual genomes in allopolyploid species (Zhang et al. 2002; Gu et al. 2004). It is generally accepted that diploid wheat and Aegilops squarrosa L. (syn. Ae. tauschii; goat grass) are the donors of the A and D genomes, respectively (McFadden and Sears 1946). Many different species have been reported as the original donor of the B and G genomes, but it is now largely believed that the progenitor was a member of the Sitopsis section of the genus Aegilops, namely Ae. bicornis, Ae. longissima, Ae. searsii or, most likely, Ae. speltoides (Provan et al. 2004). Ae. speltoides is also considered the maternal donor of Triticum species (Dizkirici et al. 2016). There is a further hypothesis that the B genome of polyploid wheat is of polyphyletic origin, i.e., that it is a recombined genome derived from two or more diploid species.

DNA sequence analysis is considered a modern approach to studying evolutionary relationships and biodiversity (Stoeckle 2003; Ferri et al. 2009). DNA barcoding has an essential role in species identification (Hebert et al. 2003) because a short DNA sequence can have high discriminatory power between organisms. Barcodes are therefore important for identifying plants with a problematic taxonomic identity, for biodiversity investigations, and for identifying polymorphic plant species (Ajmal et al. 2014; Skuza et al. 2015).

There are many plant DNA barcodes, such as rbcL, matK, trnH-psbA, and ITS (CBOL Plant Working Group 2009; China Plant BOL Group 2011; Li et al. 2015). The Consortium for the Barcode of Life (CBOL) recommended a combination of two chloroplast barcodes (matK and rbcL) as the standard plant DNA barcode, supplemented with an additional barcode as required (CBOL Plant Working Group 2009).

The chloroplast matK coding sequence is about 1500 bp long and is translated into a maturase-like protein of around 500 amino acids. The matK gene is a useful region because it is one of the most rapidly evolving plastid genes and provides sufficient information to resolve phylogenetic relationships at the intergeneric level (Young and dePamphilis 2000). Compared with other genes used in grass systematics, matK has a high substitution rate, a large proportion of nucleotide variation at the first and second codon positions, a low transition/transversion ratio, and mutationally conserved sectors. All these features make the matK gene useful for determining relationships at the family and species levels (Liang and Hilu 1996).

The aim of this research was to investigate the genetic relationships among 20 Triticum accessions collected from different countries: 3 diploid Triticum monococcum L. (einkorn wheat); 11 tetraploid accessions, comprising 1 T. dicoccon subsp. dicoccon (emmer), 2 T. turgidum subsp. dicoccoides (wild emmer), and 8 T. turgidum subsp. durum (Desf.) (durum or macaroni wheat); and 6 hexaploid T. aestivum (common wheat). The analysis used one DNA barcode, the matK gene, together with its translated amino acid sequence (151 amino acids), which forms part of the maturase K-like protein.

Materials and methods

Plant materials

Twenty Triticum accessions, comprising diploid (Triticum monococcum L., AmAm), tetraploid (Triticum turgidum subsp. dicoccoides, Triticum dicoccon subsp. dicoccon, and Triticum turgidum subsp. durum (Desf.), BBAuAu), and hexaploid (Triticum aestivum, BBAuAuDD) species, were obtained from the International Center for Agricultural Research in the Dry Areas (ICARDA, Aleppo, Syria), the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK, Gatersleben, Germany), the Agricultural Research Center (ARC, Giza, Egypt), and the Egyptian National Gene Bank (Agricultural Research Center, Giza, Egypt), as listed in Table 1.

Table 1 The scientific and common names of 20 different Triticum species from different countries, with their code names and GenBank accession numbers

Genomic DNA isolation

Genomic DNA was isolated from 100 mg of young leaf samples using the GeneJET Plant Genomic DNA Purification Mini Kit (Thermo Scientific, K0791). The extracted DNA was assessed by agarose gel electrophoresis and spectrophotometry (NanoDrop 2000; Thermo Scientific), diluted to 50 ng/μl, and then used as a template for PCR (Golovnina et al. 2007).

MatK primer design

Seven partial matK gene sequences and one complete matK gene sequence from different Triticum species were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. Their accession numbers are DQ420054.1 (T. monococcum, partial sequence), KC608185.1 (T. monococcum subsp. aegilopoides, partial sequence), KC608186.1 (T. monococcum subsp. aegilopoides, partial sequence), KC608208.1 (T. turgidum subsp. dicoccon, partial sequence), KC608210.1 (T. turgidum subsp. dicoccon, partial sequence), DQ420019.1 (T. aestivum, partial sequence), DQ420050.1 (T. aestivum, partial sequence), and AF164405.1 (T. aestivum, complete sequence). The downloaded sequences were saved as FASTA files and aligned with MEGA version 6. The region from base 123 to base 644 (about 521 bp) was present in all aligned sequences, and this region was used to design the matK primers with the online program Primer3 (version 0.4.0) (http://bioinfo.ut.ee/primer3-0.4.0). The forward primer 5′-ACCTGTGGAAATAGTTGTTAGTTGT-3′ and the reverse primer 5′-CCAATTCGAATAGTAGTTGAGAAAG-3′ were designed to amplify a 454 bp fragment of the matK sequence. The designed primers were then tested in silico by aligning them with the retrieved complete Triticum matK gene sequence, to confirm that they were specific to matK and matched the Triticum matK gene with 100% identity.
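The in silico primer test described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes exact matching only (no mismatches), the function names are hypothetical, and the template is a synthetic placeholder rather than a real matK sequence. Only the two primer sequences are taken from the text.

```python
# Minimal in silico PCR sketch: exact primer matching, no mismatch tolerance.
FORWARD = "ACCTGTGGAAATAGTTGTTAGTTGT"  # forward primer from the text (5'->3')
REVERSE = "CCAATTCGAATAGTAGTTGAGAAAG"  # reverse primer from the text (5'->3')

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Return the amplicon length (primers included), or None if a primer has no exact site."""
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding site
    # appears on the template strand as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


# Synthetic placeholder template (hypothetical, NOT a real matK sequence).
spacer = "ACGT" * 10
template = "GG" + FORWARD + spacer + reverse_complement(REVERSE) + "TT"
print(amplicon_length(template, FORWARD, REVERSE))
```

With a real template, the same call would be made on the complete matK sequence downloaded from GenBank; a return value of `None` would indicate that at least one primer has no exact binding site.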

PCR amplification

PCR was carried out in duplicate in a T100™ Thermal Cycler (Bio-Rad) in a final volume of 25 μl. Each reaction mixture contained 5× Taq buffer, MgCl2, 0.2 mM dNTP, 10 pM of each primer, 50 ng genomic DNA, and 1 U GoTaq DNA Polymerase (Promega, USA). The thermal profile was 95 °C for 4 min, followed by 35 cycles of 95 °C for 30 s, 57 °C for 1 min, and 72 °C for 1 min, and a final extension at 72 °C for 5 min. PCR products were checked on a 1.5% agarose gel containing ethidium bromide in 1X TAE buffer (pH 8.0). The gel was analyzed and archived using the Molecular Imager® GelDoc™ XR software. Bands were scored and analyzed with the Quantity One software (Bio-Rad). Product size was determined by comparison with the 100 bp DNA Ladder H3 RTU (GeneDirex, cat no. DM003-R500). The sequences obtained in this study have been deposited in the GenBank nucleotide sequence database (National Center for Biotechnology Information (NCBI)) under accession numbers MN047218, MN047219, MN047220, MN047221, MN047222, MN062364, MN062365, MN062366, MN062367, MN062368, MN062369, MN062370, MN062371, MN062372, MN062373, MN062374, MN062375, MN062376, MN062377, and MN062378 (Table 1).

Phylogenetic analysis

The chromatogram data were visualized using the BioEdit program version 3 (Hall 1999). The nucleotide sequences were aligned with the Clustal W multiple sequence alignment program. The evolutionary history was inferred using the maximum likelihood method based on the Tamura 3-parameter model (Tamura 1992). Statistical support for each constructed tree was assessed by two analyses, bootstrapping (1000 replications) and pairwise distance. Total nucleotide length (bp), estimates of evolutionary divergence between sequences, percentage nucleotide composition, polymorphism, the maximum likelihood substitution matrix, and the maximum likelihood transition/transversion bias were calculated with MEGA 6.06 (Tamura et al. 2013).
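For readers unfamiliar with the Tamura 3-parameter model, the pairwise distance it defines can be sketched in a few lines of Python. This is an illustrative sketch of the closed-form distance from Tamura (1992), not a reproduction of MEGA's implementation: it assumes two already aligned sequences of equal length, skips any site that is not A, C, G, or T in both sequences, and uses a hypothetical function name. With P the proportion of transitional differences, Q the proportion of transversional differences, θ the G+C content, and h = 2θ(1 − θ), the distance is d = −h ln(1 − P/h − Q) − ½(1 − h) ln(1 − 2Q).

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def t92_distance(seq1: str, seq2: str) -> float:
    """Tamura (1992) 3-parameter distance between two aligned sequences."""
    # Keep only sites where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    differences = [(a, b) for a, b in pairs if a != b]
    transitions = sum(
        1 for a, b in differences if {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    )
    p = transitions / n                      # transitional differences per site
    q = (len(differences) - transitions) / n  # transversional differences per site
    gc = sum(base in "GC" for pair in pairs for base in pair) / (2 * n)
    h = 2 * gc * (1 - gc)
    if h == 0 or 1 - p / h - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("distance is undefined for this pair")
    return -h * math.log(1 - p / h - q) - 0.5 * (1 - h) * math.log(1 - 2 * q)


# Toy example (made-up sequences): one transition in ten sites.
print(round(t92_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

The correction makes the distance slightly larger than the raw proportion of differing sites, because it accounts for multiple substitutions at the same site.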

Amino acid sequence

The nucleotide sequence of each studied Triticum species was translated into an amino acid sequence with the ExPASy online program (https://web.expasy.org/translate). The amino acid sequences were aligned with the Clustal W multiple sequence alignment program to construct the phylogenetic tree. The evolutionary history was inferred using the maximum likelihood method based on the JTT matrix-based model (Jones et al. 1992), as stated in the captions of Figs. 4 and 5. Statistical support for each constructed tree was assessed by two analyses, bootstrapping (1000 replications) and pairwise distance.
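The translation step can be sketched in Python as follows. This is a stand-in for the ExPASy web tool, not its implementation: it uses the standard codon assignments, a hypothetical `translate` function, and a `frame` parameter that is an assumption, since the reading frame used by the authors is not stated. Note that a 454 bp fragment contains 151 complete codons (454 = 3 × 151 + 1), which matches the 151 amino acids reported in the Introduction.

```python
# Build the standard codon table from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str, frame: int = 0) -> str:
    """Translate a DNA/RNA string; incomplete trailing codons are dropped.

    Codons containing ambiguous bases are reported as 'X'; stop codons as '*'.
    """
    seq = nucleotides.upper().replace("U", "T")[frame:]
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGGCCAAA"))       # toy input, not a matK sequence
print(len(translate("A" * 454)))    # number of complete codons in 454 bp
```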

Results

The selected portion of the matK gene was successfully amplified from all studied Triticum species (Fig. 1), and the amplicons were sequenced and deposited in GenBank under accession numbers MN047218, MN047219, MN047220, MN047221, MN047222, MN062364, MN062365, MN062366, MN062367, MN062368, MN062369, MN062370, MN062371, MN062372, MN062373, MN062374, MN062375, MN062376, MN062377, and MN062378 (Table 1). The amplified partial matK gene was about 454 bp long in all studied samples, with 189 monomorphic nucleotide positions and 265 polymorphic sites (58.37% polymorphism). The average GC content was about 35.3% across all tested samples.
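The polymorphism figures above follow from simple column counts on the alignment: 189 + 265 = 454 sites, and 265/454 ≈ 58.37%. A minimal Python sketch of how such counts can be obtained is given below. The function names are hypothetical and the three-sequence alignment is a toy example, not the study data; gaps are treated like any other character.

```python
def polymorphism_summary(alignment):
    """Count monomorphic and polymorphic columns in a list of aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # A column is polymorphic if it contains more than one distinct character.
    polymorphic = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return {
        "length": length,
        "monomorphic": length - polymorphic,
        "polymorphic": polymorphic,
        "percent_polymorphic": 100 * polymorphic / length,
    }


def gc_percent(seq: str) -> float:
    """Percentage of G and C in a sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


# Check of the arithmetic reported in the text.
print(round(100 * 265 / 454, 2))

# Toy alignment (made-up sequences).
print(polymorphism_summary(["ACGT", "ACGA", "ACTT"]))
print(gc_percent("ACGT"))
```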

Fig. 1

PCR amplified matK fragments from 20 different Triticum species

The nucleotide sequences of all studied species were analyzed under the Tamura (1992) model to estimate the rates of the different transitional and transversional substitutions (Table 2). Base substitution is the basis of single-nucleotide polymorphism (SNP) and involves either a transition (pyrimidine to pyrimidine or purine to purine) or a transversion (pyrimidine to purine or vice versa). The estimated transition/transversion bias (R) was 0.99. The estimated nucleotide frequencies were A = 32.34%, T/U = 32.34%, C = 17.66%, and G = 17.66%.
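The distinction between transitions and transversions can be made concrete with a short Python sketch. The function names are hypothetical and the sequences are toy examples. The raw count ratio computed here is only an illustration: the bias R reported above is a model-based maximum likelihood estimate from MEGA and need not equal a simple ratio of observed differences.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str):
    """Return 'transition', 'transversion', or None (identical or ambiguous bases)."""
    a, b = a.upper(), b.upper()
    if a == b or a not in NUCLEOTIDES or b not in NUCLEOTIDES:
        return None
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1: str, seq2: str):
    """Count observed transitions and transversions between two aligned sequences."""
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


# Toy example: A/G and C/T are transitions, A/T is a transversion.
print(count_ts_tv("AACC", "GTCT"))
```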

Table 2 Maximum likelihood estimate of substitution matrix

Molecular phylogenetic analysis based on DNA sequence of partial matK gene

The sequence of the chloroplast matK gene was used to examine the phylogenetic relationships of the studied Triticum species. The evolutionary history was inferred with the maximum likelihood method based on the Tamura 3-parameter model, using two analyses, bootstrapping and pairwise distance. Both analyses gave the same phylogenetic tree (Figs. 2 and 3). The tree divided the 20 studied samples into two groups, A and B. Group A (green) contained the diploid species, 2n = 2x = 14 (T. monococcum L., AmAm; einkorn), collected from three countries: Iraq (IG 109083), Iran (IG 113259), and Syria (IG 44936). This group was split into two subgroups: the first contained T. monococcum L. from Iran and Syria, and the second contained T. monococcum L. from Iraq only.

Group B was split into two subgroups, I and II. Subgroup I (red) contained the hexaploid species, 2n = 6x = 42 (T. aestivum, BBAuAuDD), and subgroup II (blue) contained the tetraploid species, 2n = 4x = 28 (T. turgidum subsp. dicoccoides, BBAuAu (wild emmer); T. dicoccon subsp. dicoccon, BBAuAu (emmer); and T. turgidum subsp. durum, BBAuAu (macaroni wheat)). Within subgroup I, the T. aestivum accessions from India (TRI 28936) and Libya (TRI 13955) were closely related to each other. The Egyptian cultivar sids 4 and the Egyptian landrace Qena, Nag Hamad 27 differed from each other and from the Indian and Libyan accessions. Likewise, the Egyptian cultivar Giza 168 and the Egyptian landrace New Valley, Dakhla 7 differed from each other and from all other T. aestivum accessions.

Subgroup II was divided into two clusters. The first cluster was split into two subclusters: the first contained the two T. turgidum subsp. dicoccoides (wild emmer) accessions from Syria (IG 46467 and IG 46447), which were closely related to each other, and the second contained only T. dicoccon subsp. dicoccon (emmer), whose origin is recorded as "Ursprungsland" (German for "country of origin"; TRI 28920). The second cluster contained T. turgidum subsp. durum (Desf.) and was divided into two subclusters. The first subcluster contained the closely related accessions from Turkey (TRI 28834) and Iran (TRI 19242). The second subcluster was divided into two sections. The first section contained the two closely related accessions from Italy (TRI 27360 and 27284). The second section was divided into two subsections: the first contained the accession from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, and the second contained the Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41.
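The bootstrap support attached to trees like these comes from resampling alignment columns with replacement and rebuilding a tree from each pseudo-alignment (Felsenstein 1985). The Python sketch below shows only the resampling step; tree inference on each replicate is left out. The function names, the fixed random seed, and the toy alignment are assumptions made for illustration and do not describe MEGA's internals.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so each
    # resampled column keeps the characters of all taxa together.
    return ["".join(seq[i] for i in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return n_replicates pseudo-alignments (seed fixed for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment (made-up sequences); a tree would be inferred from each replicate.
replicates = bootstrap_replicates(["ACGT", "ACGA", "ACTT"], n_replicates=3)
for replicate in replicates:
    print(replicate)
```

The support value on a branch is then the percentage of replicate trees that contain that branch, which is why branches reproduced in fewer than 50% of replicates are collapsed in a consensus tree.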

Fig. 2

Molecular phylogenetic analysis of different Triticum species by bootstrapping analysis based on the nucleotide sequence of the partial matK gene. The phylogenetic analysis was performed in MEGA version 6 by the maximum likelihood method based on the Tamura 3-parameter model. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the species analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated species clustered together in the bootstrap test (1000 replicates) is shown next to the branches (Felsenstein 1985)

Fig. 3

Molecular phylogenetic analysis of different Triticum species by pairwise distance analysis based on the nucleotide sequence of the partial matK gene. The phylogenetic analysis was performed in MEGA version 6 by the maximum likelihood method based on the Tamura 3-parameter model. The tree with the highest log likelihood (−3290.1725) is shown. The percentage of trees in which the associated species clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach and then selecting the topology with the superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site

Estimates of evolutionary divergence between sequences

The number of base substitutions per site was estimated between the nucleotide sequences (454 bp) of all studied Triticum species (Table 3). All ambiguous positions were removed for each sequence pair. Analyses were performed using the Tamura 3-parameter model in MEGA version 6. Table 3 and Figs. 2 and 3 show that the highest evolutionary divergence (0.37) was between T. monococcum L. from Syria (IG 44936) and T. turgidum subsp. dicoccoides from Syria (IG 46447), indicating that these two accessions were highly different. The lowest evolutionary divergence (0.02) was found between the T. turgidum subsp. durum accessions from Turkey (TRI 28834) and Iran (TRI 19242), and between the T. turgidum subsp. durum Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41, indicating high similarity within each pair. The evolutionary divergence between the Egyptian Triticum aestivum cultivars (Giza 168 and sids 4) was 0.12, while that between the Egyptian Triticum aestivum landraces (New Valley, Dakhla 7 and Qena, Nag Hamad 27) was 0.08, indicating that the differences within both the Egyptian cultivars and the Egyptian landraces were relatively low.

Table 3 Estimates of evolutionary divergence between sequences of 20 different Triticum species

Estimates of base composition bias difference between sequences

From the analysis of all nucleotide sequences, the difference in base composition bias per site was computed and is recorded in Table 4 (Kumar and Gadagkar 2001). Even when the substitution patterns are homogeneous among lineages, the compositional distance will correlate with the number of differences between sequences. Table 4 and Figs. 2 and 3 show that the highest compositional distance (0.83) was between T. monococcum L. (Iran, IG 113259) and T. turgidum subsp. dicoccoides (Syria, IG 46447). In contrast, the T. turgidum subsp. durum accessions from Italy (TRI 127360 and TRI 127284) and the T. turgidum subsp. durum Egyptian landraces (Sohag, Almonshaah 34 and Sohag, Almonshaah 41) showed no compositional distance. The compositional distances between the Egyptian Triticum aestivum cultivar sids 4 and the two Egyptian Triticum aestivum landraces (New Valley, Dakhla 7 and Qena, Nag Hamad 27) were 0.07 and 0.09, respectively, indicating a very low compositional distance between this cultivar and the two landraces.

Table 4 Estimates of base composition bias difference between sequences

Molecular phylogenetic analysis based on amino acid sequence from partial matK gene translation

The translated amino acid sequences were used to examine the phylogenetic relationships between all studied species. The amino acid sequences of all studied species contained the 20 amino acids alanine, cysteine, aspartic acid, glutamic acid, phenylalanine, glycine, histidine, isoleucine, lysine, leucine, methionine, asparagine, proline, glutamine, arginine, serine, threonine, valine, tryptophan, and tyrosine, with average percentage frequencies of 1.06, 1.49, 4.77, 6.03, 8.44, 1.42, 4.11, 4.11, 4.44, 14.14, 1.26, 5.79, 6.62, 5.03, 4.74, 8.31, 0.73, 6.75, 1.16, and 6.19, respectively (Table 5).
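Percentage frequencies of this kind can be computed directly from a protein string. The Python sketch below is illustrative only: the function name is hypothetical and the peptide is a toy example. The residue order A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y is the alphabetical order of the one-letter codes, which is the order in which the amino acids are listed in the paragraph above.

```python
from collections import Counter

# One-letter codes in the same order as the amino acids listed in the text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_percentages(protein: str) -> dict:
    """Return the percentage frequency of each standard amino acid.

    Stop symbols and unknown residues are ignored.
    """
    residues = [r for r in protein.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard amino acids found")
    counts = Counter(residues)
    return {aa: 100 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


# Toy peptide (made up): leucine makes up half of the residues.
frequencies = amino_acid_percentages("LLKM")
print(frequencies["L"], frequencies["K"], frequencies["M"])
```

Averaging such per-sample dictionaries across samples gives the kind of per-residue means summarized in a composition table.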

Table 5 Amino acid types and their percentage frequencies in the amino acid sequence of each sample, with the averages across all 20 different Triticum species

The evolutionary history was inferred with the maximum likelihood method based on the JTT matrix-based model (Jones et al. 1992), using two analyses, bootstrapping and pairwise distance. The two analyses gave different phylogenetic trees (Figs. 4 and 5). The bootstrap tree agreed with the tree based on the nucleotide sequences except for the following differences. Group B was divided into two subgroups, I and II. Subgroup I (red) was split into two clusters. The first cluster was divided into two subclusters: the first contained the T. aestivum Egyptian cultivar sids 4 and the Egyptian landrace New Valley, Dakhla 7, and the second contained only the Egyptian cultivar Giza 168. The second cluster was also divided into two subclusters: the first contained the closely related T. aestivum accessions from India (TRI 28936) and Libya (TRI 13955), and the second contained only the Egyptian landrace Qena, Nag Hamad 27.

Subgroup II (blue) was divided into two clusters. The first cluster was split into two subclusters: the first contained the two closely related T. turgidum subsp. dicoccoides (wild emmer) accessions from Syria (IG 46467 and IG 46447), and the second contained only T. dicoccon subsp. dicoccon (emmer), whose origin is recorded as "Ursprungsland" (German for "country of origin"; TRI 28920). The second cluster contained T. turgidum subsp. durum (Desf.) and, in the bootstrapping analysis, was split into two subclusters, each of which was split into two sections. In the first subcluster, the first section was divided into two subsections: the first contained the accession from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, and the second contained the Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41. In the second subcluster, the first section was split into two subsections: the first contained the two closely related accessions from Italy (TRI 27360 and 27284), and the second contained the closely related accessions from Turkey (TRI 28834) and Iran (TRI 19242).

The tree from the pairwise distance analysis agreed with the bootstrap tree except for the following differences. The second cluster of subgroup II, which contained T. turgidum subsp. durum (Desf.), was divided into two subclusters. The first subcluster contained the closely related accessions from Turkey (TRI 28834) and Iran (TRI 19242). The second subcluster consisted of two sections. The first section contained the closely related accessions from Italy (TRI 27360 and 27284). The second section was divided into two subsections: the first contained the accession from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, and the second contained the Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41.

Fig. 4

Molecular phylogenetic analysis of different Triticum species by bootstrapping analysis based on the amino acid sequence of the maturase-like protein. The phylogenetic analysis was performed in MEGA version 6 by the maximum likelihood method based on the JTT matrix-based model (Jones et al. 1992). The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the species analyzed. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed (Felsenstein 1985)

Fig. 5

Molecular phylogenetic analysis of different Triticum species by pairwise distance analysis based on the amino acid sequence of the maturase-like protein. The phylogenetic analysis was performed in MEGA version 6 by the maximum likelihood method based on the JTT matrix-based model (Jones et al. 1992). The tree with the highest log likelihood (−2539.1376) is shown. The percentage of trees in which the associated species clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model and then selecting the topology with the superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site

Discussion

DNA barcoding is considered a new method for discriminating between species. According to the CBOL Plant Working Group, a DNA barcode must amplify and sequence with high efficiency, and it must show enough genetic variation to distinguish sequences at the species level while remaining conserved among individuals of the same species (Hebert et al. 2003; Cowan et al. 2006; CBOL Plant Working Group 2009). The matK barcode has high substitution rates within species and is considered an important candidate for documenting plant systematics and evolution (Notredame et al. 2000). Savolainen et al. (2000) documented that the genetic relationships revealed by matK data are more robust than those obtained from combined rbcL and atpB sequences. Many studies have indicated that the chloroplast matK barcode is an essential marker for discriminating species or taxa (Newmaster and Ragupathy 2009; DeMattia et al. 2011); this gene has also been used to resolve intergeneric or interspecific relationships among flowering plants such as Malpighiaceae, Poaceae, Nicotiana, and Orchidaceae (Liang and Hilu 1996; Cameron et al. 2001; Salazar et al. 2003). The Plant Working Group (PWG) of the Consortium for the Barcode of Life (CBOL) recommended that two gene regions, rbcL and matK, be adopted as the standard plant DNA barcode, with the nuclear ITS region as a supplementary barcode (CBOL Plant Working Group 2009). Dizkirici et al. (2016) investigated the phylogenetic relationships between different Triticum and Aegilops species using the nuclear ITS and chloroplast matK genes. They found that the relationships between the polyploid wheats and Ae. speltoides obtained from the chloroplast matK and nuclear ITS sequences were the same, which supports the idea of co-inheritance of the nuclear and chloroplast genomes, with Ae. speltoides as the maternal donor.

Our results showed that the amplified and sequenced partial region of the matK gene was highly polymorphic among all studied species (58.37%). This nearly agrees with Skuza et al. (2019), who observed high nucleotide variability within the matK and rbcL regions: sequence polymorphism was 2.2% in the rbcL region and 6.5% in the matK region. The most variable region, the trnH-psbA intergenic region (15.6%), was the most useful for rye barcoding, so different DNA barcodes should be used. This indicates that the matK region is suitable for differentiating and discriminating between the studied species.

Awad et al. (2017) performed DNA barcoding with the matK and rbcL barcodes to discriminate 18 different Egyptian Triticum accessions. They amplified the matK gene with a universal matK primer from previously published literature. This primer gave 100% PCR amplification for the 18 samples, but DNA sequencing succeeded for only 6 of the 18 matK fragments. Their analysis also demonstrated a limited ability of the matK gene to discriminate between six Egyptian Triticum accessions (Sinai-AlGora-AlArish (114), Sinai-AlGora-AlArish (113), Northern coast-Raas ElHekma (117), Northern coast-Matroh (115), Bani Sweif 1, and Seds12). Their results show the importance of in silico primer testing when studying closely related species. Their results conflict with ours, which may be because we used specific primers that were designed from partial matK sequences with the Primer3 (version 0.4.0) online program and then tested in silico, as described in the primer design section. In addition, we used Triticum species collected from different countries, including Egypt, and the Egyptian accessions used in our work differ from those used in theirs; this may explain why our analysis distinguished the Egyptian Triticum aestivum landraces and cultivars. Bafeel et al. (2011) found that the universal matK primer leads to an inconsistent success rate of matK as a barcode, so the universal primer needs further improvement.

Phylogenetic analysis is considered the most effective method for determining the suitability of a DNA region as a barcode, because a suitable region should recover species-specific clusters. Our results documented the relationships among 20 different Triticum accessions: diploid, 2n = 2x = 14 (Triticum monococcum L., AmAm (einkorn)); tetraploid, 2n = 4x = 28 (Triticum turgidum subsp. dicoccoides (wild emmer), Triticum dicoccon subsp. dicoccon (emmer), and Triticum turgidum subsp. durum, BBAuAu (durum or macaroni wheat)); and hexaploid, 2n = 6x = 42 (Triticum aestivum, BBAuAuDD (common wheat)), based on the partial chloroplast matK gene sequence and its translated amino acid sequence. Our phylogenetic tree, which discriminated all studied species, was consistent with Sourdille et al. (2001) and Feuillet et al. (2008), who reported that the genome of the allohexaploid species T. aestivum is composed of the A, B, and D genomes (AABBDD; 2n = 42), which are derived from three different diploid species. T. turgidum subsp. durum, in contrast, is a tetraploid with the Au and B genomes (AABB; 2n = 28); the Au genome originated from T. urartu, while the B genome originated from Aegilops speltoides, commonly known as a wild or weedy goatgrass. The Am genome is derived from T. monococcum L. (einkorn), which represents both the wild and cultivated varieties and is generally known as Triticum boeoticum Boiss. emend. Schiem. The D genome is derived from the wild or weedy grass Aegilops tauschii L.

Cytoplasmic studies indicated that the Sitopsis diploid species Ae. speltoides was the maternal donor in the original cross that gave rise to the tetraploid T. turgidum (Vedel et al. 1978). Investigations by Bowman et al. (1983) and Dizkirici et al. (2016) also confirmed that Ae. speltoides is the maternal donor and the source of the B genome of T. turgidum and T. aestivum.

The current investigation suggests that matK gene sequence data are effective for resolving phylogenetic problems in Triticum species. In addition, the sequence variation, mean evolutionary rates and patterns, transition/transversion rate, and nucleotide diversity of the matK gene can be used to interpret evolutionary relationships at the interspecific level in Triticum species. Finally, the matK sequence can discriminate closely related Triticum species, so these sequences can be used as a DNA barcode for Triticum species.

Conclusion

The matK sequence has an important role in discriminating closely related Triticum species, so these sequences can be used as a DNA barcode for detecting the evolutionary history of Triticum species. The hexaploid and tetraploid species were found to be related because they fell in the same group, while the diploid species fell in another group.

Availability of data and materials

Not applicable.

Abbreviations

bp: Base pair
C: Cytosine
DNA: Deoxyribonucleic acid
G: Guanine
matK gene: Maturase-encoding gene
MEGA software: Molecular Evolutionary Genetics Analysis software
NCBI: National Center for Biotechnology Information
PCR: Polymerase chain reaction
T: Thymine
U: Uracil

References

  1. Ajmal AM, Gyulai G, Hidwegi N, Kerti B, Al Hemaid F, Pandey AK, Lee J (2014) The changing epitome of species identification–DNA barcoding. Saudi J Biol Sci 21:204–231

  2. Awad M, Fahmy RM, Mosa KA, Helmy M, El-Feky FA (2017) Identification of effective DNA barcodes for Triticum plants through chloroplast genome-wide analysis. Comput Biol Chem 71:20–31

  3. Bafeel SO, Arif IA, Bakir MA, Khan HA, Al Farhan AH, Al Homaidan AA, Ahamed A, Thomas J (2011) Comparative evaluation of PCR success with universal primers of maturase K (matK) and ribulose-1,5-bisphosphate carboxylase oxygenase large subunit (rbcL) for barcoding of some arid plants. Plant Omics 4:195

  4. Bowman CM, Bonnard G, Dyer TA (1983) Chloroplast DNA variation between species of Triticum and Aegilops–location of the variation on the chloroplast genome and its relevance to the inheritance and classification of the cytoplasm. Theor Appl Genet 65:247–262

  5. Cameron KM, Chase MW, Anderson WR, Hills HG (2001) Molecular systematics of Malpighiaceae: evidence from plastid rbcL and matK sequences. Am J Bot 88:1847–1862

  6. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci U S A 106:12794–12797

  7. China Plant BOL Group (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proc Natl Acad Sci U S A 108:19641–19646

  8. Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress and prospects in DNA barcoding of land plants. Taxon 55:611–616

  9. DeMattia F, Bruni I, Galimberti A, Cattaneo F, Casiraghi M, Labra M (2011) A comparative study of different DNA barcoding markers for the identification of some members of Lamiaceae. Food Res Int 44:693–702

  10. Dizkirici A, Kansu C, Onde S (2016) Molecular phylogeny of Triticum and Aegilops genera based on ITS and MatK sequence data. Pak J Bot 48(1):143–153

  11. Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127:1309–1321

  12. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791

  13. Ferri E, Barbuto M, Bain O, Galimberti A, Uni S, Guerrero R, Ferté H, Bandi C, Martin C, Casiraghi M (2009) Integrated taxonomy: traditional approach and DNA barcoding for the identification of filarioid worms and related parasites (Nematoda). Front Zool 6:1

  14. Feuillet C, Langridge P, Waugh R (2008) Cereal breeding takes a walk on the wild side. Trends Genet 24:24–32

  15. Golovnina KA, Glushkov SA, Blinov AG, Mayorov VI, Adkison LR, Goncharov NP (2007) Molecular phylogeny of the genus Triticum L. Plant Syst Evol 264:195–216

  16. Gu YQ, Coleman-Derr D, Kong X, Anderson OD (2004) Rapid genome evolution revealed by comparative sequence analysis of orthologous regions from four Triticeae genomes. Plant Physiol 135:459–470

  17. Hall TA (1999) Bio-Edit: a user friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98

  18. Hebert PDN, Cywinska A, Ball SL, de Waard JR (2003) Biological identification through DNA barcodes. Proc R Soc Lond 270:313–322

  19. Kumar S, Gadagkar SR (2001) Disparity Index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences. Genetics 158:1321–1327

  20. Li XW, Yang Y, Henry RJ, Rossetto M, Wang Y, Chen S (2015) Plant DNA barcoding: from gene to genome. Biol Rev Camb Philos Soc 90:157–166

  21. Liang HP, Hilu KW (1996) Application of the matK gene sequences to grass systematics. Can J Bot 74:125–134

  22. McFadden ES, Sears ER (1946) The origin of Triticum spelta and its free-threshing hexaploid relatives. J Hered 37:81–89

  23. Newmaster SG, Ragupathy S (2009) Testing plant barcoding in a sister species complex of pantropical Acacia (Mimosoideae, Fabaceae). Mol Ecol Resour 9:172–180

  24. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217

  25. Provan J, Wolters P, Caldwell KH, Powell W (2004) High-resolution organellar genome analysis of Triticum and Aegilops sheds new light on cytoplasm evolution in wheat. Theor Appl Genet 108:1182–1190

  26. Salazar GA, Chase MW, Arenas MAS, Ingrouille M (2003) Phylogenetics of Cranichideae with emphasis on Spiranthinae (Orchidaceae, Orchidoideae): evidence from plastid and nuclear DNA sequences. Am J Bot 90:777–795

  27. Savolainen V, Chase MW, Hoot SB, Morton CM, Soltis DE, Bayer C, Fay MF, De Bruijn AY, Sullivan S, Qiu YL (2000) Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst Biol 49:306–362

  28. Skuza L, Filip E, Szućko I (2015) Intergenic spacer length variability in cultivated, weedy and wild rye species. Open Life Sci 10(1):175–181

  29. Skuza L, Szućko I, Filip E, Adamczyk A (2019) DNA barcoding in selected species and subspecies of Rye (Secale) using three chloroplast loci (matK, rbcL, trnH-psbA). Notulae Botanicae Horti Agrobotanici Cluj-Napoca 47(1):54–62 DOI:47.15835

  30. Sourdille P, Tavaud M, Charmet G, Bernard M (2001) Transferability of wheat microsatellites to diploid Triticeae species carrying the A, B and D genomes. Theor Appl Genet 103:346–352

  31. Stoeckle M (2003) Taxonomy, DNA, and the bar code of life. BioScience 53:796–797

  32. Tamura K (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Mol Biol Evol 9:678–687

  33. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729

  34. Vedel F, Quetier F, Dosba F (1978) Study of wheat phylogeny by EcoRI analysis of chloroplastic and mitochondrial DNAs. Plant Sci Lett 13:97–102

  35. Young ND, dePamphilis CW (2000) Purifying selection detected in the plastid gene matK and flanking ribozyme regions within a group II intron of nonphotosynthetic plants. Mol Biol Evol 17:1933–1941

  36. Zhang W, Qu LJ, Gu H, Gao W, Liu M, Chen J, Chen Z (2002) Studies on the origin and evolution of the tetraploid wheats based on the internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA. Theor Appl Genet 104:1099–1106

Acknowledgements

We would like to thank the National Research Centre, Dokki, Giza, Egypt, for supporting this research through the project No AR111105.

Funding

This research was funded by the project No AR111105.

Author information

Contributions

This work was carried out in collaboration between the two authors. SAO designed the study, wrote the protocol, and analyzed the data. SAO and WAR managed the lab work and the literature searches and wrote the manuscript together. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Samira A. Osman.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Osman, S.A., Ramadan, W.A. DNA barcoding of different Triticum species. Bull Natl Res Cent 43, 174 (2019). https://doi.org/10.1186/s42269-019-0192-9

Keywords

  • Triticum species
  • Genetic relationships
  • matK gene
  • DNA barcode