Journal article

【Abstract】

Background

The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.

Results

A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, which include 96% of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of the SNPs analyzed are correctly separated by phasing.
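The read-sorting step of the post-assembly pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sort_reads_by_phase`, the input layout, and the majority-vote rule are assumptions made for illustration, and reads are assumed to align to the contig without gaps.

```python
from collections import Counter


def sort_reads_by_phase(reads, phased_snps):
    """Bin reads by the haplotype their bases support at phased SNPs.

    reads: list of (name, start, sequence); start is the 0-based contig
           position of the first base (ungapped alignment is assumed).
    phased_snps: dict mapping contig position -> (allele_hap0, allele_hap1).
    Returns a dict with keys 0, 1 and None (uninformative or tied reads).
    """
    bins = {0: [], 1: [], None: []}
    for name, start, seq in reads:
        votes = Counter()
        for pos, (allele0, allele1) in phased_snps.items():
            offset = pos - start
            if 0 <= offset < len(seq):  # the read covers this SNP
                base = seq[offset]
                if base == allele0:
                    votes[0] += 1
                elif base == allele1:
                    votes[1] += 1
        if votes[0] > votes[1]:
            bins[0].append(name)
        elif votes[1] > votes[0]:
            bins[1].append(name)
        else:
            bins[None].append(name)  # no SNP covered, or conflicting evidence
    return bins


# Hypothetical toy data: two phased SNPs on one contig.
example_snps = {3: ("A", "G"), 7: ("C", "T")}
example_reads = [
    ("r1", 0, "TTTATTTCTT"),   # carries A at 3 and C at 7
    ("r2", 2, "TGTTTTT"),      # carries G at 3 and T at 7
    ("r3", 10, "AAAA"),        # covers no phased SNP
    ("r4", 0, "TTTATTTTTT"),   # carries A at 3 but T at 7
]
example_bins = sort_reads_by_phase(example_reads, example_snps)
```

Reads placed in bins 0 and 1 would each be re-assembled separately, while tied or uninformative reads need a separate policy (for example, adding them to both bins), which this sketch leaves open.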

Conclusions

Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.

【License】

2013 Krasileva et al.; licensee BioMed Central Ltd.

Genome Biology
Separating homeologs by phasing in the tetraploid wheat transcriptome

Jorge Dubcovsky², Cristobal Uauy⁶, Eduard Akhunov⁴, IWGS Consortium¹, Shichen Wang⁴, Marcelo Soria⁷, Facundo Tabbita⁵, Sarah Ayling³, Stephen Pearce⁵, Paul Bailey³, Vince Buffalo⁵, Ksenia V Krasileva⁵
[1] International Wheat Genome Sequencing Consortium
[2] Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
[3] The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK
[4] Department of Plant Pathology, Kansas State University, Manhattan, KS 66506, USA
[5] Dept. Plant Sciences, University of California, Davis, CA 9561, USA
[6] John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
[7] Microbiology, University of Buenos Aires, INBA-CONICET, Buenos Aires, Argentina
Keywords: gene prediction; phasing; pseudogenes; Triticum turgidum; Triticum urartu; polyploid; wheat; multiple k-mer assembly; transcriptome assembly
Others: 1135323; DOI: 10.1186/gb-2013-14-6-r66

Received 2013-05-25, accepted 2013-06-25, published 2013