Journal article details
BMC Genomics
A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach
Julia H Bowsher [1], Ian Dworkin [2], Alex S Torson [1], Dacotah Melicher [1]
[1] Department of Biological Sciences, North Dakota State University, 1340 Bolley Drive, 218 Stevens Hall, Fargo, ND 58102, USA; [2] Department of Zoology, Michigan State University, 328 Giltner Hall, East Lansing, MI 48823, USA
Keywords: Cloud computing; Pipeline; Transcriptome; Sepsidae; de novo assembly; Multiple k-mer
Others  :  1217771
DOI  :  10.1186/1471-2164-15-188
Received 2013-11-04, accepted 2014-03-03, published 2014
【 Abstract 】

Background

The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba.

Results

Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It combines a multiple k-mer length approach with a second meta-assembly step to extend transcripts and recover more bases of transcript sequence than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA derived from embryos, larvae, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We also assembled transcriptomes from three additional non-model organisms to demonstrate that our pipeline produces a higher-quality transcriptome than single k-mer approaches across multiple species.
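The idea of pooling assemblies built at several k-mer lengths can be illustrated with a small sketch. This is not the authors' pipeline: a real meta-assembly would use an overlap-based assembler to extend and join contigs, whereas this toy version only removes exact duplicates and contigs fully contained in a longer contig (on either strand). The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def pool_assemblies(assemblies: Dict[int, List[str]]) -> List[str]:
    """Pool contigs from per-k assemblies into one non-redundant set.

    Simplified stand-in for a meta-assembly: contigs are sorted longest
    first, and a contig is dropped if it (or its reverse complement) is
    an exact substring of a contig that has already been kept.
    """
    pooled = sorted(
        {contig.upper() for contigs in assemblies.values() for contig in contigs},
        key=lambda s: (-len(s), s),
    )
    kept: List[str] = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue  # redundant: already represented by a longer contig
        kept.append(contig)
    return kept


# Hypothetical toy contigs, keyed by the k-mer length of the assembly run.
toy_assemblies = {
    21: ["ATGGCGTACGTTAGC", "GGGTTTAAACCC"],
    25: ["ATGGCGTACGTTAGCAAGT"],
    31: ["GCTAACGTACGCCAT", "TTTTCCCCGGGGAAAA"],
}
merged = pool_assemblies(toy_assemblies)
print(merged)
```

In this toy input, the k=25 run extends a contig found at k=21, and the k=31 run contributes both a reverse-complement duplicate and a contig that no other run recovered, which is the kind of complementarity a multiple k-mer strategy relies on.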

Conclusions

The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.
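Claims about contig length and assembled base pairs are usually checked with simple summary statistics. The sketch below computes contig count, total bases, mean length, and N50 for a list of contig lengths. N50 is a commonly used assembly metric that is not named in this abstract, and the two length lists are made-up illustrative values, not results from the paper.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Summarize an assembly from its contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered) if ordered else 0.0,
        "n50": n50,
    }


# Hypothetical contig lengths for two assemblies of the same reads.
single_k = assembly_stats([200, 300, 400, 500])
multi_k = assembly_stats([300, 500, 800, 900])
print(single_k)
print(multi_k)
```

Comparing the two dictionaries side by side gives a quick view of whether one assembly has longer contigs and more assembled bases than the other.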

【 License 】

   
2014 Melicher et al.; licensee BioMed Central Ltd.
