BMC Bioinformatics
SMRT sequencing only de novo assembly of the sugar beet (Beta vulgaris) chloroplast genome
Daniela Holtgräwe [2], Bernd Weisshaar [2], Kai Bernd Stadermann [1]
[1]Bioinformatics Resource Facility, Centre for Biotechnology, Bielefeld University, Bielefeld, Germany
[2]Chair of Genome Research, Faculty of Biology, Bielefeld University, Bielefeld, Germany
Keywords: Sprai; Chloroplast; Sugar beet; SMRT sequencing; PacBio; Assembly
DOI: 10.1186/s12859-015-0726-6
Received 2015-04-24, accepted 2015-09-06, published 2015
【 Abstract 】

Background

Third-generation sequencing methods, such as SMRT (Single Molecule, Real-Time) sequencing developed by Pacific Biosciences, offer much longer read lengths than Next Generation Sequencing (NGS) methods. Hence, they are well suited for de novo sequencing and re-sequencing projects. Sequences generated for these purposes contain not only reads originating from the nuclear genome but also a significant number of reads originating from the organelles of the target organism. These reads are usually discarded, but they can also be used to assemble organellar replicons. The long read length helps to resolve repetitive regions within the organellar genome, which can be problematic when only short-read data are used. Additionally, SMRT sequencing is less affected by GC-rich regions and by long stretches of the same base.
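
The point about read length and repeats can be illustrated with a minimal sketch. It assumes reads have already been placed on a reference coordinate system; the function name, the coordinates, and the 500-base anchor are hypothetical and are not taken from the paper. A read can only resolve a repeat if it covers the whole repeat and reaches unique sequence on both sides.

```python
def spans_repeat(read_start, read_end, repeat_start, repeat_end, min_anchor=500):
    """Return True if a read placed at [read_start, read_end) covers the whole
    repeat plus at least min_anchor bases of flanking sequence on each side.

    min_anchor is an illustrative assumption, not a value from the paper.
    """
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)


# Hypothetical 1 kb repeat at positions 10,000 to 11,000.
repeat = (10_000, 11_000)
short_read = (10_200, 10_350)  # 150-base read lying entirely inside the repeat
long_read = (8_000, 14_000)    # 6 kb read reaching unique sequence on both sides

print(spans_repeat(*short_read, *repeat))  # False
print(spans_repeat(*long_read, *repeat))   # True
```

A repeat that is longer than the available reads fails this check for every read, which is one way to picture why some ambiguities can remain even with long-read data.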

Results

We describe a workflow for de novo assembly of the sugar beet (Beta vulgaris ssp. vulgaris) chloroplast genome sequence based solely on data from a SMRT sequencing dataset that targeted the nuclear genome. We show that the data obtained from such an experiment are sufficient to create a high-quality assembly that is more reliable than assemblies derived from, for example, Illumina reads alone. The chloroplast genome is especially challenging for de novo assembly because it contains two large inverted repeat (IR) regions. We also describe some limitations that still apply even when long reads are used for the assembly.
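
One step such a workflow needs is separating chloroplast-derived reads from the nuclear reads in the same dataset. The sketch below shows one possible way to do this, assuming the reads have already been aligned to some chloroplast reference by an external aligner. The `Alignment` record, the function name, and the 0.8 threshold are hypothetical illustrations; the abstract does not state the selection criteria actually used.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical summary of one read's best hit on a chloroplast reference."""
    read_id: str
    read_length: int
    aligned_length: int  # bases of the read covered by the alignment


def select_chloroplast_reads(alignments, min_fraction=0.8):
    """Return IDs of reads whose aligned fraction is at least min_fraction.

    min_fraction is an illustrative assumption, not a published parameter.
    """
    selected = []
    for aln in alignments:
        if aln.read_length > 0 and aln.aligned_length / aln.read_length >= min_fraction:
            selected.append(aln.read_id)
    return selected


alignments = [
    Alignment("read_a", 9_000, 8_800),   # almost fully aligned: kept
    Alignment("read_b", 12_000, 1_500),  # short partial hit: dropped
    Alignment("read_c", 5_000, 4_000),   # exactly at the threshold: kept
]
print(select_chloroplast_reads(alignments))  # ['read_a', 'read_c']
```

Requiring a high aligned fraction, rather than any hit at all, is meant to exclude nuclear reads that carry only a short chloroplast-like insertion. The right threshold depends on the data and would need to be tuned.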

Conclusions

SMRT sequencing reads extracted from a dataset created for nuclear genome (re)sequencing can be used to obtain a high-quality de novo assembly of the chloroplast genome of the sequenced organism. Even with relatively low overall coverage of the nuclear genome, it is possible to collect more than enough reads to generate a high-quality assembly that outperforms short-read-based assemblies. However, even with long reads it is not always possible to reliably determine the order of elements in a chloroplast genome sequence, as we demonstrated with Fosmid End Sequences (FES) generated with Sanger technology. This limitation also applies to short-read sequencing data, where it is reached at a much earlier stage of finishing.

【 License 】

   
2015 Stadermann et al.
