Journal article

【Abstract】

Background

With the advent of next-generation sequencing there is an increased demand for tools to pre-process and handle the vast amounts of data generated. One recurring problem is adapter contamination in the reads, i.e. the partial or complete sequencing of adapter sequences. These adapter sequences have to be removed as they can hinder correct mapping of the reads and influence SNP calling and other downstream analyses.

Findings

We present AdapterRemoval, a tool that pre-processes both single-end and paired-end data. The program locates and removes adapter residues from the reads, combines paired reads if they overlap, and optionally trims low-quality nucleotides. It can also search for adapter sequence at both the 5' and 3' ends of the reads. The tool is flexible and can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files. AdapterRemoval is shown to perform well at trimming adapters from both single-end and paired-end data.
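To make the two single-end operations named above concrete, the following is a minimal Python sketch of 3' adapter trimming and 3' quality trimming. It is an illustration only, not AdapterRemoval's own algorithm or interface: the function names, the mismatch-rate and minimum-overlap parameters, the example adapter string, and the Phred+33 quality offset are all assumptions introduced here.

```python
def trim_adapter_3prime(read, adapter, max_mismatch_rate=0.1, min_overlap=3):
    """Cut a full or partial adapter from the 3' end of a read.

    Hypothetical helper: scans start positions left to right and cuts at the
    first one where the read, from that position on, matches the beginning
    of the adapter with few enough mismatches (ungapped comparison).
    """
    for start in range(len(read)):
        overlap = min(len(read) - start, len(adapter))
        if overlap < min_overlap:
            break  # remaining overlap too short to call an adapter reliably
        mismatches = sum(
            1
            for a, b in zip(read[start:start + overlap], adapter[:overlap])
            if a != b
        )
        if mismatches <= max_mismatch_rate * overlap:
            return read[:start]
    return read


def trim_low_quality_3prime(seq, qual, min_phred=3, offset=33):
    """Drop trailing bases whose Phred score is below min_phred.

    Assumes qualities are ASCII-encoded with the given offset (Phred+33 here).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_phred:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"            # example adapter, an assumption
    read = "ACGTACGTTTGA" + adapter[:10]  # insert followed by a partial adapter
    print(trim_adapter_3prime(read, adapter))
    print(trim_low_quality_3prime("ACGTAC", "IIII##"))
```

A real tool would typically use a proper alignment with quality-aware scoring and would also handle 5' adapters and paired-end overlap; the ungapped scan here is only the simplest form of the idea.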

Conclusions

AdapterRemoval is a comprehensive tool for analyzing next-generation sequencing data. It exhibits good performance both in terms of sensitivity and specificity. AdapterRemoval has already been used in various large projects and it is possible to extend it further to accommodate application-specific biases in the data.

【License】

2012 Lindgreen; licensee BioMed Central Ltd.

【Figures】

Figure 1.

Figure 2.

BMC Research Notes
AdapterRemoval: easy cleaning of next-generation sequencing reads

Stinus Lindgreen¹
[1] Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen K, Denmark
Keywords: Single-end reads; Paired-end reads; Sequence alignment; Data pre-processing; Adapter trimming; Next-generation sequencing
DOI: 10.1186/1756-0500-5-337

Received 2012-03-28; accepted 2012-06-19; published 2012