Journal article details
BMC Research Notes
Challenges in exome analysis by LifeScope and its alternative computational pipelines
Vaidutis Kučinskas [1], Aidas Pranculis [1], Tautvydas Rančelis [1], Erinija Pranckevičiene [1]
[1] Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, Vilnius, LT-08661, Lithuania
Keywords: Interpretation of genomic variants; Annovar; BFAST; SHRiMP; MAQ; Mapping of color-space sequencing data; GATK; Exome analysis pipeline; LifeScope
DOI  :  10.1186/s13104-015-1385-4
Received 2015-01-24, accepted 2015-08-24, published 2015
【 Abstract 】

Background

Every next-generation sequencing (NGS) platform relies on proprietary and open-source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case for AB SOLiD systems. We applied several computational and variant-calling pipelines to analyze targeted exome sequencing data obtained with the AB SOLiD 5500 system. The tools we investigated comprised the proprietary LifeScope pipeline together with open-source mapping programs that handle color-space data and a variant caller. We present the technical details of the pipelines used and a quantitative comparison of the variant lists generated by the LifeScope pipeline and by the open-source tools.

Results

All investigated pipelines achieved sufficient coverage of the targeted regions. The identities of the called variants varied widely across mapping programs: variant lists produced by approaches based on different mapping algorithms were less than 50 % concordant. We compared the approaches with regard to the coverage (DP) and quality (QUAL) values that GATK reports for the variants and found the LifeScope computational pipeline to be superior. Combining the mapping profiles (pileups) at the genomic positions of variants across several different alignments proved a useful strategy for assessing questionable singleton variants.
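
The concordance and singleton notions used above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the article: concordance is computed here as the Jaccard index of two variant sets, a variant is identified by (chromosome, position, reference allele, alternate allele), and the call sets are invented toy data. The article's exact concordance measure may differ.

```python
from collections import Counter
from itertools import combinations


def concordance(a, b):
    """Jaccard concordance: variants in both lists / variants in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls):
    """Concordance for every pair of pipelines, keyed by the sorted name pair."""
    return {
        (p, q): concordance(calls[p], calls[q])
        for p, q in combinations(sorted(calls), 2)
    }


def singletons(calls):
    """Variants reported by exactly one pipeline, mapped to that pipeline."""
    counts = Counter(v for variants in calls.values() for v in variants)
    return {
        v: name
        for name, variants in calls.items()
        for v in variants
        if counts[v] == 1
    }


# Toy call sets (invented for illustration), keyed by (chrom, pos, ref, alt).
calls = {
    "lifescope": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
    "bfast": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")},
    "shrimp": {("chr1", 100, "A", "G"), ("chr4", 9, "C", "G")},
}

table = pairwise_concordance(calls)
single = singletons(calls)

for pair, value in table.items():
    print(pair, f"{value:.2f}")
for variant, pipeline in sorted(single.items()):
    print("singleton", variant, "called only by", pipeline)
```

Singletons found this way are the calls for which inspecting pileups across several alignments, as described above, would be most informative.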

Conclusions

We quantitatively supported the conclusion that the LifeScope pipeline is superior for processing sequencing data obtained with the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged, because aggregating information from other mapping and variant-calling approaches helps to resolve questionable calls and increases confidence in a call. The coverage threshold that a variant must meet to be considered for further analysis has to be chosen in a data-driven way to prevent the loss of important information.
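
One possible reading of a data-driven coverage threshold is sketched below in Python. The rule is an assumption made for illustration, not the authors' procedure: the cutoff is derived from the observed DP distribution so that a chosen share of variants is retained, instead of applying a fixed value. The helper names `info_dp` and `dp_threshold`, the `percent_kept` parameter, and the depth values are hypothetical.

```python
def info_dp(info):
    """Read the DP value from a VCF INFO string; return None if it is absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None


def dp_threshold(depths, percent_kept=90):
    """Highest DP cutoff that still keeps at least percent_kept % of variants."""
    if not depths:
        raise ValueError("no depths given")
    ordered = sorted(depths)
    n_keep = (len(ordered) * percent_kept + 99) // 100  # ceiling division
    n_keep = max(1, min(n_keep, len(ordered)))
    return ordered[len(ordered) - n_keep]


# Toy INFO strings (invented for illustration).
infos = [f"AC=1;DP={d};MQ=50" for d in (3, 8, 12, 15, 20, 22, 30, 41, 55, 60)]
depths = [info_dp(i) for i in infos]

cutoff = dp_threshold(depths, percent_kept=90)
kept_data_driven = [d for d in depths if d >= cutoff]
kept_fixed = [d for d in depths if d >= 20]  # an arbitrary fixed cutoff, for contrast

print("data-driven cutoff:", cutoff, "keeps", len(kept_data_driven), "variants")
print("fixed cutoff 20 keeps", len(kept_fixed), "variants")
```

In this toy data a fixed cutoff discards several low-coverage calls that the distribution-based cutoff retains, which is the kind of information loss the conclusion warns about.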

【 License 】

   
2015 Pranckevičiene et al.
