Journal article details
BMC Bioinformatics
FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets
Daniel J Park [3], Melissa C Southey [3], Andrew Lonie [4], David E Goldgar [1], Sean V Tavtigian [2], Kayoko Tao [2], Russell Bell [2], Fleur Hammet [3], Fabrice Odefrey [3], Tú Nguyen-Dumont [3], Bernard J Pope [4]
[1]Department of Dermatology, University of Utah School of Medicine, Salt Lake City 8411, USA
[2]Huntsman Cancer Institute and Department of Oncological Sciences, University of Utah School of Medicine, Salt Lake City 84112, USA
[3]Genetic Epidemiology Laboratory, Department of Pathology, Medical Building, The University of Melbourne, Melbourne, Victoria 3010, Australia
[4]Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria 3010, Australia
Keywords: FAVR; Annotation; Filtering; Rare genetic variants; Massively parallel sequencing
DOI: 10.1186/1471-2105-14-65
Received: 2012-04-13; accepted: 2012-12-05; publication year: 2013
【 Abstract 】

Background

Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Distinguishing genuine genetic variants from the millions of artefactual signals remains a major challenge.

Results

FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines and to help resolve some of the issues that arise when analysing the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools.
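
The comparator-based filtering and co-occurrence annotation described above can be illustrated with a minimal Python sketch. This is not the FAVR implementation: the data layout (per-sample dictionaries of base counts keyed by position), the function names, and the thresholds `min_reads` and `max_samples` are all hypothetical, chosen only to show the general idea that a candidate variant whose alternate allele also appears in many comparator samples is more likely to be an artefact or a common variant than a rare one.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical stand-in for a comparator alignment file:
# (chrom, pos) -> {base: number of reads showing that base}
Pileup = Dict[Tuple[str, int], Dict[str, int]]


def comparators_with_signature(
    variant: Variant, comparators: List[Pileup], min_reads: int = 2
) -> int:
    """Count comparator samples with at least min_reads reads showing the alt allele."""
    count = 0
    for pileup in comparators:
        bases = pileup.get((variant.chrom, variant.pos), {})
        if bases.get(variant.alt, 0) >= min_reads:
            count += 1
    return count


def filter_and_annotate(
    variants: List[Variant],
    comparators: List[Pileup],
    max_samples: int = 1,
    min_reads: int = 2,
) -> List[Tuple[Variant, int]]:
    """Keep variants seen in at most max_samples comparators (assumed threshold).

    Each kept variant is paired with its comparator count, which serves as a
    simple co-occurrence annotation.
    """
    kept = []
    for v in variants:
        n = comparators_with_signature(v, comparators, min_reads)
        if n <= max_samples:
            kept.append((v, n))
    return kept


# Toy data: three comparator samples and three candidate variants.
comparators = [
    {("chr1", 100): {"A": 20, "G": 5}, ("chr2", 200): {"C": 30, "T": 2}},
    {("chr1", 100): {"G": 8}},
    {("chr1", 100): {"G": 3}, ("chr3", 300): {"T": 1}},
]
candidates = [
    Variant("chr1", 100, "A", "G"),  # alt seen in all three comparators
    Variant("chr2", 200, "C", "T"),  # alt seen in one comparator
    Variant("chr3", 300, "G", "T"),  # a single supporting read, below min_reads
]
shortlist = filter_and_annotate(candidates, comparators)
```

In this toy run the first candidate is dropped because its alternate allele recurs across the comparators, while the other two are retained together with their comparator counts. Real thresholds would depend on cohort size, relatedness between samples, and sequencing depth.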

Conclusions

FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes.

【 License 】

   
2013 Pope et al; licensee BioMed Central Ltd.

【 Figures 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

【 References 】
  • [1]Park DJ, Lesueur F, Nguyen-Dumont T: Rare mutations in XRCC2 increase the risk of breast cancer. Am J Hum Genet 2012, 90:734-739.
  • [2]Stratton MR, Rahman N: The emerging landscape of breast cancer susceptibility. Nat Genet 2008, 40:17-22.
  • [3]Yang J, Manolio TA, Pasquale LR: Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 2011, 43:519-525.
  • [4]Ng SB, Buckingham KJ, Lee C: Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 2010, 42:30-35.
  • [5]Cirulli ET, Goldstein DB: Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 2010, 11:415-425.
  • [6]Mokry M, Feitsma H, Nijman IJ: Accurate SNP and mutation detection by targeted custom microarray-based genomic enrichment of short-fragment sequencing libraries. Nucleic Acids Res 2010, 38:e116.
  • [7]Li H, Handsaker B, Wysoker A: The sequence alignment/Map format and SAMtools. Bioinformatics 2009, 25:2078-2079.
  • [8]Danecek P, Auton A, Abecasis G: The variant call format and VCFtools. Bioinformatics 2011, 27:2156-2158.
  • [9]Wang K, Li M, Hakonarson H: ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Res 2010, 38:e164.
  • [10]DePristo MA, Banks E, Poplin RE: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011, 43:491-498.