Journal article details
BMC Genomics
EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome
Wanling Yang [1], Yu Lung Lau [1], Brian Hon-Yin Chung [1], Jing Yang [1], Shuai Zeng [1]
[1] Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 5 Sassoon Road, Hong Kong, China
Keywords: Evolutionary distance; Functional impact; Amino acid conservation; nsSNP; Coding mutation
Others  :  1216626
DOI  :  10.1186/1471-2164-15-455
Received 2013-09-15, accepted 2014-06-04, published 2014
【 Abstract 】

Background

Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more novel variants are discovered. Each genome carries thousands of rare or private AAS, of which only a very small number are related to an underlying disease, so bioinformatics analysis is essential for identifying the AAS that may cause or contribute to human disease and merit further analysis. Existing algorithms in this field still have high false prediction rates, and new development is needed to take full advantage of the vast amount of available genomic data.

Results

Here we report a novel algorithm that features two innovative changes: (1) making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analysis. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), are freely available to the public (http://paed.hku.hk/efin/).
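
The block-wise idea described in the Results can be illustrated with a minimal Python sketch. It is not the EFIN implementation: the six divergence-time cutoffs in BLOCK_BOUNDS, the function names, and the use of a normalized Shannon entropy as the conservation measure are all assumptions made for illustration, and the random forest step is omitted. The input is one alignment column given as (divergence time from human in million years, residue) pairs, with gaps assumed to be excluded.

```python
import math
from collections import Counter

# Hypothetical upper bounds (million years since divergence from human)
# for six blocks. Illustrative assumptions, not the cutoffs used by EFIN.
BLOCK_BOUNDS = [10, 50, 100, 350, 600, float("inf")]


def block_index(divergence_mya):
    """Return the index of the first block whose upper bound covers the distance."""
    for i, bound in enumerate(BLOCK_BOUNDS):
        if divergence_mya <= bound:
            return i
    # Only reachable when the comparison fails for every bound (e.g. NaN).
    raise ValueError("divergence time must be a comparable number")


def conservation(residues):
    """Return 1 minus the normalized Shannon entropy of the residues.

    1.0 means every sequence in the block has the same residue.
    None means the block holds no sequences at this position.
    """
    if not residues:
        return None
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)  # 20 standard amino acids


def block_features(column):
    """Score one alignment column block by block.

    column: list of (divergence_mya, residue) pairs, one per homologous sequence.
    Returns six values, one per block, each computed independently.
    """
    blocks = [[] for _ in BLOCK_BOUNDS]
    for divergence_mya, residue in column:
        blocks[block_index(divergence_mya)].append(residue)
    return [conservation(b) for b in blocks]


# Example: five homologs at one position, spread over three blocks.
example = [(6, "R"), (8, "R"), (90, "R"), (90, "K"), (400, "R")]
features = block_features(example)
```

In a sketch like this, the six per-block scores would form part of the feature vector passed to a classifier such as a random forest. Scoring each block separately keeps a substitution among close relatives from being averaged away by the more numerous distant homologs.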

Conclusions

Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each block independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health.

【 License 】

   
2014 Zeng et al.; licensee BioMed Central Ltd.
