BioData Mining
An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation
Thanawadee Preeprem [1], Greg Gibson [1]
[1] School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
Keywords: Protein structure analysis; Variant prioritization; Personal genome interpretation; Non-synonymous single nucleotide polymorphism; Homozygous variant
DOI  :  10.1186/1756-0381-6-24
Received 2013-06-08; accepted 2013-12-17; published 2013
【 Abstract 】

Background

Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious.

Method

This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence or absence of deleterious predictions, combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults.
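
The abstract does not spell out how the eight levels are defined, so the following is only an illustrative sketch of how predictor consensus and association evidence could be combined into an 8-level rank. The function name `toy_aacds_level`, the four consensus tiers, and the 0.25/0.5/0.75 cut-offs are all assumptions made for this example; they are not the authors' AACDS rules.

```python
def toy_aacds_level(n_deleterious: int, n_predictors: int, has_association: bool) -> int:
    """Return a hypothetical rank from 1 (highest priority) to 8 (lowest).

    ASSUMPTION: four consensus tiers x association yes/no = 8 levels.
    The real AACDS level definitions are not given in the abstract.
    """
    if n_predictors <= 0 or not 0 <= n_deleterious <= n_predictors:
        raise ValueError("need 0 <= n_deleterious <= n_predictors and n_predictors > 0")

    # Fraction of prediction tools that call the variant deleterious.
    fraction = n_deleterious / n_predictors

    # Hypothetical cut-offs: tier 0 is the strongest consensus, tier 3 the weakest.
    if fraction >= 0.75:
        tier = 0
    elif fraction >= 0.5:
        tier = 1
    elif fraction >= 0.25:
        tier = 2
    else:
        tier = 3

    # Within a tier, a variant with disease/trait association ranks one level higher.
    return 1 + 2 * tier + (0 if has_association else 1)


if __name__ == "__main__":
    # A variant flagged by all 4 tools and linked to a disease gets the top rank.
    print(toy_aacds_level(4, 4, True))
    # A variant flagged by no tool and with no association gets the bottom rank.
    print(toy_aacds_level(0, 4, False))
```

Encoding the two kinds of evidence as a single ordered integer makes variants directly sortable, which is the practical point of a ranking scheme; the actual weighting of prediction versus association evidence should be taken from the full paper.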

Results

Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8, with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THBS1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.

Conclusions

The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors’ website.

【 License 】

   
2013 Preeprem and Gibson; licensee BioMed Central Ltd.
