BMC Bioinformatics
Phenotype prediction based on genome-wide DNA methylation data
Thomas Wilhelm [1]
[1] Theoretical Systems Biology, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK
Keywords: Classifier; Machine learning; Feature selection; Cancer; DNA methylation; Epigenetics
DOI: 10.1186/1471-2105-15-193
Received 2014-02-06; accepted 2014-06-10; published 2014
【 Abstract 】

Background

DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes and hypomethylation with activation of oncogenes.

Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data, SPCA performed worse than EVORA, a method designed specifically for DNAm data.
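The general SPCA idea can be illustrated with a minimal pure-Python sketch, assuming the common supervised-PCA recipe: score every CpG with a univariate statistic, keep the CpGs above a threshold, compute the first principal component of the retained CpGs and classify samples by their score on that component. The t-like score, the power-iteration PCA, the midpoint decision cut and the names `feature_scores`, `first_pc` and `SPCASketch` are illustrative assumptions, not the implementation used in the paper.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def feature_scores(X, y):
    """|difference of class means| / pooled SD for every feature (column) of X."""
    scores = []
    for j in range(len(X[0])):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        ctrls = [row[j] for row, label in zip(X, y) if label == 0]
        m1, m0 = _mean(cases), _mean(ctrls)
        ss = sum((v - m1) ** 2 for v in cases) + sum((v - m0) ** 2 for v in ctrls)
        pooled_sd = math.sqrt(ss / (len(cases) + len(ctrls) - 2))
        scores.append(abs(m1 - m0) / (pooled_sd + 1e-12))
    return scores


def first_pc(X, iters=200, seed=0):
    """Column means and first principal-component loadings of X, via power iteration."""
    n, p = len(X), len(X[0])
    means = [_mean([row[j] for row in X]) for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # column-centred data
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        s = [sum(r[j] * v[j] for j in range(p)) for r in Xc]  # Xc v
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(p)]  # Xc^T Xc v
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    return means, v


class SPCASketch:
    """Supervised PCA sketch for binary labels (1 = case, 0 = control)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X, y):
        scores = feature_scores(X, y)
        self.features = [j for j, s in enumerate(scores) if s >= self.threshold]
        if not self.features:
            raise ValueError("no feature passes the threshold")
        selected = [[row[j] for j in self.features] for row in X]
        self.means, self.loadings = first_pc(selected)
        t = [self._score(row) for row in X]
        m1 = _mean([ti for ti, label in zip(t, y) if label == 1])
        m0 = _mean([ti for ti, label in zip(t, y) if label == 0])
        # The sign of a PC is arbitrary: orient it so that cases score higher.
        self.sign = 1.0 if m1 >= m0 else -1.0
        self.cut = self.sign * (m1 + m0) / 2  # midpoint between the class means
        return self

    def _score(self, row):
        """Projection of one sample's selected features onto the first PC."""
        return sum((row[j] - m) * w for j, m, w in zip(self.features, self.means, self.loadings))

    def predict(self, X):
        return [1 if self.sign * self._score(row) > self.cut else 0 for row in X]
```

In practice the threshold would be tuned on the training data (for example by cross-validation), and a linear-algebra library would replace the hand-written power iteration.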

Results

We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well on the training data to the test data, and then selects the best of these models for the final prediction based on parameters of the test data.
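The model-selection step can be sketched as follows. The abstract does not say which test-data parameters drive the selection, so the criterion used here is a hypothetical stand-in: the mean distance of the unlabelled test samples from each model's decision boundary. The toy `ThresholdModel`, `mean_margin`, `ms_select_predict`, the shortlist size `top_k` and the majority vote over `n_final` models are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


class ThresholdModel:
    """Hypothetical toy classifier: calls a sample a case if one feature exceeds a cut-off."""

    def __init__(self, feature, cut):
        self.feature, self.cut = feature, cut

    def score(self, row):
        return row[self.feature] - self.cut

    def predict(self, X):
        return [1 if self.score(row) > 0 else 0 for row in X]


def mean_margin(model, X):
    """Hypothetical label-free criterion: average distance of test scores from the cut-off."""
    return mean(abs(model.score(row)) for row in X)


def ms_select_predict(models, train_acc, X_test, criterion, top_k=5, n_final=3):
    # 1. Shortlist the top_k models by training accuracy.
    shortlist = sorted(range(len(models)), key=lambda i: train_acc[i], reverse=True)[:top_k]
    # 2. Re-rank the shortlist using only the unlabelled test data; keep n_final models.
    chosen = sorted(shortlist, key=lambda i: criterion(models[i], X_test), reverse=True)[:n_final]
    # 3. Combine the chosen models by majority vote (an odd n_final avoids ties).
    per_model = [models[i].predict(X_test) for i in chosen]
    votes = [Counter(sample).most_common(1)[0][0] for sample in zip(*per_model)]
    return chosen, votes
```

Any label-free statistic of the test data could be plugged in as `criterion`; the key design point is that step 2 never looks at test labels.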

We applied MS-SPCA to phenotype prediction from genome-wide DNAm data. The CpGs used for prediction are selected by quantifying three features of their methylation: the average methylation difference, the methylation variation difference and the methylation-age correlation. We analysed four independent case–control datasets that correspond to different stages of cervical cancer: (i) cases that are currently cytologically normal but later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case–control datasets, with samples either human papillomavirus (HPV) positive or negative. We demonstrate that cytologically normal HPV+ and HPV- samples contain DNAm patterns that are associated with later neoplastic transformations. We present evidence that cytologically normal HPV- samples carry DNAm patterns that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA.
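The three per-CpG features can be computed along these lines. This is a minimal sketch under stated assumptions: beta values in [0, 1], the variation difference taken as a difference of sample variances, the age correlation taken as a Pearson correlation across all samples, and a hypothetical selection rule `select_cpgs` with made-up thresholds. The abstract does not specify how the features are combined.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant (an assumption)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def cpg_features(beta, is_case, age):
    """Three summaries for one CpG; needs at least two cases and two controls."""
    cases = [b for b, c in zip(beta, is_case) if c]
    controls = [b for b, c in zip(beta, is_case) if not c]
    return {
        # average methylation difference: mean(cases) - mean(controls)
        "mean_diff": mean(cases) - mean(controls),
        # methylation variation difference, here a difference of sample variances
        "var_diff": variance(cases) - variance(controls),
        # methylation-age correlation, here computed across all samples
        "age_corr": pearson(beta, age),
    }


def select_cpgs(features_by_cpg, min_mean=0.1, min_var=0.01, min_age_corr=0.5):
    """Hypothetical rule: keep a CpG if any of its three features passes a threshold."""
    return [
        cpg for cpg, f in features_by_cpg.items()
        if abs(f["mean_diff"]) >= min_mean
        or abs(f["var_diff"]) >= min_var
        or abs(f["age_corr"]) >= min_age_corr
    ]
```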

Conclusions

MS-SPCA can be applied to many classification problems. Additional improvements could include the use of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect MS-SPCA to be useful for analysing recent, larger DNAm datasets to predict future neoplastic transformations.
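Automatic selection of the number of PCs could take many forms; the paper does not specify one. The sketch below swaps in a simple, commonly used heuristic purely for illustration: keep the smallest number of leading PCs whose eigenvalues explain at least a chosen fraction of the total variance. The 0.9 default, the power-iteration-with-deflation eigen-solver and the function names are assumptions; for classification, a supervised criterion such as cross-validated accuracy may be more appropriate.

```python
import math
import random


def covariance(X):
    """Sample covariance matrix of the rows of X."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    C = [[0.0] * p for _ in range(p)]
    for r in X:
        d = [r[j] - means[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                C[a][b] += d[a] * d[b] / (n - 1)
    return C


def _leading_eigenpair(A, iters, rng):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration."""
    p = len(A)
    v = [rng.random() + 0.1 for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v is the eigenvalue estimate for a unit-length v.
    lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def choose_n_pcs(X, frac=0.9, iters=500, seed=0):
    """Smallest k such that the top-k PCs explain at least `frac` of the total variance."""
    C = covariance(X)
    p = len(C)
    total = sum(C[i][i] for i in range(p))
    if total <= 0:
        raise ValueError("data have zero variance")
    A = [row[:] for row in C]
    rng = random.Random(seed)
    explained = 0.0
    for k in range(1, p + 1):
        lam, v = _leading_eigenpair(A, iters, rng)
        explained += lam
        if explained / total >= frac:
            return k
        # Deflation: remove the found component so the next search finds the next PC.
        for i in range(p):
            for j in range(p):
                A[i][j] -= lam * v[i] * v[j]
    return p
```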

【 License 】

© 2014 Wilhelm; licensee BioMed Central Ltd.

【 Figures 】

Figures 1, 2 and 3 (images only; no captions available).
