BMC Bioinformatics
Multi-TGDR, a multi-class regularization method, identifies the metabolic profiles of hepatocellular carcinoma and cirrhosis infected with hepatitis B or hepatitis C virus
Suyan Tian[3], Howard H Chang[1], Chi Wang[4], Jing Jiang[3], Xiaomei Wang[2], Junqi Niu[2]
[1] Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA
[2] Department of Hepatology, First Hospital of the Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China
[3] Division of Clinical Epidemiology, First Hospital of the Jilin University, 71 Xinmin Street, Changchun, Jilin 130021, China
[4] Department of Biostatistics and Markey Cancer Center, University of Kentucky, 800 Rose St., Lexington, KY 40536, USA
Keywords: Omics data; Metabolomics; Feature selection; Hepatocellular carcinoma (HCC); Metabolic profile; Multi-class classification; Threshold gradient descent regularization (TGDR)
Others: 818691; DOI: 10.1186/1471-2105-15-97
Received 2013-09-05; accepted 2014-03-25; published 2014
【 Abstract 】
Background
Over the last decade, metabolomics has evolved into a mainstream enterprise used by many laboratories worldwide. Like other “omics” data, metabolomics data typically contain far fewer samples than measured features. Selecting an optimal subset of features together with a supervised classifier is therefore imperative. We extended an existing feature selection algorithm, threshold gradient descent regularization (TGDR), to handle multi-class classification of “omics” data, and proposed two such extensions, referred to collectively as multi-TGDR. Both multi-TGDR frameworks were used to analyze a metabolomics dataset comparing the metabolic profiles of hepatocellular carcinoma (HCC) in patients infected with hepatitis B virus (HBV) or hepatitis C virus (HCV) with those of cirrhosis induced by HBV or HCV infection; the goal was to improve early-stage diagnosis of HCC.
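The multi-class TGDR idea can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a multinomial (softmax) logistic model with one coefficient per feature and class, a fixed step size, and the usual TGDR rule: at each step, only coefficients whose log-likelihood gradient reaches at least `tau` times a reference maximum are updated. The `mode` switch mirrors the two variants described in the Results, with a threshold shared across classes (`"global"`) or set separately for each class (`"local"`). The exact global rule used here is our assumption: a feature that passes the shared threshold is updated for every class. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def threshold_mask(grad, tau, mode="local"):
    """0/1 mask of coefficients to update; grad has shape (n_features, n_classes)."""
    a = np.abs(grad)
    if mode == "global":
        # Assumed global rule: one threshold across all features and classes;
        # a feature that passes it is updated for every class.
        keep = a.max(axis=1, keepdims=True) >= tau * a.max()
        return np.broadcast_to(keep, a.shape).astype(float)
    if mode == "local":
        # Local rule: a separate threshold for each class (column).
        return (a >= tau * a.max(axis=0, keepdims=True)).astype(float)
    raise ValueError("mode must be 'global' or 'local'")

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def multi_tgdr(X, y, n_classes, tau=0.9, step=0.1, n_steps=300, mode="local"):
    """Thresholded gradient ascent on the multinomial log-likelihood."""
    n, p = X.shape
    Y = np.eye(n_classes)[y]            # one-hot labels, shape (n, K)
    beta = np.zeros((p, n_classes))     # every coefficient starts at zero
    b0 = np.zeros(n_classes)            # intercepts, updated without thresholding
    for _ in range(n_steps):
        resid = Y - softmax(X @ beta + b0)
        grad = X.T @ resid / n          # gradient of the mean log-likelihood
        b0 += step * resid.mean(axis=0)
        beta += step * grad * threshold_mask(grad, tau, mode)
    return b0, beta

def predict(X, b0, beta):
    return np.argmax(X @ beta + b0, axis=1)

# Synthetic example: features 0, 1, 2 are shifted upward in classes 0, 1, 2.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 30)
X = rng.normal(size=(90, 50))
X[:, :3] += 4.0 * np.eye(3)[y]
b0, beta = multi_tgdr(X, y, n_classes=3, mode="local")
train_err = np.mean(predict(X, b0, beta) != y)
selected = np.flatnonzero(np.any(beta != 0, axis=1))
print(f"training error {train_err:.2%}; {selected.size} features selected")
```

A smaller `tau` lets more coefficients move at each step, which is closer to ridge-like shrinkage. `tau = 1` moves only the steepest ones and gives sparser, lasso-like paths. In practice, `tau` and the number of steps would be tuned by cross-validation.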
Results
We applied the two multi-TGDR frameworks, which determine the TGDR threshold either globally across classes or locally for each class, to the HCC metabolomics data. Multi-TGDR global selected 45 metabolites with a 0% misclassification rate (the error rate on the training data) and a 3.82% 5-fold cross-validation (CV-5) predictive error rate. Multi-TGDR local selected 48 metabolites with a 0% misclassification rate and a 5.34% CV-5 error rate.
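The Results report two different error rates: the misclassification rate on the training data and the CV-5 predictive error. The sketch below shows how each is computed. It assumes random, unstratified folds, because the abstract does not describe how folds were assigned. A nearest-centroid classifier serves only as a hypothetical stand-in for a multi-TGDR fit, and any `fit`/`predict` pair could be plugged in.

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def kfold_cv_error(X, y, fit, predict, k=5, seed=0):
    """k-fold cross-validated error rate (k=5 gives CV-5).

    fit(X_train, y_train) -> model; predict(model, X_test) -> labels.
    fit is re-run on every training fold, so any feature selection inside it
    never sees the held-out fold.
    """
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    wrong = 0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        model = fit(X[train], y[train])
        wrong += int(np.sum(predict(model, X[test]) != y[test]))
    return wrong / n

# Hypothetical placeholder classifier (nearest centroid), standing in for multi-TGDR.
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 30)
X = rng.normal(size=(90, 50))
X[:, :3] += 4.0 * np.eye(3)[y]

train_err = misclassification_rate(y, predict_centroids(fit_centroids(X, y), X))
cv5_err = kfold_cv_error(X, y, fit_centroids, predict_centroids, k=5)
print(f"training error {train_err:.2%}, CV-5 error {cv5_err:.2%}")
```

Because `fit` is re-run inside every fold, a feature-selection step placed inside it is repeated on each training fold. Selecting features once on the full data and cross-validating only afterward would tend to understate the predictive error.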
Conclusions
One important advantage of multi-TGDR local is that it allows inference on which features are specifically associated with which class or classes. We therefore recommend multi-TGDR local: its predictive performance is similar to that of multi-TGDR global, it requires the same computing time, and it can additionally provide class-specific inference.
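The sketch below shows how a per-class threshold can yield class-specific feature lists. It applies one thresholded update, starting from zero, to a small hand-made gradient matrix in which rows are features and columns are classes. It then lists the features with non-zero coefficients for each class. The metabolite names, gradient values, `tau`, and step size are hypothetical. The global rule again assumes that a feature passing the shared threshold is updated for every class.

```python
import numpy as np

def threshold_mask(grad, tau, mode):
    a = np.abs(grad)
    if mode == "global":
        # Assumed: one threshold over all features and classes; a feature that
        # passes it is updated for every class at once.
        keep = a.max(axis=1, keepdims=True) >= tau * a.max()
        return np.broadcast_to(keep, a.shape)
    # "local": a separate threshold for each class (column).
    return a >= tau * a.max(axis=0, keepdims=True)

def features_by_class(beta, names):
    """List the features with a non-zero coefficient for each class."""
    return {k: [names[j] for j in np.flatnonzero(beta[:, k])] for k in range(beta.shape[1])}

names = ["m1", "m2", "m3", "m4"]  # hypothetical metabolites
grad = np.array([[ 2.0, -0.1,  0.05],
                 [-0.2,  0.5,  0.1 ],
                 [ 0.1,  0.0, -0.3 ],
                 [ 0.0,  0.45, 0.0 ]])
step, tau = 0.1, 0.8
for mode in ("global", "local"):
    beta = step * grad * threshold_mask(grad, tau, mode)  # one TGDR step from zero
    print(mode, features_by_class(beta, names))
# global {0: ['m1'], 1: ['m1'], 2: ['m1']}
# local {0: ['m1'], 1: ['m2', 'm4'], 2: ['m3']}
```

Under the assumed global rule, `m1` enters all three classes together, so its selection does not indicate which class it characterizes. The local rule gives each class its own list.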
【 License 】
2014 Tian et al.; licensee BioMed Central Ltd.
【 Figures 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.