期刊论文详细信息
BMC Bioinformatics
The top-scoring ‘N’ algorithm: a generalized relative expression classification method from small numbers of biomolecules
Andrew T Magis1  Nathan D Price1 
[1] Center for Biophysics and Computational Biology, University of Illinois, Urbana, IL, 61801, USA
关键词: Microarray;    Graphics processing unit;    Support vector machine;    Cross validation;    Relative expression;    Top-scoring pair;    Classification;   
Others  :  1088143
DOI  :  10.1186/1471-2105-13-227
 received in 2012-02-19, accepted in 2012-09-03,  发布年份 2012
PDF
【 摘 要 】

Background

Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring ‘N’ (TSN) algorithm is a generalized form of other relative expression algorithms which uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification.

Results

TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of different classification methods, including artificial neural networks, classification trees, discriminant analysis, k-Nearest neighbor, naïve Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets.

Conclusions

TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer numbers of measured features are available.

【 授权许可】

   
2012 Magis and Price; licensee BioMed Central Ltd.

【 预 览 】
附件列表
Files Size Format View
20150117080730210.pdf 610KB PDF download
Figure 6. 63KB Image download
Figure 5. 66KB Image download
Figure 4. 31KB Image download
Figure 3. 42KB Image download
Figure 2. 45KB Image download
Figure 1. 89KB Image download
【 图 表 】

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

【 参考文献 】
  • [1]Geman D, D'Avignon C, Naiman DQ, Winslow RL: Classifying gene expression profiles from pairwise mRNA comparisons. Stat Appl Genet Mol Biol 2004, 3:Article 19.
  • [2]Lin X, Afsari B, Marchionni L, Cope L, Parmigiani G, Naiman D, Geman D: The ordering of expression among a few genes can provide simple cancer biomarkers and signal BRCA1 mutations. BMC Bioinformatics 2009, 10:256. BioMed Central Full Text
  • [3]Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19(2):185-193.
  • [4]Price ND, Trent J, El-Naggar AK, Cogdell D, Taylor E, Hunt KK, Pollock RE, Hood L, Shmulevich I, Zhang W: Highly accurate two-gene classifier for differentiating gastrointestinal stromal tumors and leiomyosarcomas. Proc Natl Acad Sci USA 2007, 104(9):3414-3419.
  • [5]Tan AC, Naiman DQ, Xu L, Winslow RL, Geman D: Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics 2005, 21(20):3896-3904.
  • [6]Eddy JA, Sung J, Geman D, Price ND: Relative expression analysis for molecular cancer diagnosis and prognosis. Technol Cancer Res Treat 2010, 9(2):149-159.
  • [7]Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M Jr, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA 2000, 97(1):262-267.
  • [8]Zhang H, Yu CY, Singer B: Cell and tumor classification using gene expression data: construction of forests. Proc Natl Acad Sci USA 2003, 100(7):4168-4172.
  • [9]Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, et al.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001, 7(6):673-679.
  • [10]Eddy JA, Hood L, Price ND, Geman D: Identifying tightly regulated and variably expressed networks by Differential Rank Conservation (DIRAC). PLoS Comput Biol 2010, 6(5):e1000792.
  • [11]Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L, et al.: The microarray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 2010, 28(8):827-838.
  • [12]Magis AT, Earls JC, Ko YH, Eddy JA, Price ND: Graphics processing unit implementations of relative expression analysis algorithms enable dramatic computational speedup. Bioinformatics 2011, 27(6):872-873.
  • [13]Stone JE, Hardy DJ, Ufimtsev IS, Schulten K: GPU-accelerated molecular modeling coming of age. J Mol Graph Model 2010, 29(2):116-125.
  • [14]Michalakes J, Vachharajani M: GPU acceleration of numerical weather prediction. Parallel Process Lett 2008, 18(4):531-548.
  • [15]Ufimtsev IS, Martinez TJ: Graphical processing units for quantum chemistry. Comput Sci Eng 2008, 10(6):26-34.
  • [16]Schatz MC, Trapnell C, Delcher AL, Varshney A: High-throughput sequence alignment using graphics processing units. BMC Bioinformatics 2007, 8:474. BioMed Central Full Text
  • [17]Stone SS, Haldar JP, Tsao SC, Hwu W-mW, Sutton BP, Liang Z-P: Accelerating advanced MRI reconstructions on GPUs. J Parallel Distrib Comput 2008, 68(10):1307-1318.
  • [18]Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999, 96(12):6745-6750.
  • [19]Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286(5439):531-537.
  • [20]Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JY, Goumnerova LC, Black PM, Lau C, et al.: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 2002, 415(6870):436-442.
  • [21]Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, et al.: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 2002, 8(1):68-74.
  • [22]Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, Tamayo P, Renshaw AA, D'Amico AV, Richie JP, et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002, 1(2):203-209.
  • [23]Stuart RO, Wachsman W, Berry CC, Wang-Rodriguez J, Wasserman L, Klacansky I, Masys D, Arden K, Goodison S, McClelland M, et al.: In silico dissection of cell-type-associated patterns of gene expression in prostate cancer. Proc Natl Acad Sci USA 2004, 101(2):615-620.
  • [24]Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Moskaluk CA, Frierson HF Jr, Hampton GM: Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res 2001, 61(16):5974-5978.
  • [25]Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, et al.: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 2001, 98(26):15149-15154.
  • [26]Liu WM, Mei R, Di X, Ryder TB, Hubbell E, Dee S, Webster TA, Harrington CA, Ho MH, Baid J, et al.: Analysis of high density expression microarrays with signed-rank call algorithms. Bioinformatics 2002, 18(12):1593-1599.
  文献评价指标  
  下载次数:81次 浏览次数:22次