Source Code for Biology and Medicine
The non-negative matrix factorization toolbox for biological data mining
Alioune Ngom [1], Yifeng Li [1]
[1] School of Computer Science, University of Windsor, Windsor, Ontario, Canada
Keywords: Missing values; Classification; Feature selection; Feature extraction; Bi-clustering; Clustering; Non-negative matrix factorization
DOI: 10.1186/1751-0473-8-10
Received 2012-11-30; accepted 2013-04-10; published 2013
【 Abstract 】

Background

Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages currently exist, they either provide only a few optimization algorithms or focus on a specific application field. No complete NMF package exists that lets the bioinformatics community perform a variety of data mining tasks on biological data.

Results

We provide a convenient MATLAB toolbox containing implementations of various NMF techniques as well as a variety of NMF-based data mining approaches for analyzing biological data. The data mining approaches implemented in the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing value imputation, data visualization, and statistical comparison.
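To make the idea of NMF-based clustering concrete, the following is a minimal sketch in plain Python. It is not the toolbox's MATLAB interface: the function names (`nmf`, `cluster_samples`, and the helpers) and the toy matrix are hypothetical and chosen only for illustration. The sketch assumes the standard multiplicative update rules of Lee and Seung for the Frobenius-norm objective, and it assumes the common convention that rows of the data matrix are features (for example, genes) and columns are samples.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2
               for rx, ry in zip(X, WH)
               for x, y in zip(rx, ry)) ** 0.5

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (m x n) as W (m x k) times H (k x n).

    Uses multiplicative updates; eps guards against division by zero.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Strictly positive random initialization keeps all factors non-negative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def cluster_samples(H):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]

# Made-up toy data: 4 features (rows) by 6 samples (columns), two blocks.
X = [
    [5, 4, 5, 0, 0, 0],
    [4, 5, 4, 0, 0, 0],
    [0, 0, 0, 3, 4, 3],
    [0, 0, 0, 4, 3, 4],
]

W, H = nmf(X, k=2)
labels = cluster_samples(H)
print("cluster labels:", labels)
print("reconstruction error:", frobenius_error(X, W, H))
```

In this sketch the columns of `W` play the role of basis vectors (sometimes called metagenes) and each column of `H` gives a sample's coefficients over those basis vectors, so taking the largest coefficient per column is one simple way to read off a clustering. The result depends on the random initialization, which is why the seed is fixed here.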

Conclusions

A range of analyses, such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison, can be performed using this toolbox.

【 License 】

2013 Li and Ngom; licensee BioMed Central Ltd.
