Source Code for Biology and Medicine
The non-negative matrix factorization toolbox for biological data mining
Alioune Ngom [1], Yifeng Li [1]
[1] School of Computer Science, University of Windsor, Windsor, Ontario, Canada
Keywords: Missing values; Classification; Feature selection; Feature extraction; Bi-clustering; Clustering; Non-negative matrix factorization
DOI: 10.1186/1751-0473-8-10
Received 2012-11-30; accepted 2013-04-10; published 2013
【 Abstract 】
Background
Non-negative matrix factorization (NMF) has been introduced as an important method for mining biological data. Although packages implemented in R and other programming languages currently exist, they either provide only a few optimization algorithms or focus on a specific application field. No complete NMF package exists that lets the bioinformatics community perform a variety of data mining tasks on biological data.
Results
We provide a convenient MATLAB toolbox containing both implementations of various NMF techniques and a variety of NMF-based data mining approaches for analyzing biological data. The data mining approaches implemented within the toolbox include data clustering and bi-clustering, feature extraction and selection, sample classification, missing value imputation, data visualization, and statistical comparison.
Conclusions
A series of analyses, such as molecular pattern discovery, biological process identification, dimension reduction, disease prediction, visualization, and statistical comparison, can be performed using this toolbox.
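As a rough illustration of the kind of factorization the abstract refers to, the sketch below approximates a non-negative matrix X by the product of two non-negative factors W and H using the standard multiplicative update rules for the Frobenius-norm objective. It is written in plain Python for self-containment and is not the toolbox's MATLAB code; the function names (`nmf`, `matmul`, `transpose`, `frobenius_error`), the parameters, and the toy matrix are all assumptions made for this example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2
               for rx, ry in zip(X, WH)
               for x, y in zip(rx, ry)) ** 0.5

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: factorize X (m x n, non-negative) into
    W (m x k) and H (k x n) with multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    # Random positive initialization keeps every update well defined.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy "genes x samples" matrix with two obvious sample groups (made-up data).
X = [[5.0, 5.0, 0.0, 0.0],
     [4.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 2.0, 2.0]]
W, H = nmf(X, k=2)
# One common clustering heuristic: assign each sample (column) to the
# factor with the largest coefficient in H.
labels = [max(range(2), key=lambda r: H[r][j]) for j in range(len(X[0]))]
```

In this sketch the columns of W play the role of basis vectors (sometimes called metagenes) and the columns of H give each sample's coefficients over them; which factor index ends up representing which group depends on the random initialization.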
【 License 】
2013 Li and Ngom; licensee BioMed Central Ltd.