BMC Bioinformatics
GOParGenPy: a high throughput method to generate Gene Ontology data matrices
Ajay Anand Kumar [2], Liisa Holm [1], Petri Törönen [2]
[1] Department of Biosciences, Division of Genetics, University of Helsinki, (Viikinkaari 5), PO Box 56, Helsinki 00014, Finland
[2] Institute of Biotechnology, University of Helsinki, (Viikinkaari 5), PO Box 56, Helsinki 00014, Finland
Keywords: Bioinformatics; Machine learning; Data Mining; Large-scale datasets; Gene Ontology
Others: 1087794; DOI: 10.1186/1471-2105-14-242
Received 2013-03-21, accepted 2013-07-11, published 2013
【 Abstract 】
Background
Gene Ontology (GO) is a popular standard in the annotation of gene products and provides information related to genes across all species. The structure of GO is dynamic and is updated on a daily basis. However, the popular existing methods use outdated versions of GO. Moreover, these tools are slow to process large datasets consisting of more than 20,000 genes.
Results
We have developed GOParGenPy, a platform-independent software tool that generates a binary data matrix showing the GO class membership, including parental classes, of a set of GO-annotated genes. GOParGenPy is at least an order of magnitude faster than popular tools for Gene Ontology analysis, and it can handle larger datasets than the existing tools. It can use any available version of the GO structure and allows the user to select the source of GO annotation. GO structure selection is critical for analysis, as we show that GO classes have rapid turnover between different GO structure releases.
Conclusions
GOParGenPy is an easy-to-use software tool that can generate sparse or full binary matrices from GO-annotated gene sets. The resulting binary matrix can then be used in any analysis environment and with any analysis method.
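The kind of matrix described above can be illustrated with a minimal sketch. It is not GOParGenPy's code or interface: the term names (`TERM_A` to `TERM_D`), the gene names, and the functions `ancestors`, `build_sparse_matrix` and `to_dense` are hypothetical, and the toy hierarchy stands in for a real GO structure file. The sketch assumes the hierarchy is an acyclic graph given as a child-to-parents mapping.

```python
# Hypothetical toy hierarchy: each term maps to its direct parents.
PARENTS = {
    "TERM_B": ["TERM_A"],
    "TERM_C": ["TERM_A"],
    "TERM_D": ["TERM_B", "TERM_C"],
}

# Hypothetical direct annotations: gene -> directly annotated terms.
ANNOTATIONS = {
    "gene1": ["TERM_D"],
    "gene2": ["TERM_B"],
    "gene3": ["TERM_C"],
}


def ancestors(term, parents, cache):
    """Return all ancestors of `term` (excluding itself), memoized in `cache`."""
    if term in cache:
        return cache[term]
    found = set()
    for parent in parents.get(term, ()):
        found.add(parent)
        found |= ancestors(parent, parents, cache)
    cache[term] = found
    return found


def build_sparse_matrix(annotations, parents):
    """Propagate annotations to parental classes and return
    (genes, terms, entries), where `entries` lists the (row, column)
    positions of the cells equal to 1."""
    cache = {}
    gene_terms = {}
    for gene, direct in annotations.items():
        full = set()
        for term in direct:
            full.add(term)
            full |= ancestors(term, parents, cache)
        gene_terms[gene] = full
    genes = sorted(gene_terms)
    terms = sorted(set().union(*gene_terms.values()))
    column = {term: j for j, term in enumerate(terms)}
    entries = sorted(
        (i, column[term])
        for i, gene in enumerate(genes)
        for term in gene_terms[gene]
    )
    return genes, terms, entries


def to_dense(n_rows, n_cols, entries):
    """Expand the sparse (row, column) entries into a full 0/1 matrix."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j in entries:
        dense[i][j] = 1
    return dense


genes, terms, entries = build_sparse_matrix(ANNOTATIONS, PARENTS)
dense = to_dense(len(genes), len(terms), entries)
```

In this toy case `gene1` is annotated only to `TERM_D` but receives a 1 in every column, because all other terms are its parental classes. The sparse form stores only the non-zero positions, which is what keeps memory use manageable when the gene set is large.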
【 License 】
2013 Kumar et al.; licensee BioMed Central Ltd.