Journal article details
Algorithms for Molecular Biology
Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes
Yiannis AI Kourmpetis [1], Aalt DJ van Dijk [3], Cajo JF ter Braak [2]
[1] Current address: Functional Genomics, Nestlé Institute of Health Sciences, Campus EPFL, Quartier de l’Innovation, 1015 Lausanne, Switzerland
[2] Biometris, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
[3] Applied Bioinformatics, Plant Research International, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
Keywords: Evolutionary optimization; Gene Ontology; Protein function prediction
DOI: 10.1186/1748-7188-8-10
Received 2011-08-10, accepted 2013-03-04, published 2013
【 Abstract 】

Gene Ontology (GO) is a hierarchical vocabulary for describing biological functions and locations, and it is often used by computational methods for protein function prediction. Because of the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to belong to a detailed functional class but not to a broader class that, by the structure of the vocabulary, includes the detailed one.
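
The kind of contradiction described above can be illustrated with a minimal sketch. The toy hierarchy, the term names, and the helper `inconsistent_terms` are hypothetical and chosen only for illustration; they are not taken from the paper or from the real GO.

```python
# Toy GO-like hierarchy (hypothetical terms): each term maps to its parent terms.
PARENTS = {
    "root": [],
    "metabolic_process": ["root"],
    "lipid_metabolic_process": ["metabolic_process"],
    "transport": ["root"],
}


def inconsistent_terms(labels, parents=PARENTS):
    """Return the terms predicted positive while at least one parent is negative."""
    return sorted(
        term
        for term, is_member in labels.items()
        if is_member and any(not labels.get(p, False) for p in parents[term])
    )


# A self-contradictory prediction: the detailed class is assigned,
# but the broader class that contains it is not.
prediction = {
    "root": True,
    "metabolic_process": False,
    "lipid_metabolic_process": True,
    "transport": False,
}

print(inconsistent_terms(prediction))  # ['lipid_metabolic_process']
```

A prediction is consistent with the hierarchy exactly when this list is empty.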

We present a novel discrete optimization algorithm called Functional Annotation with Labeling CONsistency (FALCON) that resolves such contradictions. The GO is modeled as a discrete Bayesian Network. For any given input of GO term membership probabilities, the algorithm returns the most probable GO term assignments that are in accordance with the Gene Ontology structure. The optimization is done using the Differential Evolution algorithm. Performance is evaluated on simulated data and on real data from Arabidopsis thaliana, and shows an improvement over related approaches. Finally, we applied the FALCON algorithm to obtain genome-wide function predictions for six eukaryotic species based on data provided by the CAFA (Critical Assessment of Function Annotation) project.
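
The objective described above, finding the most probable term assignment that respects the hierarchy, can be sketched on a toy example. This is not the FALCON implementation: exhaustive enumeration stands in for Differential Evolution (feasible only for a handful of terms), and the score treats the input membership probabilities as independent per term, which is a simplifying assumption rather than the paper's Bayesian Network model. The hierarchy, the probabilities, and all function names are hypothetical.

```python
from itertools import product
from math import log

# Toy GO-like hierarchy (hypothetical terms): each term maps to its parent terms.
PARENTS = {
    "root": [],
    "metabolic_process": ["root"],
    "lipid_metabolic_process": ["metabolic_process"],
    "transport": ["root"],
}


def is_consistent(labels, parents):
    """True if every positive term also has all of its parents positive."""
    return all(labels[p] for t, on in labels.items() if on for p in parents[t])


def log_score(labels, probs):
    """Log-probability of a labeling, assuming independent per-term probabilities.

    Probabilities must lie strictly between 0 and 1.
    """
    return sum(log(probs[t]) if on else log(1.0 - probs[t]) for t, on in labels.items())


def most_probable_consistent(probs, parents):
    """Enumerate all labelings and keep the best-scoring consistent one."""
    terms = list(parents)
    best, best_score = None, float("-inf")
    for bits in product([False, True], repeat=len(terms)):
        labels = dict(zip(terms, bits))
        if not is_consistent(labels, parents):
            continue
        score = log_score(labels, probs)
        if score > best_score:
            best, best_score = labels, score
    return best


# Thresholding these at 0.5 would assign the detailed term but not its parent.
probs = {
    "root": 0.9,
    "metabolic_process": 0.4,
    "lipid_metabolic_process": 0.8,
    "transport": 0.2,
}

best = most_probable_consistent(probs, PARENTS)
print(best)
```

In this toy case the strong evidence for the detailed term pulls its weakly supported parent into the assignment, instead of the detailed term being discarded. The search space doubles with every added term, which is why a heuristic optimizer is needed at the scale of the real GO.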

【 License 】

   
2013 Kourmpetis et al.; licensee BioMed Central Ltd.
