BMC Bioinformatics
Predicting co-complexed protein pairs using genomic and proteomic data integration
Lan V Zhang¹, Sharyl L Wong¹, Oliver D King¹, Frederick P Roth¹
¹ Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
Keywords: machine learning; data integration; decision tree; protein complex; protein-protein interaction
DOI: 10.1186/1471-2105-5-38
Received: 2003-11-03; accepted: 2004-04-16; published: 2004
【 Abstract 】
Background
Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to determine which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS have substantial false-positive rates. Beyond high-throughput interaction screens, other characteristics of gene or protein pairs may also be informative about physical interaction. It is therefore desirable to integrate multiple datasets and exploit their differing predictive value to predict co-complexed relationships more accurately.
Results
Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets with other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods, alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction were found to physically interact according to a separate database (YPD, the Yeast Proteome Database). The remaining predictions may represent previously unknown CCPs.
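The abstract names the method but gives no details of how the tree is grown. The following is therefore an illustration only, not the authors' implementation. It is a minimal sketch of a probabilistic decision tree for protein pairs, built on several assumptions:

- Splits use Gini impurity (CART-style).
- Each node reports a Laplace-smoothed fraction of co-complexed training pairs as its probability.
- Growth stops at the hypothetical limits `max_depth` and `min_leaf`.

The feature names (`y2h_hit`, `apms_hit`, `expr_corr`), the helper functions and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    prob: float                      # Laplace-smoothed P(co-complexed) for pairs reaching this node
    feature: Optional[int] = None    # index of the tested feature; None marks a leaf
    threshold: float = 0.0           # pairs with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def leaf_prob(labels: Sequence[int]) -> float:
    """Laplace-smoothed fraction of co-complexed (label 1) pairs."""
    return (sum(labels) + 1) / (len(labels) + 2)


def best_split(X, y, min_leaf) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    n = len(y)
    best, best_score = None, gini(y)   # a split must strictly beat the unsplit node
    for f in range(len(X[0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2) -> Node:
    """Grow the tree recursively; every node (not only leaves) stores a probability."""
    node = Node(prob=leaf_prob(y))
    if depth >= max_depth:
        return node
    split = best_split(X, y, min_leaf)
    if split is None:
        return node
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    node.feature, node.threshold = f, t
    node.left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf)
    node.right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf)
    return node


def predict_proba(node: Node, x: Sequence[float]) -> float:
    """Follow the tests from the root to a leaf and return that leaf's probability."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prob


def sensitivity_specificity(scores, labels, threshold):
    """TP/(TP+FN) and TN/(TN+FP) when pairs with score >= threshold are called CCPs."""
    tp = sum(s >= threshold and lab == 1 for s, lab in zip(scores, labels))
    fn = sum(s < threshold and lab == 1 for s, lab in zip(scores, labels))
    tn = sum(s < threshold and lab == 0 for s, lab in zip(scores, labels))
    fp = sum(s >= threshold and lab == 0 for s, lab in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pair features: [y2h_hit, apms_hit, expr_corr]; label 1 = co-complexed in a reference set.
X = [[1, 1, 0.9], [0, 1, 0.8], [1, 1, 0.7], [0, 1, 0.6],
     [1, 0, 0.2], [0, 0, 0.1], [0, 0, 0.3], [1, 0, 0.4]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = build_tree(X, y)
scores = [predict_proba(tree, x) for x in X]
print(sensitivity_specificity(scores, y, threshold=0.5))  # → (1.0, 1.0)
```

Each leaf carries a probability rather than a hard class label. This lets the same tree rank every candidate pair by score, so the top-scoring pairs can be examined first. Sweeping `threshold` trades sensitivity against specificity. On real data, the scores would be evaluated on pairs held out from training rather than on the training pairs used in this toy run.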
Conclusions
We demonstrated that the probabilistic decision tree approach can be used successfully to predict co-complexed protein pairs (CCPs) from other gene- and protein-pair characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.
【 License 】
2004 Zhang et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
【 Figures 】
Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.