Journal article details
BMC Research Notes
Human gene correlation analysis (HGCA): A tool for the identification of transcriptionally co-expressed genes
Sophia Kossida [4], Reinhard Schneider [2], Myrto-Areti Kostadima [4], Alexandros Karelas [3], Apostolos Malatras [1], Georgios A Pavlopoulos [5], Ioannis Michalopoulos [3]
[1] Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, Athens, 15701, Greece
[2] Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, avenue des Hauts-Fourneaux 7, Esch sur Alzette, L-4362, Luxembourg
[3] Cryobiology of Stem Cells, Centre of Immunology and Transplantation, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens, 11527, Greece
[4] Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens, 11527, Greece
[5] ESAT-SCD/IBBT-K.U. Leuven Future Health Department, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Heverlee-Leuven, 3001, Belgium
Keywords: Functional annotation; Gene coexpression; Gene annotation; Microarray analysis
DOI: 10.1186/1756-0500-5-265
Received 2011-12-24; accepted 2012-05-24; published 2012
【 Abstract 】

Background

Bioinformatics and high-throughput technologies such as microarray studies allow the expression levels of large numbers of genes to be measured simultaneously, thus helping us to understand the molecular mechanisms of various biological processes in a cell.

Findings

We calculate the Pearson Correlation Coefficient (r-value) between probe set signal values from Affymetrix Human Genome Microarray samples and cluster the human genes according to the r-value correlation matrix using the Neighbour Joining (NJ) clustering method. A hypergeometric distribution is applied to the text annotations of the probe sets to quantify the overrepresentation of terms. The aim of the tool is to identify genes that are closely correlated with a given gene of interest and/or to predict its biological function, based on the annotations of the respective gene cluster.
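The two statistics named above can be illustrated with a minimal sketch. It is not HGCA's actual implementation: the function names `pearson_r` and `hypergeom_upper_tail` and all numeric values are hypothetical, and the NJ clustering step and any multiple-testing correction are not shown. It only assumes the textbook definitions of the Pearson r-value and of the upper tail of the hypergeometric distribution.

```python
from math import comb, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient (r-value) of two signal vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy / sqrt(sxx * syy)


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: all probe sets considered
    annotated:  probe sets in the population carrying the term
    drawn:      size of the correlated gene group
    k:          probe sets in the group carrying the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return hits / total


# Toy signal values for two hypothetical probe sets across five samples.
probe_a = [2.1, 3.4, 5.0, 6.2, 8.1]
probe_b = [1.0, 1.9, 2.4, 3.3, 4.0]
r_value = pearson_r(probe_a, probe_b)

# Toy enrichment question: 2 of 4 clustered probe sets carry a term
# that annotates 3 of 10 probe sets overall.
p_value = hypergeom_upper_tail(2, 10, 3, 4)
```

A small p-value from the tail probability suggests that a term occurs in the group more often than chance would predict; with many terms tested at once, a correction for multiple testing would normally be applied on top of this.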

Conclusion

Human Gene Correlation Analysis (HGCA) is a tool to classify human genes according to their coexpression levels and to identify overrepresented annotation terms in correlated gene groups. It is available at: http://biobank-informatics.bioacademy.gr/coexpression/.

【 License 】

© 2012 Michalopoulos et al.; licensee BioMed Central Ltd.
