Journal article details
BMC Bioinformatics
A ratiometric-based measure of gene co-expression
Anna CT Abelin [1], Georgi K Marinov [1], Brian A Williams [1], Kenneth McCue [1], Barbara J Wold [1]
[1] Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Blvd, Pasadena, CA 91125, USA
Keywords: RA; Pearson correlation; Mutual information; Single-cell RNA-seq; RNA-seq; Transcriptome analysis; Ratiometric analysis; Gene expression analysis
Others  :  1085039
DOI  :  10.1186/1471-2105-15-331
Received 2014-03-12; accepted 2014-07-18; published 2014
【 Abstract 】

Background

Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as the maximal information coefficient (MIC). These measures depend primarily on the degree of association between the RNA levels of two genes and, to a lesser extent, on their variability. They focus on the similarity of expression trajectories that change in like manner across samples. However, there are relationships of biological interest to which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single-cell RNA-seq, are experimental settings in which such relationships might be especially pertinent.

Results

We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq.
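The core quantity described above, the coefficient of variation (CV) of the expression ratio for each gene pair, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ratio_cv_matrix`, the `pseudocount` parameter, the use of the sample standard deviation, and the toy data are all assumptions made for illustration, and the abstract does not specify how zeros, transformations, or reporting thresholds are handled.

```python
import numpy as np

def ratio_cv_matrix(expr, pseudocount=1.0):
    """Coefficient of variation of pairwise expression ratios.

    expr: genes x samples array of non-negative expression values.
    Returns a genes x genes array whose (i, j) entry is the CV, across
    samples, of expr[i] / expr[j].
    """
    # Assumption: a pseudocount guards against division by zero.
    x = np.asarray(expr, dtype=float) + pseudocount
    # ratios[i, j, s] = x[i, s] / x[j, s] via broadcasting.
    ratios = x[:, None, :] / x[None, :, :]
    mean = ratios.mean(axis=2)
    std = ratios.std(axis=2, ddof=1)  # sample standard deviation
    return std / mean

# Hypothetical toy data: 3 genes measured in 4 samples.
expr = np.array([
    [10.0, 20.0, 40.0, 80.0],  # gene A
    [5.0, 10.0, 20.0, 40.0],   # gene B: always half of A
    [30.0, 10.0, 50.0, 20.0],  # gene C: not proportional to A
])
cv = ratio_cv_matrix(expr, pseudocount=0.0)
```

In this toy example `cv[0, 1]` is zero because the A/B ratio is the same in every sample, whereas `cv[0, 2]` is large. Note that the CV of a ratio is not symmetric in general (`cv[i, j]` need not equal `cv[j, i]`), and the broadcasted array grows with the square of the number of genes, so a genome-scale analysis would need to process gene pairs in chunks.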

Conclusion

We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships. RA is scalable and potentially well suited for the analysis of thousands of bulk RNA or single-cell transcriptomes.

【 License 】

   
2014 Abelin et al.; licensee BioMed Central Ltd.
