BMC Bioinformatics
Distribution Analyzer, a methodology for identifying and clustering outlier conditions from single-cell distributions, and its application to a Nanog reporter RNAi screen
Julian A. Gingold [5], Ed S. Coakley [4], Jie Su [3], Dung-Fang Lee [5], Zerlina Lau [2], Hongwei Zhou [5], Dan P. Felsenfeld [2], Christoph Schaniel [1], Ihor R. Lemischka [1]
[1] Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
[2] Integrated Screening Core, Experimental Therapeutics Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
[3] Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
[4] Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
[5] Department of Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Keywords: Kolmogorov-Smirnov distance; Hellinger distance; Nanog RNAi screen; High-content screening methodology; Fluorescence distribution; Genome-scale screen analysis
Others: 1230724 | DOI: 10.1186/s12859-015-0636-7
Received 2015-01-16, accepted 2015-06-05, published 2015
【 Abstract 】
Background
Chemical or small interfering (si) RNA screens measure the effects of many independent experimental conditions, each applied to a population of cells (e.g., all of the cells in a well). High-content screens permit a readout (e.g., fluorescence, luminescence, cell morphology) from each cell in the population. Most analysis approaches compare the average effect on each population, precluding identification of outliers that affect the distribution of the reporter in the population but not its average. Other approaches only measure changes to the distribution with a single parameter, precluding accurate distinction and clustering of interesting outlier distributions.
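As a toy illustration of this limitation, consider three hand-made five-bin histograms: a control, a wider distribution, and a narrower one. These numbers are invented for illustration and are not data from the screen, and the helper names `median_bin`, `ks_distance`, and `hellinger` are hypothetical. In this sketch both perturbed conditions share the control's median bin and sit at approximately the same Kolmogorov-Smirnov distance from the control, so neither single number tells them apart, yet the distance between the two perturbed conditions is clearly nonzero.

```python
import math
from itertools import accumulate

def median_bin(p):
    """Index of the first bin where the cumulative probability reaches 0.5."""
    for i, c in enumerate(accumulate(p)):
        if c >= 0.5:
            return i
    return len(p) - 1

def ks_distance(p, q):
    """Largest absolute gap between the two binned cumulative distributions."""
    return max(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

def hellinger(p, q):
    """Hellinger distance between two discrete probability vectors."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)

# Invented histograms over the same five fluorescence bins (assumption).
control = [0.1, 0.2, 0.4, 0.2, 0.1]
wide    = [0.2, 0.2, 0.2, 0.2, 0.2]
narrow  = [0.0, 0.2, 0.6, 0.2, 0.0]

for name, dist in [("wide", wide), ("narrow", narrow)]:
    print(name,
          "median bin:", median_bin(dist),
          "KS to control:", round(ks_distance(control, dist), 3))

# The two perturbed conditions are far from each other even though the
# single-parameter summaries above do not separate them.
print("Hellinger(wide, narrow):", round(hellinger(wide, narrow), 3))
```

A scalar distance to the null distribution records how far a condition has moved but not in which direction, which is why pairwise distances between conditions carry extra information.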
Results
We describe a methodology to identify outlier conditions by considering the cell-level measurements from each condition as a sample of an underlying distribution. With appropriate selection of a distance metric, all effects can be embedded in a fixed-dimensionality Euclidean basis, facilitating identification and clustering of biologically interesting outliers. We demonstrate that measurement of distances with the Hellinger distance metric offers substantial computational efficiencies over alternative metrics. We validate this methodology using an RNA interference (RNAi) screen in mouse embryonic stem cells (ESC) with a Nanog reporter. The methodology clusters effects of multiple control siRNAs into their true identities better than conventional approaches describing the median cell fluorescence or the commonly used Kolmogorov-Smirnov distance between the observed fluorescence distribution and the null distribution. It identifies outlier genes with effects on the reporter distribution that would have been missed by other methods. Among them, siRNA targeting Chek1 leads to a wider Nanog reporter fluorescence distribution. Similarly, siRNA targeting Med14 or Med27 leads to a narrower Nanog reporter fluorescence distribution. We confirm the roles of these three genes in regulating pluripotency by mRNA expression and alkaline phosphatase staining using independent short hairpin (sh) RNAs.
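One way to see why the Hellinger distance lends itself to a Euclidean embedding is the standard identity that, for binned distributions, it equals the ordinary Euclidean distance between the vectors of square-rooted bin probabilities scaled by 1/√2. The following is a minimal sketch of that identity only, not the authors' implementation: the binning scheme, the simulated Gaussian "wells", and the function names `histogram`, `sqrt_embedding`, `euclidean`, and `hellinger` are all assumptions made for illustration.

```python
import math
import random

def histogram(values, lo, hi, n_bins):
    """Bin probabilities over n_bins equal-width bins on [lo, hi]; outliers are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

def sqrt_embedding(probs):
    """Map a probability vector to a point with coordinates sqrt(p_i / 2)."""
    return [math.sqrt(p / 2.0) for p in probs]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hellinger(p, q):
    """Hellinger distance computed directly from its definition."""
    s = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q))
    return math.sqrt(0.5 * s)

# Simulated wells (assumption): same centre, different spread.
rng = random.Random(0)
wells = {
    "control": [rng.gauss(0.0, 1.0) for _ in range(2000)],
    "wider":   [rng.gauss(0.0, 2.0) for _ in range(2000)],
    "narrower": [rng.gauss(0.0, 0.5) for _ in range(2000)],
}

# Each well is embedded once; every pairwise distance is then a plain
# Euclidean distance between fixed-length vectors.
probs = {k: histogram(v, -6.0, 6.0, 24) for k, v in wells.items()}
points = {k: sqrt_embedding(p) for k, p in probs.items()}

names = list(points)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        d_embed = euclidean(points[a], points[b])
        d_direct = hellinger(probs[a], probs[b])
        print(f"{a} vs {b}: embedded={d_embed:.4f} direct={d_direct:.4f}")
```

Because each condition is reduced to a fixed-length vector once, standard Euclidean tools for clustering and outlier detection can be applied to the points directly, without recomputing a bespoke distance for every pair.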
Conclusions
Using our methodology, we describe each experimental condition by a probability distribution. Measuring distances between probability distributions permits a multivariate rather than univariate readout. Clustering points derived from these distances allows us to obtain greater biological insight than methods based solely on single parameters. We find several outliers from a mouse ESC RNAi screen that we confirm to be pluripotency regulators. Many of these outliers would have been missed by other analysis methods.
【 License 】
2015 Gingold et al.
Files | Size | Format | View |
---|---|---|---|
Fig. 6. | 97KB | Image | download |
Fig. 5. | 167KB | Image | download |
Fig. 4. | 108KB | Image | download |
Fig. 3. | 103KB | Image | download |
Fig. 2. | 87KB | Image | download |
Fig. 1. | 112KB | Image | download |
【 Figures 】
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.