| BMC Bioinformatics | |
| A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs | |
| Phillip Seitzer [1], Elizabeth G Wilbanks [2], David J Larsen [1], Marc T Facciotti [2] | |
| [1] Genome Center, One Shields Ave, University of California, Davis, CA 95616, USA | |
| [2] Microbiology Graduate Group, One Shields Ave, University of California, Davis, CA, 95616, USA | |
| Keywords: TFB; STAMP; MEME; Comparative genomics; ChIP-chip; ChIP-seq; Monte Carlo; Motif | |
| Others : 1088053 DOI : 10.1186/1471-2105-13-317 |
| Received 2012-02-07, accepted 2012-11-01, published 2012 | |
【 Abstract 】
Background
Discovering functionally significant, short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Often, not all sequences in the set contain a motif, and these non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences out of the larger set while simultaneously determining the identity of the motif is therefore desirable, and it remains a non-trivial problem in motif discovery research.
Results
We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature.
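The random-sampling idea can be illustrated with a minimal Python sketch. It is not the MotifCatcher implementation, and the abstract does not specify the algorithm's details. Everything here is an assumption made for illustration: the toy `best_kmer` function stands in for a real motif finder such as MEME, the aggregation by simple voting is a simplification, and the function names, parameters, and example sequences are hypothetical.

```python
import random
from collections import Counter


def best_kmer(seqs, k):
    """Toy stand-in for a motif finder: the k-mer found in the most sequences."""
    counts = Counter()
    for s in seqs:
        # Use a set so each sequence votes at most once per k-mer.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(counts, key=lambda m: (-counts[m], m))


def monte_carlo_filter(seqs, k, subset_size, n_runs, seed=0):
    """Run the toy finder on random subsets, then keep sequences that
    contain the k-mer reported most often across runs (an assumed scheme)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        subset = rng.sample(seqs, subset_size)
        votes[best_kmer(subset, k)] += 1
    motif = min(votes, key=lambda m: (-votes[m], m))
    kept = [s for s in seqs if motif in s]
    return motif, kept


# Hypothetical data: four sequences carry a planted 6-mer, three do not.
MOTIF_SEQS = ["ACGTTGACAGCT", "CATTTGACATAG", "GGATTGACACCA", "TCCTTGACAAGG"]
NOISE_SEQS = ["GCGCGCGCGCGC", "ATATATATATAT", "AACCAACCAACC"]
SEQS = MOTIF_SEQS + NOISE_SEQS

motif, kept = monte_carlo_filter(SEQS, k=6, subset_size=5, n_runs=50)
print(motif, kept)
```

In this toy data set every subset of five sequences contains at least two motif-bearing sequences, so the planted pattern wins every run. With real data, subsets would disagree, and the distribution of votes across runs is what carries the signal.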
Conclusions
Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggest binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
【 License 】
2012 Seitzer et al.; licensee BioMed Central Ltd.
【 Figures 】
Figure 1.