Journal article

【Abstract】

Background: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goals of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While many comparative studies have compared different learning principles or different statistical models, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, possibly leading to questionable conclusions.

Results: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites.

Conclusions: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different priors has been neglected. We implement the derived prior in the open-source library Jstacs to enable an easy application to comparative studies of different learning principles in the field of sequence analysis.

【License】

CC BY
© Keilwagen et al; licensee BioMed Central Ltd. 2010

BMC Bioinformatics
Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis
Research Article
Jan Grau¹ Stefan Posch¹ Jens Keilwagen² Ivo Grosse³
[1] Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle/Saale, Germany; Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany; Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany; Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle/Saale, Germany
Keywords: Positive Predictive Value; Bayesian Network; Markov Random Field; Donor Splice Site; Position Weight Matrix
DOI: 10.1186/1471-2105-11-149
Received 2009-08-21, accepted 2010-03-22, published 2010
Source: Springer