Journal article

【Abstract】

Transcription factor (TF) binding to DNA can be modeled in a number of different ways. Which modeling methods are best, how the models should be built, and what they can be applied to remain highly debated. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented a standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a large negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
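
As a rough illustration of what a linear k-mer model computes, the sketch below scores a sequence as a bias plus a weighted sum of its overlapping k-mer counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kmer_counts` and `linear_score`, the choice k = 3, and the hand-picked `toy_weights` are all hypothetical. In practice the weights would be fitted to data, and the abstract does not specify the fitting procedure.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_score(seq, weights, k, bias=0.0):
    """Linear k-mer score: bias plus the sum of weight * count over k-mers.

    k-mers that are absent from `weights` contribute nothing, which is
    also how a model with a reduced k-mer set would behave.
    """
    counts = kmer_counts(seq, k)
    return bias + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


# Hypothetical weights chosen by hand for illustration only (not fitted).
toy_weights = {"GAT": 1.0, "ATA": 0.5, "CCC": -1.0}

bound_like = linear_score("AGATAA", toy_weights, k=3)
unbound_like = linear_score("ACCCCA", toy_weights, k=3)
```

With these toy weights the first sequence scores higher than the second, so thresholding the score would separate them. Dropping entries from the weight dictionary mimics reducing the number of k-mers in the model.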

【License】

CC BY
© Kähärä and Lähdesmäki; licensee BioMed Central Ltd. 2013

BMC Bioinformatics
Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data
Research
Juhani Kähärä¹ Harri Lähdesmäki²
Department of Information and Computer Science, Aalto University School of Science, FI-00076, Aalto, Finland; Turku Centre for Biotechnology, Turku University, Turku, Finland
Keywords: Feature Selection; Classification Accuracy; Average Classification Accuracy; Feature Selection Strategy; Affinity Score
DOI: 10.1186/1471-2105-14-S10-S2
Source: Springer