Journal article

【Abstract】

DNA repeats are of great importance in biological research, and a large number of tools for identifying repeats have been developed. Herein we define a method for extracting a statistically significant subset from a given set of identified repeats. Our aim was to identify those repeats in the input sequences whose observed number of occurrences would not be expected in a random sequence of the same length. Results obtained in this manner are expected to reduce the quantity of material to be processed and could thereby represent a more important biological signal. Using DNA, RNA, and protein sequences as input material, we also examined the possibility of statistically filtering repeats in sequences over an arbitrary alphabet. The new method for selecting statistically significant repeats from a set of identified repeats was tested on a large number of randomly generated sequences. Its application to biological sequences revealed that for some viruses, shorter repeats are more statistically significant than longer ones because of their frequent appearance, whereas for bacteria, the majority of identified repeats are statistically significant.
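The abstract does not say which statistic is used to decide whether a repeat's number of occurrences is "not expected" in a random sequence of the same length. As an illustration only, the sketch below assumes the simplest null model: letters drawn independently with the sequence's own letter frequencies, so that a word of length k is expected to occur (n − k + 1) · ∏ p(c) times in a sequence of length n. A repeat is kept when a Poisson upper-tail probability of seeing at least its observed count falls below a threshold `alpha`. All function names and the `alpha` default are hypothetical, and this is not presented as the authors' method. Because it only uses the letters present in the input, the same code runs on DNA, RNA, or protein alphabets.

```python
import math
from collections import Counter


def letter_probabilities(seq: str) -> dict[str, float]:
    """Empirical letter frequencies; works for any alphabet (DNA, RNA, protein)."""
    counts = Counter(seq)
    n = len(seq)
    return {c: k / n for c, k in counts.items()}


def count_occurrences(seq: str, word: str) -> int:
    """Count possibly overlapping occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count(word: str, n: int, probs: dict[str, float]) -> float:
    """Expected occurrences of word in an i.i.d. random sequence of length n
    (assumed null model: independent letters with the given frequencies)."""
    p_word = 1.0
    for c in word:
        p_word *= probs.get(c, 0.0)
    return max(n - len(word) + 1, 0) * p_word


def poisson_upper_tail(observed: int, lam: float) -> float:
    """P(X >= observed) for X ~ Poisson(lam), computed as 1 - P(X <= observed - 1)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def significant_repeats(
    seq: str, repeats: list[str], alpha: float = 0.01
) -> list[tuple[str, int, float, float]]:
    """Keep repeats whose observed count is improbably high under the null model.
    Returns (word, observed, expected, p_value) for each retained repeat."""
    probs = letter_probabilities(seq)
    n = len(seq)
    kept = []
    for word in repeats:
        obs = count_occurrences(seq, word)
        lam = expected_count(word, n, probs)
        p = poisson_upper_tail(obs, lam)
        if p < alpha:
            kept.append((word, obs, lam, p))
    return kept
```

For example, in `"ACGT" * 25` the single letter `"A"` occurs exactly as often as expected and is filtered out, while `"ACGTACGT"` occurs far more often than an independent-letter model predicts and is retained. The Poisson approximation ignores the clumping of self-overlapping words, and when many candidate repeats are tested at once a multiple-testing correction (such as Bonferroni) would normally be applied to `alpha`.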

【License】

Unknown

Journal of Computational Biology: A Journal of Computational Molecular Cell Biology
Finding Statistically Significant Repeats in Nucleic Acids and Proteins

Nenad S. Mitic², Ana M. Jelovic¹˒², Samira Eshafah²
¹ Faculty of Transport and Traffic Engineering, University of Belgrade, Belgrade, Serbia; ² Faculty of Mathematics, University of Belgrade, Belgrade, Serbia; ³ Institute of General and Physical Chemistry, Bio-Lab, Belgrade, Serbia
Keywords: DNA; protein sequences; repeats; RNA; statistically significant
DOI: 10.1089/cmb.2017.0046
Subject classification: Biological Sciences (General)
Source: Mary Ann Liebert, Inc. Publishers