Journal article

【Abstract】

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this article, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39%–85%, at the cost of up to 3× the memory consumption during queries. Notably, it can process a batch of 198,074 queries in <8 hours (compared with around 2 days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in <11 minutes.

【License】

Unknown


Journal of computational biology: A journal of computational molecular cell biology
AllSome Sequence Bloom Trees

Rayan Chikhi^3, Paul Medvedev^1,4,5, Robert S. Harris^2, Chen Sun^1
[1] Department of Computer Science and Engineering, Pennsylvania State University, University Park, Pennsylvania
[2] Department of Biology, Pennsylvania State University, University Park, Pennsylvania
[3] CNRS, CRIStAL, University of Lille, Lille, France
[4] Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania
[5] Genome Sciences Institute of the Huck, Pennsylvania State University, University Park, Pennsylvania
Keywords: Sequence Bloom Trees; Bloom filters; RNA-seq; data structures; algorithms; bioinformatics
DOI: 10.1089/cmb.2017.0258
Subject classification: Biological sciences (general)
Source: Mary Ann Liebert, Inc. Publishers
