Journal of computational biology: A journal of computational molecular cell biology | |
Massively Parallel Implementation of Sequence Alignment with Basic Local Alignment Search Tool Using Parallel Computing in Java Library | |
Marek Nowicki^1,2; Piotr Bała^4; Davit Bzhalava^3 | |
[1] Address correspondence to: Dr. Marek Nowicki, Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Chopina 12/18, 87-100 Toruń, Poland^1; Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden^3; Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Poland^2; Interdisciplinary Center for Mathematical and Computational Modeling, University of Warsaw, Warsaw, Poland^4 | |
Keywords: BLAST; Java; next-generation sequencing; PCJ; sequence alignment | |
DOI : 10.1089/cmb.2018.0079 | |
Subject classification: Biological sciences (general) | |
Source: Mary Ann Liebert, Inc. Publishers | |
【 Abstract 】
Basic Local Alignment Search Tool (BLAST) is an essential algorithm that researchers use for sequence alignment analysis. The National Center for Biotechnology Information (NCBI)-BLAST application is the most popular implementation of the BLAST algorithm. It runs on a single node using multiple threads. However, the volume of nucleotide and protein data is growing fast, making a single node insufficient. It is increasingly important to develop high-performance computing solutions that help researchers analyze genetic data in a fast and scalable way. This article presents execution of the BLAST algorithm on high-performance computing (HPC) clusters and supercomputers in a massively parallel manner using thousands of processors. The Parallel Computing in Java (PCJ) library has been used to implement the optimal splitting of the input queries, the work distribution, and the search management. It is used with the unmodified NCBI-BLAST package, which is an additional advantage for users. The resulting application, PCJ-BLAST, reads the query sequences, splits them up, and starts multiple NCBI-BLAST executables. Since I/O performance could limit sequence analysis performance, the article also investigates this problem. The obtained results show that, with Java and the PCJ library, it is possible to perform sequence analysis using hundreds of nodes in parallel. We have achieved excellent performance and efficiency, and we have significantly reduced the time required for sequence analysis. Our work also shows that the PCJ library can be used as an effective tool for fast development of scalable applications.
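The split-and-dispatch idea described in the abstract can be illustrated with a minimal sketch in plain Java. This is not the PCJ-BLAST implementation and does not use the PCJ API: the class name `QuerySplitSketch`, its methods, the chunk file names, and the database name `nt` are all hypothetical, and simple round-robin assignment stands in for the splitting strategy of the article, which the abstract does not specify. The sketch only builds the command line for an unmodified `blastn` executable and does not launch it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the PCJ-BLAST source code.
class QuerySplitSketch {

    /** Splits FASTA text into records; each record starts at a '>' header line. */
    public static List<String> parseRecords(String fasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : fasta.split("\\R")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    /** Assigns records to workers round-robin (an assumption, not the article's method). */
    public static List<List<String>> distribute(List<String> records, int workers) {
        List<List<String>> chunks = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            chunks.get(i % workers).add(records.get(i));
        }
        return chunks;
    }

    /** Builds the argument list for one unmodified blastn process. */
    public static List<String> blastCommand(String queryFile, String db,
                                            String outFile, int threads) {
        return Arrays.asList("blastn", "-query", queryFile, "-db", db,
                "-out", outFile, "-num_threads", String.valueOf(threads));
    }

    public static void main(String[] args) {
        String fasta = ">q1\nACGT\n>q2\nGGCC\nTTAA\n>q3\nAC\n";
        List<List<String>> chunks = distribute(parseRecords(fasta), 2);
        for (int w = 0; w < chunks.size(); w++) {
            // File and database names below are placeholders.
            List<String> cmd = blastCommand("chunk_" + w + ".fasta", "nt",
                    "out_" + w + ".txt", 4);
            System.out.println("worker " + w + ": " + chunks.get(w).size()
                    + " record(s), command: " + String.join(" ", cmd));
        }
    }
}
```

In a real run, each worker would write its chunk to the query file and pass the argument list to `java.lang.ProcessBuilder` to start the executable. Round-robin keeps the sketch short, but it ignores sequence length, so workers can end up with uneven loads.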
【 License 】
Unknown