Journal Article

【Abstract】

Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. Commonly used multiple sequence alignment (MSA) programs are still limited in their ability to handle very large numbers of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity.

Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computational efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets.

Conclusions: ClustalXeed can now process large volumes of biological sequence data that were not tractable in any other parallel or single MSA program. The main developments are: 1) the ability to tackle larger sequence alignment problems than was possible with previous systems, through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and 3) support for both single-PC and distributed cluster systems.

【License】

CC BY
© Kim and Joo; licensee BioMed Central Ltd. 2010

BMC Bioinformatics
ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment
Software
Hyun Joo¹ Taeho Kim²
[1] Department of Physiology and Integrated Biosystems, College of Medicine, Inje University, 614-735, Busan, South Korea
[2] Laboratory of Systems Immunology, World Premier International Immunology Frontier Research Center, Osaka University, 565-0871, Suita, Osaka, Japan
Keywords: Multiple Sequence Alignment; Message Passing Interface; Random Access Memory; Slave Node; Computation Node
DOI : 10.1186/1471-2105-11-467
Received 2010-05-06, accepted 2010-09-17, published 2010
Source: Springer