Journal Article Details
IEEE Access
Two-Stage Column Block Parallel LU Factorization Algorithm
Rongteng Wu [1], Xiaohong Xie [2]
[1] College of Computer and Control Engineering, Minjiang University, Fuzhou, China; [2] Department of Teaching Affairs, Minjiang University, Fuzhou, China
Keywords: LU factorization; load balancing; nonblocking communication; parallel execution time; scalability
DOI: 10.1109/ACCESS.2019.2962355
Source: DOAJ
【 Abstract 】

Parallel computing is increasingly important in computer architectures, and parallel architectures have become ubiquitous in our everyday lives. Novel architectures and programming models pose new challenges to algorithm design and system software development. This paper presents a two-stage column block parallel LU factorization algorithm for multiple-processor architectures. A given matrix is first partitioned into large blocks, and every large block is then partitioned into a number of small blocks according to the number of processors. Finally, the small column blocks are allocated to processors in an orderly "serpentine arrangement." Each iteration of the column block parallel LU factorization is separated into two stages of operation. In the first stage, the first-step factorization operation is processed in advance, and nonblocking communication is used to reduce processor idle and waiting time and to improve parallelism. In the second stage, the large blocks are used to satisfy more powerful processors, such as GPUs, which require more data to exploit their computing capabilities. Experiments are conducted on a multicore system and a multi-GPU system with different configurations to test the algorithm's performance. Compared with other related column block parallel LU factorizations, the two-stage algorithm exhibits better load balancing and parallel execution time performance.
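
The abstract does not define the "serpentine arrangement" precisely. One plausible reading, sketched below as an assumption rather than as the authors' method, is a back-and-forth (boustrophedon) assignment: small column blocks go to processors 0..P-1, then P-1..0, and so on. The function names serpentine_owner and serpentine_layout are hypothetical and do not come from the paper.

```python
def serpentine_owner(block_index: int, num_procs: int) -> int:
    """Return the processor owning a small column block.

    Assumption: "serpentine" means the assignment order reverses
    after every sweep over the processors (0..P-1, then P-1..0, ...).
    """
    sweep, pos = divmod(block_index, num_procs)
    # Even sweeps run left to right; odd sweeps run right to left.
    return pos if sweep % 2 == 0 else num_procs - 1 - pos


def serpentine_layout(num_blocks: int, num_procs: int) -> list[int]:
    """Owner of each of num_blocks small column blocks, in column order."""
    return [serpentine_owner(b, num_procs) for b in range(num_blocks)]


if __name__ == "__main__":
    # 8 small column blocks over 4 processors.
    print(serpentine_layout(8, 4))
```

Under this assumed layout, every processor receives one block from each sweep. In an LU factorization the leading columns are retired first, so reversing the order on alternate sweeps keeps the remaining work more evenly spread than always assigning blocks in the same order.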

【 License 】

Unknown
