Technical Report Details
Fast sparse matrix-vector multiplication by exploiting variable block structure
Vuduc, R. W.; Moon, H.
Lawrence Livermore National Laboratory
Keywords: Storage; Tuning; 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Implementation; Simulation
DOI: 10.2172/891708
RP-ID: UCRL-TR-213454
RP-ID: W-7405-ENG-48
RP-ID: 891708
United States | English
Source: UNT Digital Library
【 Abstract 】

We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum, A = A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in a block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment of the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting and UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1x over not blocking at all, and as high as 1.8x over the standard BCSR implementation used in prior work. When performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings s and the block size for each matrix term A_i will in practice depend on the matrix and hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
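To make the blocked-format discussion concrete, the sketch below shows SpMV (y += A*x) for a conventional BCSR matrix with fixed 2x2 blocks; it is a minimal illustration with assumed array names and layout, not the report's implementation. Register-level tiling appears as the unrolled 2x2 block kernel, and the padding cost arises because every block must start at a row and column index that is a multiple of the block size, so non-zeros that do not line up are stored as explicit zeros.

#include <stddef.h>

/* Hypothetical BCSR SpMV kernel with fixed 2x2 blocks (illustrative only).
 *   brow_ptr[i]..brow_ptr[i+1] index the blocks belonging to block row i,
 *   bcol_ind[k] is the first column of block k,
 *   bval stores each 2x2 block contiguously in row-major order.
 * Non-zeros that fall outside an aligned 2x2 grid must be padded with
 * explicit zeros in bval, which is the extra work split UBCSR reduces. */
void spmv_bcsr_2x2(size_t num_block_rows,
                   const size_t *brow_ptr,
                   const size_t *bcol_ind,
                   const double *bval,
                   const double *x, double *y)
{
    for (size_t i = 0; i < num_block_rows; ++i) {
        double y0 = 0.0, y1 = 0.0;          /* accumulators for two output rows */
        for (size_t k = brow_ptr[i]; k < brow_ptr[i + 1]; ++k) {
            const double *b = &bval[4 * k]; /* 2x2 block, row-major */
            const double x0 = x[bcol_ind[k]];
            const double x1 = x[bcol_ind[k] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * i]     += y0;
        y[2 * i + 1] += y1;
    }
}

Under the splitting scheme described in the abstract, y = A*x would instead be computed as s accumulating passes over the same y, one per term A_i, each stored in the UBCSR variant (which, as the name indicates, relaxes the alignment constraint on where blocks may begin) with its own empirically chosen block size.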

【 Preview 】
Attachment List
891708.pdf  1053 KB  PDF