Technical Report Details
Fast sparse matrix-vector multiplication by exploiting variable block structure
Vuduc, R. W.; Moon, H.
Lawrence Livermore National Laboratory
Keywords: Storage; Tuning; 99 General And Miscellaneous//Mathematics, Computing, And Information Science; Implementation; Simulation
DOI: 10.2172/891708
RP-ID: UCRL-TR-213454
RP-ID: W-7405-ENG-48
RP-ID: 891708
United States | English
Source: UNT Digital Library
【 Abstract 】

We improve the performance of sparse matrix-vector multiply (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this kind of structure. Our technique splits the matrix A into a sum, A = A_1 + A_2 + ... + A_s, where each term is stored in a new data structure, unaligned block compressed sparse row (UBCSR) format. The classical alternative approach of storing A in a block compressed sparse row (BCSR) format yields limited performance gains because it imposes a particular alignment of the matrix non-zero structure, leading to extra work from explicitly padded zeros. Combining splitting and UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. Using application test matrices, we show empirically that speedups can be as high as 2.1x over not blocking at all, and as high as 1.8x over the standard BCSR implementation used in prior work. When performance does not improve, split UBCSR can still significantly reduce matrix storage. Through extensive experiments, we further show that the empirically optimal number of splittings s and the block size for each matrix term A_i will in practice depend on the matrix and hardware platform. Our data lay a foundation for future development of fully automated methods for tuning these parameters.
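To make the blocked-format discussion concrete, the sketch below shows SpMV (y += A*x) for a conventional BCSR matrix with fixed 2x2 blocks; it is a minimal illustration with assumed array names and layout, not the report's implementation. Register-level tiling appears as the unrolled 2x2 block kernel, and the padding cost arises because every block must start at a row and column index that is a multiple of the block size, so non-zeros that do not line up are stored as explicit zeros.

#include <stddef.h>

/* Hypothetical BCSR SpMV kernel with fixed 2x2 blocks (illustrative only).
 *   brow_ptr[i]..brow_ptr[i+1] index the blocks belonging to block row i,
 *   bcol_ind[k] is the first column of block k,
 *   bval stores each 2x2 block contiguously in row-major order.
 * Non-zeros that fall outside an aligned 2x2 grid must be padded with
 * explicit zeros in bval, which is the extra work split UBCSR reduces. */
void spmv_bcsr_2x2(size_t num_block_rows,
                   const size_t *brow_ptr,
                   const size_t *bcol_ind,
                   const double *bval,
                   const double *x, double *y)
{
    for (size_t i = 0; i < num_block_rows; ++i) {
        double y0 = 0.0, y1 = 0.0;          /* accumulators for two output rows */
        for (size_t k = brow_ptr[i]; k < brow_ptr[i + 1]; ++k) {
            const double *b = &bval[4 * k]; /* 2x2 block, row-major */
            const double x0 = x[bcol_ind[k]];
            const double x1 = x[bcol_ind[k] + 1];
            y0 += b[0] * x0 + b[1] * x1;
            y1 += b[2] * x0 + b[3] * x1;
        }
        y[2 * i]     += y0;
        y[2 * i + 1] += y1;
    }
}

Under the splitting scheme described in the abstract, y = A*x would instead be computed as s accumulating passes over the same y, one per term A_i, each stored in the UBCSR variant (which, as the name indicates, relaxes the alignment constraint on where blocks may begin) with its own empirically chosen block size.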

【 Preview 】
Attachment List
891708.pdf  1053 KB  PDF