Runtime specialization optimizes programs based on partial information that is available only at run time. It is applicable when some input data is used repeatedly while other input data varies, and it has the potential to generate highly efficient code.

In this thesis we explore the speed-ups that runtime specialization can deliver for sparse matrix-dense vector multiplication, in the case where a single matrix is multiplied by many vectors. We experiment with five methods that combine runtime specialization with parallelization, and we compare them to methods that do not use specialization (including Intel's MKL library). In this work we focus on evaluating the parallel speed-ups obtainable with runtime specialization, without considering the overhead of code generation.

Our experiments run on four different machines with 88 matrices from the Matrix Market and Florida collections, among others. In 348 of the resulting 352 cases (88 matrices on each of four machines), the specialized code runs faster than any version without specialization. In the worst case, the specialized code is 7 percent slower than Intel's MKL library. If we use only specialization, the average speed-up with respect to Intel's MKL library ranges from 1.416x to 1.470x, depending on the machine. We have also found that the best method depends on the matrix and the machine; no single method is best for all matrices and machines.
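To make the idea of specializing on a fixed matrix concrete, the following is a minimal, sequential sketch in Python. It is not one of the five methods evaluated in the thesis (which are parallel and compared against MKL); it only illustrates the general principle. The matrix is assumed to be stored in CSR format (`row_ptr`, `col_idx`, `vals`), and the helper names `csr_spmv` and `specialize_spmv` are hypothetical. The specializer emits straight-line code with the matrix's sparsity pattern and values embedded as constants, so the loops and index loads over the matrix disappear; only the vector varies between calls. As in the thesis, the one-time cost of generating the code is ignored.

```python
from typing import Callable, Dict, List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Generic CSR sparse matrix-dense vector product y = A x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Loop over the nonzeros of row i, reading indices and values from memory.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def specialize_spmv(row_ptr: List[int], col_idx: List[int],
                    vals: List[float]) -> Callable[[List[float]], List[float]]:
    """Generate an SpMV function with this matrix's structure and values baked in.

    Hypothetical illustration: assumes all values are finite floats, so that
    repr() produces valid Python literals.
    """
    n_rows = len(row_ptr) - 1
    lines = ["def spmv(x):", f"    y = [0.0] * {n_rows}"]
    for i in range(n_rows):
        # One unrolled expression per nonempty row; empty rows stay 0.0.
        terms = [f"{vals[k]!r} * x[{col_idx[k]}]"
                 for k in range(row_ptr[i], row_ptr[i + 1])]
        if terms:
            lines.append(f"    y[{i}] = " + " + ".join(terms))
    lines.append("    return y")
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)  # compile the generated source once
    return namespace["spmv"]  # type: ignore[return-value]


# A 3x3 example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]

spmv = specialize_spmv(row_ptr, col_idx, vals)  # specialize once...
for x in ([1.0, 2.0, 3.0], [0.0, 1.0, 0.0]):    # ...then reuse for many vectors
    print(spmv(x))
# [5.0, 6.0, 19.0]
# [0.0, 3.0, 0.0]
```

The generated function for this matrix computes, for example, `y[0] = 2.0 * x[0] + 1.0 * x[2]`, which is what the generic loop computes for row 0 but without the indirection through `row_ptr` and `col_idx`. A real runtime specializer would emit native code (and, in the setting of the thesis, parallel code) rather than Python source.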
Optimization by runtime specialization for sparse matrix-vector multiplication