Limited-range functions are domain-level optimizations to a class of applications where all input elements contribute to all output elements, based on the distance between two given elements. When the contribution of an input element to the output is inversely proportional to the distance, a limited range can be applied, which approximates the contribution to zero beyond a certain cutoff distance. Introducing a limited-range function to the application reduces the computation complexity from O(N2) to O(N).Processing multiple input elements in a limited-range function in parallel can lead to data races without the use of expensive synchronization. That is why a preferred approach is an output-driven one, where multiple output elements are processed in parallel, all sharing the input data set for reads. Typically the input data set is unstructured, which without the use of binning, would result in every output element in the output-driven approach reading all of the input elements to determine which ones fall within its cutoff. Binning is a preconditioning step that sorts the input elements into predetermined bins that are easily accessible by the output, thus allowing the output to only access the bins relevant to its computation.Traditionally, bins were created with uniform size and capacity to enable easy access to them; however, making the bins regular can have severe side-effects on memory requirements to maintain these bins. We propose a technique to allow the bins to vary in capacity in order to reduce the memory overhead, at the cost of added accessing overhead. In this work, we will compare regular binning and our approach, compact binning. We will demonstrate that compact bins can in fact improve the execution performance of limited-range functions executed in parallel.
【 预 览 】
附件列表
Files
Size
Format
View
Compact binning for parallel processing of limited-range functions