Heterogeneous computing systems that combine general-purpose multicore central processing units (CPUs) with specialized accelerators have emerged recently. The graphics processing unit (GPU) is the most widely used accelerator. To exploit the full computing power of such a heterogeneous system, coordination between the two distinct devices, the CPU and the GPU, is necessary. Previous research has addressed the problem of partitioning workloads between the CPU and the GPU from various angles for regular applications, which have high parallelism and little data-dependent control flow. However, it remains unclear how irregular applications, whose behavior varies with the input, can be scheduled efficiently on such heterogeneous systems. Because CPUs and GPUs have different characteristics, task chunks of these irregular applications show a preference, or affinity, for a particular device. In this work, we show that allocating workloads at task-chunk granularity based on each chunk's device affinity, combined with work stealing for load balancing, achieves a performance improvement of up to 1.5x over traditional ratio-based allocation and up to 5x over naive GPU-only allocation on three irregular graph analytics applications.
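The sketch below is a minimal, hypothetical illustration (not the paper's implementation) of the scheduling idea described above: chunks are initially placed on the queue of the device they have affinity for, and an idle device steals from the other device's queue. The two devices are simulated here with host threads, and all names (Chunk, Device, steal_or_pop, worker) are assumptions introduced for this example.

```cpp
// Minimal sketch of affinity-based chunk allocation with work stealing.
// The CPU and GPU are simulated by two host threads; a real system would
// launch GPU kernels in place of the printf.
#include <cstdio>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

enum class Device { CPU, GPU };

struct Chunk {
    int id;
    Device affinity;  // device this chunk is expected to run fastest on
};

struct DeviceQueue {
    std::deque<Chunk> chunks;
    std::mutex lock;
};

// Pop from the worker's own queue; if it is empty, steal from the other
// device's queue so neither device sits idle.
static bool steal_or_pop(DeviceQueue& own, DeviceQueue& other, Chunk& out) {
    {
        std::lock_guard<std::mutex> g(own.lock);
        if (!own.chunks.empty()) {
            out = own.chunks.front();   // local pop from the front
            own.chunks.pop_front();
            return true;
        }
    }
    std::lock_guard<std::mutex> g(other.lock);
    if (!other.chunks.empty()) {
        out = other.chunks.back();      // steal from the opposite end
        other.chunks.pop_back();
        return true;
    }
    return false;
}

static void worker(const char* name, DeviceQueue& own, DeviceQueue& other) {
    Chunk c{};
    while (steal_or_pop(own, other, c)) {
        // Placeholder for actual CPU task execution or GPU kernel launch.
        std::printf("%s runs chunk %d\n", name, c.id);
    }
}

int main() {
    DeviceQueue cpu_q, gpu_q;

    // Affinity-based initial allocation: each chunk goes to the queue of the
    // device it prefers, instead of being split by a fixed CPU:GPU ratio.
    std::vector<Chunk> chunks = {
        {0, Device::GPU}, {1, Device::CPU}, {2, Device::GPU},
        {3, Device::GPU}, {4, Device::CPU}, {5, Device::GPU},
    };
    for (const Chunk& c : chunks)
        (c.affinity == Device::CPU ? cpu_q : gpu_q).chunks.push_back(c);

    std::thread cpu(worker, "CPU", std::ref(cpu_q), std::ref(gpu_q));
    std::thread gpu(worker, "GPU", std::ref(gpu_q), std::ref(cpu_q));
    cpu.join();
    gpu.join();
    return 0;
}
```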
Attached file: Intelligent scheduling for simultaneous CPU-GPU applications