Many parallel algorithms in domains such as graph analytics and simulations execute more efficiently if some parallel tasks are executed before others. To implement such priority-based task scheduling, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, traditional PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are often so far from the head that the resulting schedule misses the benefit of priority-based scheduling.This thesis proposes novel architectural support for relaxing PQs without straying far from the priority-based schedule. Our architecture, called Snug, distributes the PQ and maintains a set of Work Registers that point to the highest-priority task in each subqueue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between visiting all the subqueues to get an accurate global snapshot and visiting nearby subqueues to reduce traffic. Overall, Snug dequeues highquality tasks while simultaneously avoiding hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation applications. Snug reduces the average execution time of the applications by 1.6×, 4.9× and 3.4× compared to the state-of-the-art skip list, SprayList, and software-distributed PQs, respectively.
【 预 览 】
附件列表
Files
Size
Format
View
Architectural support for work-efficient relaxed priority queueing