Recently, Chip Multi-Processor (CMP) or multicore design has become the mainstream architecture choice for major microprocessor makers. In a CMP architecture, some important on-chip platform resources, such as the lowest level on-chip cache and the off-chip bandwidth, are shared by all the processor cores. As will be shown in this dissertation, resource sharing may lead to sub-optimal throughput, cache thrashing, thread starvation and priority inversion for the applications that fail to acquire sufficient resources to make good progress. In addition, resource sharing may also lead to a large performance variation for an individual application. Such performance variation is ill-suited for the future uses of CMPs in which many applications may require a certain level of performance guarantee, which we refer to as performance Quality of Service (QoS). In this dissertation, we address the resource sharing problem from two aspects.Firstly, we propose an analytical and several heuristic models that encapsulate and predict the impact of cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation. The most accurate model achieves an average error of 3.9%. Through a case study, we found that the cache sharing impact is largely affected by the temporal reuse behaviors of the co-scheduled applications. Secondly, we investigate a framework for providing performance Quality of Service in a CMP server. We found that the ability of a CMP to partition platform resources alone is not sufficient for fully providing QoS. We also need an appropriate way to specify a QoS target, and an admission control policy that accepts jobs only when their QoS targets can be satisfied. We also found that providing strict QoS often leads to a significant reduction in throughput due to resource fragmentation. We propose novel throughput optimization techniques that include: (1) exploiting various QoS execution modes, and (2) microarchitecture techniques that steal excess resources from a job while still meeting its QoS target. Through simulation, we found that compared to an unoptimized scheme, the throughput can be improved by up to 45%, making the throughput significantly closer to a non-QoS CMP.
【 预 览 】
附件列表
Files
Size
Format
View
Analyzing and Managing Shared Cache in Chip Multi-Processors