Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects who optimize system designs for these workloads are challenged to identify performance bottlenecks before the systems are built. This identification is difficult because, as in any concurrent system, overheads in one component may be hidden by overlap with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional tools inadequate. Common software profiling techniques cannot account for hardware bottlenecks or for situations in which software overheads are hidden by overlap with hardware operations. This thesis presents a methodology for identifying true end-to-end critical paths in systems composed of multiple layers of hardware and software, particularly in the domain of high-speed networking. The state machines that implicitly or explicitly govern the behavior of all the layers are modeled, and their local interactions are captured to build an end-to-end dependence graph that can be used to locate bottlenecks. This is done incrementally, with modest effort and only local understanding. Furthermore, it is shown that queue-based interactions are necessary and sufficient to capture information from complex protocols, multiple connections, and multiple processors. The resulting dependence graph is analyzed to distill the large volume of collected data into a set of bottleneck locations, including where the most un-overlapped time is spent and where adding buffering could improve the system's performance without any other optimizations. Additionally, this technique provides accurate quantitative predictions of the benefit of eliminating bottlenecks.
The end result of this analysis, minutes after the data is gathered, is: 1) the identity of the component that causes the bottleneck; 2) the extent to which a component must be improved before it is no longer the bottleneck; 3) the next bottleneck that will be exposed in the system; and 4) the performance improvement that will occur before the next bottleneck is reached. The analysis can be repeated for successive bottlenecks and is far faster than the available alternatives.
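To make the idea of a critical path through a dependence graph concrete, the following is a minimal sketch, not the thesis's actual tooling. It assumes the dependence graph is a directed acyclic graph whose edges carry a latency, and it finds the longest (critical) path by relaxing edges in topological order. The event names (`nic_rx`, `dma_write`, and so on) and the latencies are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def critical_path(edges):
    """Return (total_latency, path) for the longest path in a DAG.

    edges: iterable of (src, dst, latency) tuples. All names and
    latencies are illustrative assumptions, not measured data.
    """
    succ = defaultdict(list)   # node -> [(successor, latency)]
    indeg = defaultdict(int)   # node -> number of unprocessed predecessors
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    dist = {n: 0 for n in nodes}   # longest distance found so far to each node
    pred = {}                      # predecessor on that longest path
    ready = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while ready:                   # Kahn-style topological traversal
        u = ready.pop()
        visited += 1
        for v, w in succ[u]:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("dependence graph contains a cycle")

    # Walk back from the node with the largest distance.
    end = max(dist, key=dist.get)
    path = [end]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return dist[end], path[::-1]

# Hypothetical receive path: the direct nic_rx -> driver_rx edge is fully
# overlapped by the longer DMA/interrupt chain, so it is off the critical path.
EDGES = [
    ("nic_rx", "dma_write", 2),
    ("dma_write", "irq", 1),
    ("irq", "driver_rx", 4),
    ("nic_rx", "driver_rx", 3),
    ("driver_rx", "tcp_rx", 6),
    ("tcp_rx", "app_read", 2),
]

total, path = critical_path(EDGES)
print(total, path)
```

In this toy graph, shortening the `nic_rx` to `driver_rx` edge would not change the end-to-end latency, because that edge is hidden behind the DMA and interrupt chain. This is the kind of overlapped overhead that a per-component profile cannot distinguish from a true bottleneck.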
Full-System Critical-Path Analysis and Performance Prediction.