Modern processors are equipped with a Performance Monitoring Unit (PMU) that gives software developers access to micro-architectural information such as CPU cycle counts and executed instruction counts. The PMU provides a set of programmable registers, called hardware performance counters, that can be programmed to count specific hardware events. In the Linux operating system, several low-level interfaces provide access to the hardware counter facilities. One of these is perf_event, which was merged into the mainline kernel as a sub-system in 2009 and has become a widely used interface for hardware counters.

First, we investigate the perf_event Linux sub-system at the kernel level by exploring the kernel source code to identify potential sources of overhead and counting error. We also study the Perf tool, one of the end-user interfaces built on top of the perf_event sub-system, which provides an easy-to-use measurement and profiling tool for Linux. In addition, we conduct experiments on a variety of processors to analyze the overhead, determinism, and accuracy of the Perf tool and the underlying perf_event sub-system when counting hardware events. Although our results show a 47% error in counting taken branches and a 5.92% relative overhead on Intel Pentium 4 processors, we observe no significant overhead or defect on modern x86 and ARM processors.

Second, we explore the slab allocator, a memory management sub-system of the Linux kernel that plays a crucial role in overall system performance. We study the three implementations of the slab allocator currently available in the mainline kernel and enumerate the advantages and disadvantages of each. We also investigate the binning effect of the slab allocator on the variation in execution time of Linux system calls. Finally, we introduce a new metric, called the Slab Metric, that is assigned to each system call to represent its level of interaction with the slab allocator. The results show a correlation coefficient of 0.78 between the dynamic slab metric and the execution time variation of Linux system calls.
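The quantities reported above (relative counting error, relative overhead, and a correlation coefficient) can be illustrated with a small sketch. The abstract does not state the exact definitions used, so the formulas below are assumptions: error and overhead are taken relative to a reference value, and the correlation is assumed to be the Pearson coefficient. All function names and sample numbers are hypothetical and are not data from this study.

```python
import math


def relative_error(measured, expected):
    """Counting error as a fraction of the expected count (assumed definition)."""
    return abs(measured - expected) / expected


def relative_overhead(instrumented, baseline):
    """Extra cost of an instrumented run as a fraction of the baseline (assumed definition)."""
    return (instrumented - baseline) / baseline


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the formulas.
    print(relative_error(measured=147, expected=100))
    print(relative_overhead(instrumented=105.92, baseline=100.0))
    # Hypothetical per-system-call slab metric vs. execution time variation.
    slab_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
    time_variation = [0.9, 2.3, 2.8, 4.4, 4.6]
    print(pearson(slab_metric, time_variation))
```

Under these assumed definitions, a value near 1 from `pearson` would indicate that system calls with a higher slab metric tend to show larger execution time variation.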
A Study of Linux Perf and Slab Allocation Sub-Systems