The performance of n-tier web-facing applications often suffers from the response-time long-tail problem. One cause of this problem is millibottlenecks: bottlenecks that appear and disappear within tens to hundreds of milliseconds. We propose a novel approach to detecting system-level millibottlenecks through fine-grained monitoring of locks. Through a comprehensive analysis of the Linux kernel call graph, we found that instrumenting around locks achieves high coverage while minimizing the number of instrumentation points. In this dissertation, we present two case studies that diagnose the root causes of system-level millibottlenecks and their impact on n-tier systems. In the first case study, we present concrete experimental evidence that our approach can diagnose the root cause of a system-level millibottleneck caused by conservative stable pages. This millibottleneck is somewhat similar to the priority inversion problem in scheduling; the fundamental cause is that the CFQ scheduler ignores the priorities of asynchronous writes. In the second case study, we found that load balancing policies and mechanisms that appear to work well in stable environments exhibit several limitations when facing millibottlenecks. Experiments with standard n-tier benchmarks show that, during millibottlenecks, some combinations of load balancing policy and mechanism mistakenly send new requests to the node(s) suffering from millibottlenecks instead of to idle nodes, as load balancers are supposed to do. Several of these mistakes stem from implicit assumptions that load balancing policies and mechanisms make about the stability of system state. Our study shows that appropriate remedies at the policy and mechanism levels can avoid these mistakes during millibottlenecks and remove the very long response time (VLRT) requests, thus improving the average response time by a factor of 12.
Study of system level millibottlenecks and their impact on N-tier system performance