I/O is one of the main performance bottlenecks for many data-intensive scientific applications. Accurate I/O performance benchmarking, which can help us better understand the causes of these bottlenecks and guide the optimization of poorly performing applications, is therefore an important problem. We investigate the use of submodular function maximization to select a set of I/O benchmark applications, using similarity measures between applications computed from I/O statistics obtained from the Darshan logs of their jobs. Our optimization problem seeks a set of applications that is representative of the applications running on the HPC platform from which they are chosen, while also encouraging the selected applications to exhibit diverse I/O behavior. We evaluate the quality of the selected applications by training classifiers on features extracted from the jobs of these applications to predict the I/O performance of other jobs that were run on the platform. Our experiments indicate that the trained classifiers can achieve a fair level of accuracy, thereby lending credence to the feasibility of our optimization approach for selecting I/O benchmark applications.
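The selection step described above can be illustrated with a minimal sketch of greedy submodular maximization. The abstract does not state the exact objective, so everything here is an assumption made for illustration: a facility-location term stands in for representativeness, a square-root reward over cluster counts stands in for diversity, and the similarity matrix `sim`, the cluster labels `clusters`, the weight `lam`, and the function names are hypothetical. Both terms are monotone submodular for nonnegative similarities, so the standard greedy algorithm gives a (1 - 1/e) approximation under a cardinality constraint.

```python
import math


def objective(selected, sim, clusters, lam):
    """Assumed objective: representativeness plus a cluster-based diversity reward."""
    if not selected:
        return 0.0
    # Facility location: each application is "covered" by its most similar pick.
    coverage = sum(max(row[j] for j in selected) for row in sim)
    # Diversity: concave (sqrt) reward on how many picks fall in each cluster,
    # so a pick from a not-yet-used cluster is worth more than a repeat.
    counts = {}
    for j in selected:
        counts[clusters[j]] = counts.get(clusters[j], 0) + 1
    diversity = sum(math.sqrt(c) for c in counts.values())
    return coverage + lam * diversity


def greedy_select(sim, clusters, k, lam=1.0):
    """Pick up to k applications, adding the largest marginal gain each round."""
    selected = []
    remaining = set(range(len(sim)))
    current = 0.0
    for _ in range(min(k, len(sim))):
        best, best_gain = None, float("-inf")
        for j in sorted(remaining):
            gain = objective(selected + [j], sim, clusters, lam) - current
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Toy data (made up): pairwise similarities between five applications,
# e.g. as might be derived from per-application I/O statistics.
sim = [
    [1.0, 0.9, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.3, 0.1, 0.2],
    [0.1, 0.3, 1.0, 0.7, 0.8],
    [0.0, 0.1, 0.7, 1.0, 0.5],
    [0.1, 0.2, 0.8, 0.5, 1.0],
]
clusters = [0, 0, 1, 1, 1]  # hypothetical grouping by I/O behavior

selected = greedy_select(sim, clusters, k=2)
print(selected)
```

On this toy matrix the first pick is application 2, which is the most similar to all others overall, and the second pick comes from the other cluster, since covering applications 0 and 1 and adding a new cluster both raise the objective more than a second pick from cluster 1 would.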
Machine learning for selecting parallel I/O benchmark applications